Improving topic modeling performance on social media through semantic relationships within biomedical terminology

Yi Xin; Monika E. Grabowska; Srushti Gangireddy; Matthew S. Krantz; V. Eric Kerchberger; Alyson L. Dickson; Qiping Feng; Zhijun Yin; Wei-Qi Wei

PLoS ONE

Article . 2025 . Peer-reviewed

License: CC BY

Data sources: Crossref

PubMed Central

Other literature type . 2025

License: http://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data sources: PubMed Central

Type: Article, Other literature type
Date: 21 Feb 2025
Language: English
Publisher: Public Library of Science (PLoS)
Journal: PLOS ONE, volume 20, page e0318702 (eissn: 1932-6203)
Funded by: NIH | Drug repositioning for Al..., NIH | PheMAP: Measured, Automat..., NIH | Learning Precision Medici... +2 projects

doi: 10.1371/journal.pone.0318702

pmid: 39982945

pmc: PMC11845042

Abstract

Topic modeling utilizes unsupervised machine learning to detect underlying themes within texts and has been deployed routinely to analyze social media for insights into healthcare issues. However, the inherent messiness of social media hinders the full realization of this technique’s potential. As such, we hypothesized that restricting medical concepts in social media texts to specific related semantic types and applying topic modeling to these concepts could be a feasible approach to overcome the challenge of traditional topic modeling for social media texts. Therefore, we developed a semantic-type-based topic modeling pipeline to discover self-reported health-related topics. This pipeline integrated semantic type information and Systematized Nomenclature of Medicine (SNOMED) precoordinated expressions into a traditional topic modeling approach to enhance effectiveness in clustering meaningful, distinct topics. Using social media texts regarding statins for illustration, we evaluated the efficacy of this new approach and validated a newly identified topic using real-world clinical data. Based on expert evaluations, this approach resulted in more novel, distinguishable, and meaningful health-related topics compared to traditional topic modeling. In addition, our electronic health record validation for a newly identified topic in two real-world clinical databases indicated that statin users had a higher prevalence of depression or anxiety compared to matched non-users. Our results indicate that this new topic modeling pipeline can improve the extraction of themes from noisy online discussions, thereby contributing to deeper insights for healthcare research.
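
The concept-restriction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the posts have already been annotated with (concept, semantic type) pairs by some concept extractor, and the annotations, the kept semantic types, and the function names below are all hypothetical. It shows only the filtering step and the resulting document-by-concept count matrix that a conventional topic model could then take as input.

```python
from collections import Counter

# Hypothetical choice of semantic types to keep; the abstract does not
# list the types the authors actually selected.
KEPT_SEMANTIC_TYPES = {"Sign or Symptom", "Mental or Behavioral Dysfunction"}

# Toy, hand-written annotations standing in for the output of a concept
# extractor. Each post is a list of (concept, semantic_type) pairs.
annotated_posts = [
    [("muscle pain", "Sign or Symptom"),
     ("statin", "Pharmacologic Substance")],
    [("anxiety", "Mental or Behavioral Dysfunction"),
     ("doctor", "Professional or Occupational Group")],
    [("muscle pain", "Sign or Symptom"),
     ("anxiety", "Mental or Behavioral Dysfunction")],
]


def restrict_to_semantic_types(posts, kept_types):
    """Keep only the concepts whose semantic type is in kept_types."""
    return [[concept for concept, sem_type in post if sem_type in kept_types]
            for post in posts]


def document_concept_matrix(concept_docs):
    """Return a sorted concept vocabulary and per-document count rows."""
    vocab = sorted({concept for doc in concept_docs for concept in doc})
    rows = []
    for doc in concept_docs:
        counts = Counter(doc)
        rows.append([counts[concept] for concept in vocab])
    return vocab, rows


concept_docs = restrict_to_semantic_types(annotated_posts, KEPT_SEMANTIC_TYPES)
vocab, matrix = document_concept_matrix(concept_docs)
```

Each row of `matrix` is one post expressed as counts over the retained concepts, so words outside the chosen semantic types never reach the topic model. Which topic model is fitted to this matrix is left open here, since the abstract refers only to "a traditional topic modeling approach".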

Related Organizations

University of Florida, United States
Vanderbilt University Medical Center, United States
Vanderbilt University Medical Center, Department of Biomedical Informatics, United States
Vanderbilt University, United States

Keywords

Science, Terminology as Topic, Q, R, Medicine, Humans, Electronic Health Records, Systematized Nomenclature of Medicine, Social Media, Research Article, Semantics

1 Research products, page 1 of 1

topic_model_semantic_type software on GitHub
IsRelatedTo

Impact by BIP!

selected citations: 2 (derived from selected sources; an alternative to the "Influence" indicator)
popularity: Top 10% (the "current" impact/attention of the article in the research community at large, based on the underlying citation network)
influence: Average (the overall/total impact of the article in the research community at large, based on the underlying citation network, diachronically)
impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)

Access routes: Green, gold

Funded by

NIH | Drug repositioning for Alzheimer's disease via genetics, electronic health records, and human iPSC models
NIH | PheMAP: Measured, Automated Profile to Facilitate High Throughput Phenotyping
NIH | Learning Precision Medicine for Rare Diseases Empowered by Knowledge-driven Data Mining
NIH | Predicting Phenotype by Using Transcriptomic Alteration as Endophenotype
