Downloads provided by UsageCounts
Traditional computational phenotypes (CPs) identify patient cohorts without consideration of underlying pathophysiological mechanisms. Deeper patient-level characterizations are necessary for personalized medicine, and while advanced methods exist, their application in clinical settings remains largely unrealized. This thesis advances deep CPs through several experiments designed to address four requirements: stability, scalability, interoperability, and multimodality.

Stability was examined through three experiments. First, a multiphase study identified resources and remediation plans as barriers to data quality (DQ) assessment. Then, in two experiments, the Harmonized DQ Framework was used to characterize DQ checks from six clinical organizations and 12 biomedical ontologies. Atemporal Plausibility, Completeness, and Value Conformance were the most common clinical checks, and Value and Relation Conformance were the most common biomedical ontology checks.

Scalability was examined through three experiments. First, a novel composite patient similarity algorithm was developed, which demonstrated that information from clinical terminology hierarchies improved patient representations when applied to small populations. Then, ablation studies showed that the combination of data type, sampling window, and clinical domain used to characterize rare disease patients differed by disease. Finally, an algorithm that losslessly transforms complex knowledge graphs (KGs) into representations more suitable for inductive inference was developed and validated through the generation of expert-verified, plausible novel drug candidates.

Interoperability was examined through two experiments. First, 36 strategies for aligning five eMERGE CPs to standard clinical terminologies were examined, revealing lower false negative and false positive counts in adult than in pediatric patient populations. Then, hospital-scale mappings between clinical terminologies and biomedical ontologies were developed and found to be accurate, generalizable, and logically consistent.

Multimodality was examined through two experiments. A novel ecosystem was created for constructing ontologically grounded KGs under alternative knowledge models using different relation and abstraction strategies. The resulting KGs were validated by successfully enriching portions of the preeclampsia molecular signature that had no previously known literature associations.

These experiments were used to develop a joint learning framework for inferring molecular characterizations of patients from clinical data. The utility of this framework was demonstrated through the accurate inference of EHR-derived rare disease patient genotypes/phenotypes from publicly available molecular data.
This thesis is licensed as Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). A copy of the license has been attached to this record (Creative Commons — Attribution-NonCommercial-ShareAlike 4.0 International — CC BY-NC-SA 4.0.pdf)
Common Data Elements, OMOP Common Data Model, Scalability, Representation Learning, Open Biomedical Ontologies, Computational Phenotyping, Graph Representation Learning, Interoperability, Phenotype, Knowledge, Biological Ontologies, Artificial Intelligence, Ontologies, Patient Representation Learning, Learning, Knowledge Graphs, Data Quality, Stability, Semantic Web, Multimodality
| Indicator | Value |
| --- | --- |
| Selected citations | 0 |
| Popularity | Average |
| Influence | Average |
| Impulse | Average |
| Views | 96 |
| Downloads | 91 |

Views provided by UsageCounts