
Background Disability assessment using the Expanded Disability Status Scale (EDSS) is important to inform treatment decisions and monitor the progression of multiple sclerosis. Yet, EDSS scores are documented infrequently in electronic medical records. Objective To validate a machine learning model to estimate EDSS scores for multiple sclerosis patients using clinical notes from neurologists. Methods A machine learning model was developed to estimate EDSS scores on specific encounter dates using clinical notes from neurologist visits. The OM1 MS Registry data were used to create a training cohort of 2632 encounters and a separate validation cohort of 857 encounters, all with clinician-recorded EDSS scores. Model performance was assessed using the area under the receiver-operating-characteristic curve (AUC), positive predictive value (PPV), and negative predictive value (NPV), calculated using a binarized version of the outcome. The Spearman R and Pearson R values were calculated. The model was then applied to encounters without clinician-recorded EDSS scores in the MS Registry. Results The model had a PPV of 0.85, NPV of 0.85, and AUC of 0.91. The model had a Spearman R value of 0.75 and Pearson R value of 0.74 when evaluating performance using the continuous estimated EDSS and clinician-recorded EDSS scores. Application of the model to eligible encounters resulted in the generation of eEDSS scores for an additional 190,282 encounters from 13,249 patients. Conclusion EDSS scores can be estimated with very good performance using a machine learning model applied to clinical notes, thus increasing the utility of real-world data sources for research purposes.
Original Research Article
Original Research Article
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 9 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
