
Communicable lung diseases continue to cause high morbidity and mortality in low-resource healthcare settings because of delayed diagnosis, reliance on manual record-keeping, and the limited availability of the labeled clinical data that supervised machine learning models require. As a result, existing disease detection systems are difficult to deploy effectively, which leads to late identification of infectious cases and increased disease transmission. To address this challenge, an unsupervised clustering framework based on the K-Modes algorithm is proposed for the early identification of communicable lung diseases. The approach analyzes categorical hospital patient records, including symptoms, diagnostic indicators, medical history, and demographic features, to group patients automatically into communicable and non-communicable disease categories without prior labeling. K-Modes is selected because it suits categorical medical data: it compares records with dissimilarity measures and refines each cluster through iterative mode updates. The proposed framework identifies clinically coherent clusters consistent with known disease patterns. Cluster validation using purity and clinical interpretation demonstrates effective separation between infectious and non-infectious cases. The model achieves an accuracy of 95.6%, precision of 96.1%, recall of 95.2%, and an F1-score of 92.2%, indicating strong clustering performance. These results demonstrate that unsupervised learning can support early disease identification, rapid pre-screening, and improved clinical triage in resource-constrained healthcare environments.
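The clustering step described above can be illustrated with a minimal sketch. It is not the implementation used in the study: it assumes simple matching (Hamming) dissimilarity, random initial modes, and two clusters, and the function names (`hamming`, `column_modes`, `k_modes`, `purity`), the attribute layout, and the toy records are all hypothetical, invented only to show the mechanics of mode updates and purity-based validation.

```python
from collections import Counter
import random


def hamming(a, b):
    """Simple matching dissimilarity: count of attributes whose categories differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each attribute (column) of the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, k, max_iter=20, seed=0):
    """Assign each record to its nearest mode, then recompute modes, until stable."""
    rng = random.Random(seed)
    # Assumption: initial modes are k distinct records chosen at random.
    modes = rng.sample(sorted(set(records)), k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: hamming(r, modes[j])) for r in records
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes


def purity(labels, truth):
    """Fraction of records that belong to the majority true class of their cluster."""
    total = 0
    for j in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == j]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels)


# Hypothetical toy records: (cough, fever, night_sweats, smoker).
records = [
    ("persistent", "high", "yes", "no"),
    ("persistent", "high", "yes", "yes"),
    ("persistent", "mild", "yes", "no"),
    ("occasional", "none", "no", "yes"),
    ("none", "none", "no", "yes"),
    ("occasional", "none", "no", "no"),
]
# Hypothetical reference labels, used only to validate the clusters afterwards.
truth = ["communicable"] * 3 + ["non-communicable"] * 3

labels, modes = k_modes(records, k=2)
print("cluster labels:", labels)
print("cluster modes:", modes)
print("purity:", purity(labels, truth))
```

The reference labels play no part in fitting; they enter only in `purity`, which mirrors the unsupervised setting described in the abstract. In practice, the outcome depends on the initialization and on how the categorical attributes are encoded, so several seeds or a more careful initialization scheme would normally be compared.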
