
This dataset contains individual-level health, behavioral, and demographic variables used to develop and evaluate machine learning models for predicting stroke risk among adults with and without coronary heart disease (CHD). The data consist of 22 variables, including clinical indicators (high blood pressure, high cholesterol, diabetes, BMI), lifestyle behaviors (smoking, physical activity, alcohol consumption, fruit and vegetable intake), access to healthcare, self-reported health status, functional limitations, mental and physical health days, as well as demographic factors (sex, age, education, income). The dataset includes both individuals who have experienced a stroke (Stroke = 1) and those without stroke (Stroke = 0), enabling development of supervised classification models. All variables are encoded numerically to support statistical modelling and machine learning. No personally identifiable information is included. This dataset was prepared as part of a study on explainable machine learning for stroke risk prediction and can be used for benchmarking, algorithm comparison, reproducibility studies, and model interpretability research.
The dataset is derived from publicly available, de-identified survey data. All preprocessing steps, including cleaning, encoding, and variable selection, were performed by the author.
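
As a rough illustration of how the numerically encoded table might be prepared for supervised classification, the sketch below reads rows, separates the `Stroke` label from the predictors, and reports the class balance. It is a minimal sketch under stated assumptions: the data are assumed to be available as CSV, the rows in `SAMPLE_CSV` are hypothetical placeholders rather than records from the dataset, and every column name except `Stroke` (`HighBP`, `HighChol`, `BMI`) is assumed for illustration only.

```python
import csv
import io
from collections import Counter

# Hypothetical placeholder rows, NOT real records from the dataset.
# Only "Stroke" is a column name given in the description; the other
# column names are assumptions made for this illustration.
SAMPLE_CSV = """Stroke,HighBP,HighChol,BMI
0,1,0,27
1,1,1,31
0,0,0,22
0,0,1,25
"""


def load_rows(handle, target="Stroke"):
    """Read numerically encoded rows and split them into features X and labels y."""
    reader = csv.DictReader(handle)
    X, y = [], []
    for row in reader:
        # The label is binary: 1 = stroke, 0 = no stroke.
        y.append(int(float(row[target])))
        # Every remaining column is treated as a numeric predictor.
        X.append({name: float(value) for name, value in row.items() if name != target})
    return X, y


def class_balance(y):
    """Return the share of each label value among all rows."""
    counts = Counter(y)
    total = len(y)
    return {label: counts[label] / total for label in sorted(counts)}


# With the real file, an open file handle would replace io.StringIO(SAMPLE_CSV).
X, y = load_rows(io.StringIO(SAMPLE_CSV))
balance = class_balance(y)
print(balance)
```

Checking the class balance before modelling helps in choosing evaluation metrics, since accuracy alone can be misleading when one class is much rarer than the other.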
