
Accurately predicting dropout on digital education platforms such as ProFuturo enables preventive measures to be taken, significantly reducing dropout rates and optimizing the use of resources. Several authors have addressed this problem using machine learning and artificial intelligence techniques with encouraging results. However, most approaches rely only on demographic variables or the course completion certificate, discarding relevant information available in the Moodle platform. Furthermore, they obtain moderate success rates and are not easily interpretable in terms of the indicators considered. In this paper we propose a novel methodology for accurate dropout prediction that takes into account all informative variables from Moodle. The approach is based on simple machine learning models and maintains high interpretability in terms of the input variables. The experimental results show that a methodology based on Random Forest can achieve a high detection probability (91%) without compromising specificity (88%). Moreover, applying the SHAP algorithm provides high interpretability, helping to understand the role of the different variables.
Random Forest, Machine learning, Logistic Regression
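
The abstract reports two figures, detection probability (sensitivity) and specificity. As a minimal sketch of how these two metrics are computed from binary predictions, the snippet below uses a hypothetical helper `sensitivity_specificity` and made-up labels (1 = dropout, 0 = no dropout); neither the function name nor the data comes from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = dropout."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # dropouts correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # dropouts missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-dropouts correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-dropouts wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels for illustration only; not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the share of actual dropouts that the model flags, and specificity is the share of non-dropouts it correctly leaves unflagged, so reporting both guards against a model that simply flags everyone.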
