
handle: 10362/160911
Customer churn can be defined as the phenomenon of customers discontinuing their relationship with a company. This problem cuts across many industries, including the software industry. This study uses Machine Learning to build a predictive model that identifies potential churners at a Portuguese software house. Six popular Machine Learning models (Random Forest, AdaBoost, Gradient Boosting Machine, Multilayer Perceptron Classifier, XGBoost, and Logistic Regression) were developed to assess which one performs best. The experimental results show that boosting techniques such as XGBoost deliver the best predictive performance: the XGBoost model achieves a Recall of 0.85 and a ROC AUC of 0.86. Beyond model performance, the analysis of the model's feature importance revealed that some factors, such as the time to solve a support ticket, the type of application, the license age, and the number of incidents, significantly influence customer churn. These insights can help the software industry identify key drivers of churn and prioritize retention efforts accordingly.
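The model comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data here is synthetic (generated with scikit-learn's `make_classification`), the function name `compare_models` and all hyperparameters are assumptions chosen for illustration, and XGBoost is left out so the example depends only on scikit-learn. It fits five of the six model families on an imbalanced binary target and reports Recall and ROC AUC, the two metrics quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def compare_models(random_state=0):
    """Fit several classifiers on synthetic churn-like data and score them."""
    # Synthetic stand-in for a churn dataset (assumption): about 20% positives.
    X, y = make_classification(
        n_samples=600,
        n_features=10,
        n_informative=5,
        weights=[0.8, 0.2],
        random_state=random_state,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )

    # Illustrative settings only; the study's tuned hyperparameters are not known here.
    models = {
        "Random Forest": RandomForestClassifier(random_state=random_state),
        "AdaBoost": AdaBoostClassifier(random_state=random_state),
        "Gradient Boosting": GradientBoostingClassifier(random_state=random_state),
        "MLP": MLPClassifier(max_iter=500, random_state=random_state),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predicted = model.predict(X_test)
        # Probability of the positive (churn) class, needed for ROC AUC.
        churn_probability = model.predict_proba(X_test)[:, 1]
        scores[name] = {
            "recall": recall_score(y_test, predicted),
            "roc_auc": roc_auc_score(y_test, churn_probability),
        }
    return scores


if __name__ == "__main__":
    for name, metrics in compare_models().items():
        print(f"{name}: recall={metrics['recall']:.2f}, roc_auc={metrics['roc_auc']:.2f}")
```

Recall is computed from hard class predictions, while ROC AUC uses the predicted churn probability, which is why both `predict` and `predict_proba` are called. Scores on synthetic data say nothing about the figures reported in the study.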
Dias, J. P. R., & António, N. (2025). Predicting customer churn using Machine Learning: A case study in the software industry. Journal of Marketing Analytics, 13, 111–127. https://doi.org/10.1057/s41270-023-00269-9 --- This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia) under the project - UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS
Machine Learning, SaaS, Marketing, Customer Churn Prediction, Strategy and Management, Economics, Econometrics and Finance (miscellaneous), Data Mining, SDG 8 - Decent Work and Economic Growth, Supervised Learning, Statistics, Probability and Uncertainty
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
