ZENODO
Article . 2026
License: CC BY NC
Data sources: ZENODO, Datacite

End-to-End Machine Learning Data Pipeline for Telecom Customer Churn Prediction

Authors: Dr. Sunil Bhutada; V Tanmay; T Nikhil; B Venkat Siddhant; Dr. K Srinivasa Reddy

Abstract

Predictive analytics is central to modern telecommunications, particularly for managing customer churn proactively. By identifying high-risk subscribers in real time, providers can shift from reactive troubleshooting to planned retention, reducing revenue loss and increasing customer lifetime value. This project introduces a Customer Intelligence and Risk Optimization Platform, an AI-driven solution designed to be accessible yet technically robust. At its core, the system uses an Extreme Gradient Boosting (XGBoost) model to capture nonlinear relationships among features such as customer tenure, billing patterns, and service subscriptions.
The platform follows a modular microservice architecture designed for deployment and scalability. The trained XGBoost model is exposed as an inference service through a FastAPI REST API that processes live, structured JSON requests. For portability across infrastructures, the entire environment is containerized with Docker. On the front end, users interact with a SaaS-style dashboard built with Streamlit, which shows customer risk in real time through color-coded classifications (Low, Medium, and High) and animated probability bars, so that stakeholders can read the model output at a glance.
To bridge the gap between raw data and business action, the platform integrates a Large Language Model (LLM) to improve interpretability and decision-making. Rather than returning only a numerical score, the system provides a conversational AI assistant that generates contextual, "pro-retention" strategies based on the model's output for a given customer. These suggestions help stakeholders translate predictions into personalized customer outreach.
By combining gradient boosting with conversational AI and real-time visualization, this architecture connects machine learning with practical customer service operations. The paper presents a complete nine-stage system that turns predictions into action through an integrated data pipeline, interactive dashboard, and AI-powered assistant.
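The step from a churn probability to a risk class and an LLM prompt can be illustrated with a minimal sketch. It is not the authors' implementation: the field names (`tenure_months`, `monthly_charges`, `contract_type`), the tier cut-offs (0.33 and 0.66), and the prompt wording are all assumptions made for illustration, since the abstract names the three tiers but does not state thresholds or a request schema.

```python
import json
from dataclasses import dataclass

# Hypothetical cut-offs: the abstract names the tiers but not the thresholds.
LOW_MAX = 0.33
MEDIUM_MAX = 0.66


@dataclass
class ChurnRequest:
    """Illustrative request schema; the field names are assumptions."""
    tenure_months: int
    monthly_charges: float
    contract_type: str


def parse_request(raw_json: str) -> ChurnRequest:
    """Parse a structured JSON request body into a typed record."""
    payload = json.loads(raw_json)
    return ChurnRequest(
        tenure_months=int(payload["tenure_months"]),
        monthly_charges=float(payload["monthly_charges"]),
        contract_type=str(payload["contract_type"]),
    )


def risk_tier(probability: float) -> str:
    """Map a churn probability in [0, 1] to Low, Medium, or High."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability < LOW_MAX:
        return "Low"
    if probability < MEDIUM_MAX:
        return "Medium"
    return "High"


def build_retention_prompt(request: ChurnRequest, probability: float) -> str:
    """Assemble the text that would be sent to the LLM assistant."""
    tier = risk_tier(probability)
    return (
        f"Customer risk tier: {tier} (churn probability {probability:.2f}).\n"
        f"Tenure: {request.tenure_months} months; "
        f"monthly charges: {request.monthly_charges:.2f}; "
        f"contract: {request.contract_type}.\n"
        "Suggest a short, personalized retention strategy."
    )


# Example: the probability is supplied by hand here; in the described
# platform it would come from the trained model behind the API.
example = parse_request(
    '{"tenure_months": 3, "monthly_charges": 89.5, "contract_type": "month-to-month"}'
)
prompt = build_retention_prompt(example, 0.81)
```

In the platform described above, the probability would presumably be produced by the trained XGBoost model inside the FastAPI service, and the tier would drive the dashboard's color coding; the sketch passes the probability in directly so that it runs without either dependency.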

Keywords

DataOps, PostgreSQL, Customer Churn, ML Pipeline, Random Forest, Feature Store, FastAPI, MLOps, Telecom Analytics, Streamlit
