ZENODO
Other literature type . 2026
License: CC BY
Data sources: Datacite

Data-Centric Reliability in Big Data Systems: An End-to-End Framework for Data Quality and Observability

Authors: Thuy, Nguyen

Abstract

Data-centric reliability has emerged as a critical concern in modern big data systems, where the quality and trustworthiness of data directly impact analytical outcomes, machine learning model performance, and business decision-making. This comprehensive framework addresses the dual challenge of establishing theoretical foundations for data quality assessment while providing practical implementation strategies for end-to-end observability pipelines in production environments.

We synthesize insights from extensive peer-reviewed literature spanning theoretical frameworks, practical implementations, and real-world case studies to present a unified approach that bridges academic research and industry practice.

The framework introduces a multi-layered architecture integrating four core data quality dimensions (accuracy, completeness, consistency, and timeliness) with observability mechanisms across ingestion, processing, storage, and consumption layers. We establish formal definitions and measurement methodologies for each quality dimension while providing framework-agnostic principles that enable portability across diverse technology stacks. The practical implementation strategies encompass technology selection criteria, design patterns (Lambda, Kappa, microservices, data mesh), and deployment approaches for production environments.

Through four detailed case studies spanning mobile network analytics, cloud-based distributed databases, industrial IoT platforms, and smart building applications, we demonstrate measurable improvements in system reliability (up to 99.7%), data quality scores (96%), and operational efficiency (65% team productivity gains). Comprehensive benchmarking establishes performance baselines and evaluation metrics including throughput, latency, quality assessment scores, and business impact measures. The framework achieves a 340% ROI across implementations with significant reductions in data incidents and operational costs.

This work contributes to both theoretical understanding and practical application of data-centric reliability, offering researchers a rigorous foundation for further investigation while providing practitioners with actionable guidance for implementing robust quality and observability solutions in production big data systems.
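
To make the four quality dimensions named above concrete, the following is a minimal sketch of how each might be scored on a small batch of records. The abstract does not reproduce the framework's formal definitions, so the ratio-based formulas here are common textbook-style assumptions, not the author's method; the function names, the sample sensor records, the reference values, and the one-hour freshness threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def completeness(records, required_fields):
    """Share of required field slots that are present and not None."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1 for r in records for f in required_fields if r.get(f) is not None
    )
    return filled / total


def accuracy(records, reference, key, field):
    """Share of records (among those with a reference entry) whose value matches it."""
    checked = [r for r in records if r.get(key) in reference]
    if not checked:
        return 1.0
    correct = sum(1 for r in checked if r.get(field) == reference[r[key]])
    return correct / len(checked)


def consistency(records, rule):
    """Share of records that satisfy a caller-supplied validity rule."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule(r)) / len(records)


def timeliness(records, ts_field, now, max_age):
    """Share of records whose timestamp is no older than max_age."""
    if not records:
        return 1.0
    fresh = sum(
        1
        for r in records
        if r.get(ts_field) is not None and now - r[ts_field] <= max_age
    )
    return fresh / len(records)


# Hypothetical sample batch (illustrative values only).
now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"id": 1, "temp_c": 21.5, "unit": "C", "ts": now - timedelta(minutes=1)},
    {"id": 2, "temp_c": None, "unit": "C", "ts": now - timedelta(minutes=2)},
    {"id": 3, "temp_c": 19.0, "unit": "F", "ts": now - timedelta(hours=2)},
    {"id": 4, "temp_c": 22.0, "unit": "C", "ts": now - timedelta(minutes=30)},
]
reference = {1: 21.5, 3: 18.0, 4: 22.0}  # hypothetical trusted values by id

scores = {
    "completeness": completeness(records, ["temp_c", "unit", "ts"]),
    "accuracy": accuracy(records, reference, "id", "temp_c"),
    "consistency": consistency(records, lambda r: r.get("unit") == "C"),
    "timeliness": timeliness(records, "ts", now, timedelta(hours=1)),
}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Each score is a fraction between 0 and 1, so the same functions could in principle be evaluated at the ingestion, processing, storage, and consumption layers and compared across them. How the framework actually weights or aggregates the dimensions is not stated in the abstract.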

Keywords

data quality, observability, big data systems, reliability framework, data pipeline monitoring, accuracy, completeness, consistency, timeliness, end-to-end architecture, data governance, machine learning operations, distributed systems
