Journal of Informati...
ZENODO
Article . 2026
License: CC BY
Data sources: ZENODO, Datacite

Kafka-Driven Scalable Streaming Pipelines for Real-Time Sensor Ingestion and High-Throughput Data Lakehouse Architecture

Authors: Yogesh Pugazhendhi Duraisamy Rajamani

Abstract

Organizations that deploy sensor-based applications in industrial automation, manufacturing, and smart infrastructure face a critical challenge: processing continuous, high-velocity data streams quickly enough to support real-time operational intelligence and automated decision making. Conventional batch-based models cannot meet the strict latency and scalability requirements of high-rate streaming sensor telemetry. The architectural framework presented here addresses these challenges by integrating Apache Kafka's distributed commit log with modern data lakehouse storage and distributed stream processing engines in a single system. The proposed architecture lets organizations build scalable streaming pipelines that run from edge sensor ingestion, through real-time transformation, to durable analytical storage, while maintaining data quality, governance, compliance, and system reliability under load. Its core components are Kafka cluster infrastructure with partitioned topics, replicated for fault tolerance; stream processing engines such as Apache Flink and Twitter Heron, which provide stateful transformations and windowed aggregations with exactly-once semantics; and lakehouse platforms that offer ACID transactions, schema evolution, and unified batch and stream analytics on cloud object storage. The framework also applies advanced design patterns for partitioning strategies, consumer group coordination, backpressure management, watermark-based event-time processing, and tiered storage optimization. Application patterns from production deployments show that the architecture can handle a variety of sensor workloads while narrowing the boundary between operational and analytical systems by eliminating multi-layered deployment designs. The centralized platform allows multiple downstream applications to consume event streams independently, enforces schema governance across evolving sensor ecosystems, and serves as the basis for advanced services such as online machine learning inference, adaptive resource management, and cross-datacenter replication for sensor networks around the world.
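
As a rough illustration of two of the patterns the abstract names, key-based partitioning and watermark-based event-time windowing, the following is a minimal standard-library Python sketch. It is not the author's implementation and uses no Kafka or Flink APIs. The names `Reading`, `partition_for`, and `TumblingWindowAverager` are hypothetical, the CRC32 hash is only a stand-in for a real partitioner (Kafka's default partitioner uses murmur2), and the watermark is assumed to be the largest event time seen so far minus a fixed allowed lateness.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One sensor measurement (hypothetical record shape)."""
    sensor_id: str
    event_time: int  # event time in seconds, assigned at the sensor
    value: float


def partition_for(sensor_id: str, num_partitions: int) -> int:
    """Map a sensor key to a partition so its readings stay in one partition.

    CRC32 is a stand-in hash for illustration only.
    """
    return zlib.crc32(sensor_id.encode("utf-8")) % num_partitions


class TumblingWindowAverager:
    """Per-sensor mean over fixed event-time windows, closed by a watermark."""

    def __init__(self, window_size: int, allowed_lateness: int):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None
        self.open_windows = defaultdict(list)  # (sensor_id, window_start) -> values
        self.dropped = 0  # readings that arrived after their window closed

    @property
    def watermark(self):
        """Assumed watermark: max event time seen minus allowed lateness."""
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def process(self, reading: Reading):
        """Ingest one reading and return results for windows that just closed.

        Each result is a (sensor_id, window_start, mean_value) tuple.
        """
        window_start = reading.event_time - reading.event_time % self.window_size
        wm = self.watermark
        if wm is not None and window_start + self.window_size <= wm:
            self.dropped += 1  # too late: the window was already emitted
            return []
        self.open_windows[(reading.sensor_id, window_start)].append(reading.value)
        if self.max_event_time is None or reading.event_time > self.max_event_time:
            self.max_event_time = reading.event_time
        return self._emit_closed()

    def _emit_closed(self):
        """Emit and discard every window whose end is at or before the watermark."""
        wm = self.watermark
        closed = sorted(
            key for key in self.open_windows if key[1] + self.window_size <= wm
        )
        results = []
        for key in closed:
            values = self.open_windows.pop(key)
            results.append((key[0], key[1], sum(values) / len(values)))
        return results
```

In this sketch an out-of-order reading is still counted as long as its window has not yet been closed by the watermark; once the window is emitted, later arrivals for it are dropped and tallied in `dropped`. A production engine would typically offer further options, such as side outputs for late data.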
