Kafka-Driven Scalable Streaming Pipelines for Real-Time Sensor Ingestion and High-Throughput Data Lakehouse Architecture

Publication: Article, 30 Jan 2026
Publisher: Science Research Society
Journal: Journal of Information Systems Engineering and Management, volume 11, pages 1,056-1,064 (eISSN: 2468-4376)

Authors: Yogesh Pugazhendhi Duraisamy Rajamani

doi: 10.52783/jisem.v11i1s.14241, 10.5281/zenodo.18483609, 10.5281/zenodo.18483608


Abstract

Organizations that deploy sensor-based applications in industrial automation, manufacturing, and smart infrastructure face a critical challenge: processing continuous, high-velocity data streams quickly enough to support real-time operational intelligence and automated decision-making. Conventional batch-based models cannot meet the strict latency and scalability requirements of high-rate streaming sensor telemetry. The presented architectural framework addresses these challenges by integrating Apache Kafka's distributed commit log with modern data lakehouse storage and distributed stream processing engines in a single system. The proposed architecture lets organizations build scalable streaming pipelines that run from edge sensor ingestion, through real-time transformation, to durable analytical storage, while maintaining data quality, governance, compliance, and system reliability under load. Its core components are: Kafka cluster infrastructure with partitioned topics and replication-based fault tolerance; stream processing engines such as Apache Flink and Twitter Heron for stateful transformations and windowed aggregations with exactly-once semantics; and lakehouse platforms that offer ACID transactions, schema evolution, and unified batch-stream analytics on cloud object storage. The framework also applies advanced design patterns for partitioning strategies, consumer group coordination, backpressure management, watermark-based event-time processing, and tiered storage optimization. Application patterns from production deployments show that the architecture can handle a variety of sensor workloads while narrowing the boundary between operational and analytical systems by eliminating multi-layered deployment designs.
The centralized platform allows multiple downstream applications to consume event streams independently, enforces schema governance across evolving sensor ecosystems, and serves as the foundation for advanced services such as online machine learning inference, adaptive resource management, and cross-datacenter replication for globally distributed sensor networks.
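
Two of the design patterns named in the abstract, key-based partitioning and watermark-based event-time windowing, can be illustrated with a small standalone sketch. This is not the article's implementation and uses neither Kafka nor Flink: the names `partition_for` and `TumblingWindowAverager`, the partition count, the window size, and the lateness bound are all hypothetical values chosen for illustration, and CRC32 stands in for a real partitioner's hash function.

```python
import zlib
from collections import defaultdict

# All constants below are hypothetical, chosen only for illustration.
NUM_PARTITIONS = 4        # number of topic partitions
WINDOW_MS = 10_000        # 10-second tumbling window
MAX_LATENESS_MS = 2_000   # assumed bound on out-of-order arrival


def partition_for(sensor_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a sensor id to a partition so one sensor's readings stay ordered.

    CRC32 is a stand-in hash for this sketch, not what Kafka itself uses.
    """
    return zlib.crc32(sensor_id.encode("utf-8")) % num_partitions


class TumblingWindowAverager:
    """Per-sensor tumbling-window average driven by event time.

    The watermark trails the largest event time seen by max_lateness_ms.
    A window is emitted once the watermark reaches its end; readings that
    arrive for an already-emitted window are dropped.
    """

    def __init__(self, window_ms=WINDOW_MS, max_lateness_ms=MAX_LATENESS_MS):
        self.window_ms = window_ms
        self.max_lateness_ms = max_lateness_ms
        self.watermark = float("-inf")
        self.open_windows = defaultdict(list)  # (sensor_id, window_start) -> values

    def on_event(self, sensor_id, event_time_ms, value):
        """Add one reading and return (sensor_id, window_start, mean) for closed windows."""
        window_start = event_time_ms - (event_time_ms % self.window_ms)
        if window_start + self.window_ms <= self.watermark:
            return []  # too late: this window was already emitted
        self.open_windows[(sensor_id, window_start)].append(value)
        # Advance the watermark; it never moves backwards.
        self.watermark = max(self.watermark, event_time_ms - self.max_lateness_ms)
        closed = sorted(
            (key for key in self.open_windows
             if key[1] + self.window_ms <= self.watermark),
            key=lambda key: (key[1], key[0]),
        )
        results = []
        for key in closed:
            values = self.open_windows.pop(key)
            results.append((key[0], key[1], sum(values) / len(values)))
        return results
```

In this sketch a window's result appears only after a later reading pushes the watermark past the window's end, which is the trade-off watermarks make: bounded waiting for out-of-order data in exchange for delayed output.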
