Transforming healthcare through cloud-native machine learning architecture: A case study in AWS, Spark, and Kubernetes Implementation

Publication: Article, 30 May 2025
Publisher: GSC Online Press
Journal: World Journal of Advanced Research and Reviews, volume 26, pages 1622-1631 (eISSN: 2581-9615)

Authors: Pasupuleti, Naveen Srikanth

doi: 10.30574/wjarr.2025.26.2.1649, 10.5281/zenodo.17310374, 10.5281/zenodo.17310375

Abstract

This article examines a case study in healthcare data infrastructure in which a data engineer overhauled operations by implementing an integrated technology stack with machine learning capabilities. Faced with diverse, high-volume patient data, the engineer architected a solution that uses AWS services, including S3, Redshift, and Lambda, to create a cloud-based data lake optimized for AI workloads. This foundation was augmented with Apache Spark for distributed processing, MLlib for scalable machine learning, Hadoop clusters for specialized workloads, and Kubernetes for container orchestration, yielding a flexible, resilient system capable of supporting sophisticated predictive models. The implementation featured automated ETL processes within a data pipeline, alongside purpose-built feature stores and model-serving infrastructure. A combination of SQL and NoSQL databases provided storage suited to a range of machine learning workloads, from natural language processing of clinical notes to computer vision for medical imaging. Despite obstacles including data inconsistency and latency issues, the solution delivered substantial improvements in operational efficiency and clinical outcomes through AI-powered predictive capabilities, demonstrating the potential of modern data engineering and machine learning approaches in healthcare settings.

Keywords

FOS: Computer and information sciences, Container Orchestration, Healthcare Analytics, ETL Automation, Data Lake Architecture, Distributed Computing

Impact by BIP!

	selected citations	0
	popularity	Average
	influence	Average
	impulse	Average
