Data Virtualization for Machine Learning

Publication: Part of book or chapter of book, Article, Preprint. Date: 01 Jan 2025. Embargo end date: 01 Jan 2025. Language: English. Publisher: Springer Nature Switzerland

Authors: Saiful Khan; Joyraj Chakraborty; Philip Beaucamp; Niraj Bhujel; Min Chen

doi: 10.1007/978-3-032-06320-5_6, 10.48550/arxiv.2507.17293

arXiv: 2507.17293

Abstract

Machine learning (ML) teams today often run multiple concurrent ML workflows for different applications. Each workflow typically involves many experiments, iterations, and collaborative activities, and commonly takes months, sometimes years, from initial data wrangling to model deployment. At the organizational level, this produces a large amount of intermediate data that must be stored, processed, and maintained. Data virtualization therefore becomes a critical technology in any infrastructure that serves ML workflows. In this paper, we present the design and implementation of a data virtualization service, focusing on its service architecture and service operations. The infrastructure currently supports six ML applications, each with more than one ML workflow. The data virtualization service allows the number of applications and workflows to grow in the coming years.

Related Organizations

Science and Technology Facilities Council, United Kingdom
University of Oxford, United Kingdom
Rutherford Appleton Laboratory, United Kingdom

Keywords

Machine Learning, Software Engineering (cs.SE), FOS: Computer and information sciences, Software Engineering, Machine Learning (cs.LG)

Impact by BIP!

- selected citations: 0 (derived from selected sources; an alternative to the "Influence" indicator)
- popularity: Average (the "current" impact or attention of the article in the research community at large, based on the underlying citation network)
- influence: Average (the overall impact of the article in the research community at large, based on the underlying citation network, diachronically)
- impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)

Related to Research communities

UArctic