PyTond: Efficient Python Data Science on the Shoulders of Databases

Name: PyTond: Efficient Python Data Science on the Shoulders of Databases
Keywords: FOS: Computer and information sciences, Computer Science - Programming Languages, Computer Science - Databases, Databases (cs.DB), Programming Languages (cs.PL)

Hesam Shahrokhi; Amirali Kaboli; Mahdi Ghorbani; Amir Shaikhha


arXiv.org e-Print Archive

Preprint . 2024

Data sources: arXiv.org e-Print Archive

https://doi.org/10.1109/icde60...

Article . 2024 . Peer-reviewed

License: STM Policy #29

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2024

License: CC BY

Data sources: Datacite

DBLP

Article

Data sources: DBLP

DBLP

Conference object

Data sources: DBLP

Publication: Article, Preprint, Conference object
Date: 13 May 2024
Embargo end date: 01 Jan 2024
Publisher: IEEE
Journal: 2024 IEEE 40th International Conference on Data Engineering (ICDE)

Authors: Hesam Shahrokhi; Amirali Kaboli; Mahdi Ghorbani; Amir Shaikhha

doi: 10.1109/icde60146.2024.00039, 10.48550/arxiv.2407.11616

arXiv: 2407.11616

Abstract

Python data science libraries such as Pandas and NumPy have recently gained immense popularity. Although these libraries are feature-rich and easy to use, their scalability limitations require more robust computational resources. In this paper, we present PyTond, an efficient approach that pushes the processing of data science workloads down into database engines, which are already known for their big-data handling capabilities. Compared to previous work, by introducing TondIR, our approach can capture a more comprehensive set of workloads and data layouts. Moreover, by performing IR-level optimizations, we generate better SQL code that improves query processing by the underlying database engine. Our evaluation results show promising performance improvements compared to Python and other alternatives for diverse data science workloads.

Extended version of ICDE 2024
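
The general idea of pushing a data science workload down into a database engine can be illustrated with a small sketch. This is not PyTond's API or its generated SQL, and it does not model TondIR. It is a hand-written example, assuming a hypothetical `sales` table with `region` and `amount` columns and using Python's built-in `sqlite3` module as a stand-in database engine. The same filter-and-aggregate workload is evaluated once in the Python process and once as a single SQL query.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy data: (region, amount) rows standing in for a data frame.
rows = [
    ("north", 10.0),
    ("south", 5.0),
    ("north", 7.5),
    ("south", 1.0),
    ("east", 3.0),
]


def total_by_region_python(data, min_amount):
    """Evaluate in the Python process: filter rows, then group and sum."""
    totals = defaultdict(float)
    for region, amount in data:
        if amount >= min_amount:
            totals[region] += amount
    return dict(totals)


def total_by_region_sql(data, min_amount):
    """Express the same workload as one SQL query run by the database engine."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", data)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales "
            "WHERE amount >= ? GROUP BY region",
            (min_amount,),
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(total_by_region_python(rows, 3.0))
    print(total_by_region_sql(rows, 3.0))
```

Both functions return the same per-region totals. In the second one, the filtering, grouping, and summation are all done by the engine, which is the kind of delegation the abstract describes.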

Related Organizations

University of Edinburgh
United Kingdom
University of Edinburgh, School of Informatics
United Kingdom

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green

Related to Research communities

UArctic