What Operations can be Performed Directly on Compressed Arrays, and with What Error?

Name: What Operations can be Performed Directly on Compressed Arrays, and with What Error?
Keywords: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Distributed, Parallel, and Cluster Computing (cs.DC), Machine Learning (cs.LG)

Tripti Agarwal; Harvey Dam; Ponnuswamy Sadayappan; Ganesh Gopalakrishnan; Dorra Ben Khalifa; Matthieu Martel


Sources:
- arXiv.org e-Print Archive: Preprint, 2024
- Crossref: Article, 2023, peer-reviewed. License: ACM Copyright Policies. https://doi.org/10.1145/362406...
- Datacite: Article, 2024. License: CC BY NC ND. https://dx.doi.org/10.48550/ar...
- DBLP: Article
- DBLP: Conference object

Publication: Article, Preprint, Conference object
Publication date: 12 Nov 2023
Embargo end date: 01 Jan 2024
Publisher: ACM
Journal: Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Funded by: NSF | FMiTF: Track-2 : Rigorous..., NSF | Collaborative Research: F..., NSF | Collaborative Research: S...

doi: 10.1145/3624062.3625122, 10.48550/arxiv.2406.11209

arXiv: 2406.11209


Abstract

In response to the rapidly escalating costs of computing with large matrices and tensors caused by data movement, several lossy compression methods have been developed to significantly reduce data volumes. Unfortunately, all these methods require the data to be decompressed before further computations are done. In this work, we develop a lossy compressor that allows a dozen fairly fundamental operations directly on compressed data while offering good compression ratios and modest errors. We implement a new compressor, PyBlaz, based on the familiar GPU-powered PyTorch framework, and evaluate it on three non-trivial applications, choosing different number systems for internal representation. Our results demonstrate that the compressed-domain operations scale well with problem size while incurring errors well within acceptable limits. To the best of our knowledge, this is the first lossy compressor that supports compressed-domain operations while achieving acceptable performance and error.

An extended but earlier version of the paper at https://dl.acm.org/doi/10.1145/3624062.3625122, published at the DRBSD Workshop in 2023.
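To make the idea of "compressed-domain operations" in the abstract concrete, here is a minimal sketch in plain Python. It is not PyBlaz and does not reproduce the authors' method or API. It assumes a toy lossy scheme (a block-wise orthonormal DCT that keeps only the first few coefficients per block), and the names `compress`, `decompress`, `add`, `dot`, `BLOCK`, and `KEEP` are hypothetical. Under that assumption, addition and dot products can be computed on the stored coefficients alone, and the only error relative to the uncompressed result comes from the dropped coefficients.

```python
import math

BLOCK = 4  # block length (illustrative assumption)
KEEP = 2   # low-frequency coefficients kept per block (illustrative assumption)


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows


D = dct_matrix(BLOCK)


def compress(x):
    """Transform each block and keep only the first KEEP coefficients (lossy)."""
    assert len(x) % BLOCK == 0, "toy scheme: length must be a multiple of BLOCK"
    out = []
    for start in range(0, len(x), BLOCK):
        block = x[start:start + BLOCK]
        out.append([sum(D[k][i] * block[i] for i in range(BLOCK))
                    for k in range(KEEP)])
    return out


def decompress(c):
    """Inverse transform, treating the dropped coefficients as zero."""
    x = []
    for coeffs in c:
        x.extend(sum(D[k][i] * coeffs[k] for k in range(KEEP))
                 for i in range(BLOCK))
    return x


def add(c1, c2):
    """Element-wise sum computed on coefficients (the transform is linear)."""
    return [[a + b for a, b in zip(b1, b2)] for b1, b2 in zip(c1, c2)]


def dot(c1, c2):
    """Dot product computed on coefficients (the kept basis is orthonormal)."""
    return sum(a * b for b1, b2 in zip(c1, c2) for a, b in zip(b1, b2))


x = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
y = [0.5, 0.5, 1.0, 1.0, 2.0, 2.0, 2.5, 2.5]
cx, cy = compress(x), compress(y)

exact = sum(a * b for a, b in zip(x, y))
print("dot on compressed data:", dot(cx, cy))
print("dot on original data:  ", exact)
print("absolute error:        ", abs(dot(cx, cy) - exact))
```

In this toy setting, `dot(cx, cy)` matches the dot product of the two decompressed arrays up to rounding, so no decompression is needed to obtain it. How a real compressor bounds the remaining error depends on its own transform, pruning, and quantization choices, which this sketch does not model.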

Related Organizations

- University of Perpignan, France
- University of Utah, United States


Related research products (1):
- apex software on GitHub (relation: IsRelatedTo)

Impact by BIP!

- Selected citations: 5. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Green

Funded by

- NSF | FMitF: Track-2: Rigorous and Scalable Formal Floating-Point Error Analysis from LLVM
- NSF | Collaborative Research: FMitF: Track-1: Correctness at Both Ends: Rigorous ML Meets Efficient Sparse Implementations
- NSF | Collaborative Research: SHF: Medium: Practical and Rigorous Correctness Checking and Correctness Preservation for Irregular Parallel Programs
