Publication: Article, Preprint, 01 Jun 2015
Embargo end date: 01 Jan 2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Journal of Selected Topics in Signal Processing, volume 9, pages 741-748 (ISSN: 1932-4553, eISSN: 1941-0484)

Authors: Krishnan, Nikhil; Baron, Dror

doi: 10.1109/jstsp.2015.2403800 , 10.48550/arxiv.1407.1514

arXiv: 1407.1514

A Universal Parallel Two-Pass MDL Context Tree Compression Algorithm


Abstract

Computing problems that handle large amounts of data necessitate the use of lossless data compression for efficient storage and transmission. We present a novel lossless universal data compression algorithm that uses parallel computational units to increase the throughput. The length-$N$ input sequence is partitioned into $B$ blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of $B$, but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) context tree source underlying the entire input, and then encode each of the $B$ blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is $O(N/B)$. Its redundancy is approximately $B\log(N/B)$ bits above Rissanen's lower bound on universal compression performance, with respect to any context tree source whose maximal depth is at most $\log(N/B)$. We improve the compression by using different quantizers for states of the context tree based on the number of symbols corresponding to those states. Numerical results from a prototype implementation suggest that our algorithm offers a better trade-off between compression and throughput than competing universal data compression algorithms.
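The two-pass structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several simplifying assumptions: a binary alphabet, a fixed context depth (`DEPTH`) in place of an MDL-selected context tree, ideal code lengths in bits in place of an actual arithmetic coder, and no charge for describing the model parameters or for quantizing them. All function names are hypothetical.

```python
import math
from collections import Counter

DEPTH = 2  # assumed fixed context depth; the paper selects an MDL context tree instead


def split_blocks(seq, num_blocks):
    """Partition seq into at most num_blocks contiguous blocks of near-equal length."""
    size = math.ceil(len(seq) / num_blocks)
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def context_counts(block, depth=DEPTH):
    """Count (context, next_symbol) pairs inside a single block."""
    counts = Counter()
    for i in range(depth, len(block)):
        counts[(tuple(block[i - depth:i]), block[i])] += 1
    return counts


def block_cost_bits(block, counts, depth=DEPTH):
    """Ideal code length (bits) of one block under probabilities derived from counts.

    Add-1/2 smoothing keeps every probability nonzero. The first `depth`
    symbols of the block have no full context and are charged 1 bit each.
    """
    bits = float(min(depth, len(block)))
    for i in range(depth, len(block)):
        ctx, sym = tuple(block[i - depth:i]), block[i]
        total = counts[(ctx, 0)] + counts[(ctx, 1)]
        p = (counts[(ctx, sym)] + 0.5) / (total + 1.0)
        bits -= math.log2(p)
    return bits


def two_pass_cost_bits(seq, num_blocks):
    """Total ideal code length when all blocks share one globally estimated model."""
    blocks = split_blocks(seq, num_blocks)
    # Pass 1: per-block statistics (independent, so parallelizable), then merged.
    global_counts = Counter()
    for block in blocks:
        global_counts.update(context_counts(block))
    # Pass 2: each block is costed on its own against the shared model,
    # so this loop could also run on separate computational units.
    return sum(block_cost_bits(block, global_counts) for block in blocks)
```

Calling `two_pass_cost_bits(seq, B)` on a list of 0/1 symbols returns the total number of bits under these assumptions. Both loops iterate over blocks independently, which is the property that lets the work be spread across $B$ units; here they run sequentially for simplicity.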

Accepted to Journal of Selected Topics in Signal Processing special issue on Signal Processing for Big Data (expected publication date June 2015). 10 pages double column, 6 figures, and 2 tables. arXiv admin note: substantial text overlap with arXiv:1405.6322. Version: Mar 2015: Corrected a typo

Related Organizations

South Carolina State University, United States
North Carolina State University, United States
North Carolina Agricultural and Technical State University, United States

Keywords

FOS: Computer and information sciences, Computer Science - Information Theory, Information Theory (cs.IT)

Impact by BIP!

Selected citations: 4. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Open access routes: Green, hybrid


Funded by

NSF | CIF: Small: Universal Signal Estimation from Noisy Measurements