QVZ: lossy compression of quality values

Article, 28 May 2015, Spain, English
Publisher: Oxford University Press (OUP)
Journal: Bioinformatics, volume 34, page 179 (ISSN: 1367-4803, eISSN: 1367-4811)

Funded by: NIH | Genomic Compression: From...

Authors: Greg Malysa; Mikel Hernaez; Idoia Ochoa; Milind Rao; Karthik Ganesan; Tsachy Weissman

doi: 10.1093/bioinformatics/btx654, 10.1093/bioinformatics/btv330

pmid: 26026138, 29177464

pmc: PMC5856090, PMC5998985

handle: 10171/113610

Abstract

Motivation: Recent advances in sequencing technology have drastically reduced the cost of sequencing a genome. This has generated an unprecedented amount of genomic data that must be stored, processed and transmitted. To facilitate this effort, we propose a new lossy compressor for the quality values present in genomic data files (e.g. FASTQ and SAM files), which account for roughly half of the storage space (in the uncompressed domain). Lossy compression allows data to be compressed beyond its lossless limit.

Results: The proposed algorithm, QVZ, exhibits better rate-distortion performance than previously proposed algorithms, both for several distortion metrics and for the lossless case. Moreover, it allows the user to define any quasi-convex distortion function to be minimized, a feature not supported by previous algorithms. Finally, we show that QVZ-compressed data yield better genotyping performance than data compressed with previously proposed algorithms, in the sense that, for a similar rate, the genotyping obtained is closer to that achieved with the original quality values.

Availability and implementation: QVZ is written in C and can be downloaded from https://github.com/mikelhernaez/qvz.

Contact: mhernaez@stanford.edu or gmalysa@stanford.edu or iochoa@stanford.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Country

Spain

Related Organizations

Stanford University (United States)
University of Navarra (Spain)

Keywords

Genotype, Genotyping Techniques, Quality values, QVZ, Data Compression, Polymorphism, Single Nucleotide, Databases, Genetic, Lossy compression, Animals, Humans, Algorithms

Impact by BIP!

- Selected citations (derived from selected sources): 52
- Popularity (current attention, based on the citation network): Top 10%
- Influence (overall impact over time, based on the citation network): Top 10%
- Impulse (initial momentum directly after publication): Top 10%