publication . Preprint . 2019

Computing Extremely Accurate Quantiles Using t-Digests

Dunning, Ted; Ertl, Otmar;
Open Access English
  • Published: 11 Feb 2019
Abstract
We present on-line algorithms for computing approximations of rank-based statistics that give high accuracy, particularly near the tails of a distribution, with very small sketches. Notably, the method allows a quantile $q$ to be computed with an accuracy relative to $\max(q, 1-q)$ rather than absolute accuracy as with most other methods. This new algorithm is robust with respect to skewed distributions or ordered datasets and allows separately computed summaries to be combined with no loss in accuracy. An open-source Java implementation of this algorithm is available from the author. Independent implementations in Go and Python are also available.
Subjects
free text keywords: Statistics - Computation, Computer Science - Data Structures and Algorithms
Download from

Chen, Fei, Lambert, Diane, & Pinheiro, Jose C. 2000. Incremental Quantile Estimation for Massive Tracking. Pages 516{522 of: In Proceedings of KDD.

DFU. 2019. The Apache Datafu Project. https://datafu.apache.org/. [Online; accessed 23-January-2018].

Dunning, Ted. 2018. The t-digest Library. https://github.com/tdunning/t-digest/. [Online; accessed 23-January-2018].

Greenwald, Michael, & Khanna, Sanjeev. 2001. Space-e cient Online Computation of Quantile Summaries. Pages 58{66 of: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. SIGMOD '01. New York, NY, USA: ACM.

Lib, Stream. 2019. Stream summarizer and cardinality estimator. https://github.com/ addthis/stream-lib. [Online; accessed 11-February-2019].

Munro, J.I., & Paterson, M.S. 1980. Selection and sorting with limited storage. Theoretical Computer Science, 12(3), 315 { 323. [OpenAIRE]

Pike, Rob, Dorward, Sean, Griesemer, Robert, & Quinlan, Sean. 2005. Interpreting the Data: Parallel Analysis with Sawzall. Scienti c Programming Journal, 13, 277{298. [OpenAIRE]

Shrivastava, Nisheeth, Buragohain, Chiranjeeb, Agrawal, Divyakant, & Suri, Subhash. 2004 (09). Medians and Beyond: New Aggregation Techniques for Sensor Networks. In: SenSys'04 - Proceedings of the Second International Conference on Embedded Networked Sensor Systems. [OpenAIRE]

Abstract
We present on-line algorithms for computing approximations of rank-based statistics that give high accuracy, particularly near the tails of a distribution, with very small sketches. Notably, the method allows a quantile $q$ to be computed with an accuracy relative to $\max(q, 1-q)$ rather than absolute accuracy as with most other methods. This new algorithm is robust with respect to skewed distributions or ordered datasets and allows separately computed summaries to be combined with no loss in accuracy. An open-source Java implementation of this algorithm is available from the author. Independent implementations in Go and Python are also available.
Subjects
free text keywords: Statistics - Computation, Computer Science - Data Structures and Algorithms
Download from

Chen, Fei, Lambert, Diane, & Pinheiro, Jose C. 2000. Incremental Quantile Estimation for Massive Tracking. Pages 516{522 of: In Proceedings of KDD.

DFU. 2019. The Apache Datafu Project. https://datafu.apache.org/. [Online; accessed 23-January-2018].

Dunning, Ted. 2018. The t-digest Library. https://github.com/tdunning/t-digest/. [Online; accessed 23-January-2018].

Greenwald, Michael, & Khanna, Sanjeev. 2001. Space-e cient Online Computation of Quantile Summaries. Pages 58{66 of: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. SIGMOD '01. New York, NY, USA: ACM.

Lib, Stream. 2019. Stream summarizer and cardinality estimator. https://github.com/ addthis/stream-lib. [Online; accessed 11-February-2019].

Munro, J.I., & Paterson, M.S. 1980. Selection and sorting with limited storage. Theoretical Computer Science, 12(3), 315 { 323. [OpenAIRE]

Pike, Rob, Dorward, Sean, Griesemer, Robert, & Quinlan, Sean. 2005. Interpreting the Data: Parallel Analysis with Sawzall. Scienti c Programming Journal, 13, 277{298. [OpenAIRE]

Shrivastava, Nisheeth, Buragohain, Chiranjeeb, Agrawal, Divyakant, & Suri, Subhash. 2004 (09). Medians and Beyond: New Aggregation Techniques for Sensor Networks. In: SenSys'04 - Proceedings of the Second International Conference on Embedded Networked Sensor Systems. [OpenAIRE]

Powered by OpenAIRE Research Graph
Any information missing or wrong?Report an Issue