zbMATH Open
Article
Data sources: zbMATH Open
https://doi.org/10.1109/ccc.20...
Article . 2003 . Peer-reviewed
Data sources: Crossref
SIAM Journal on Computing
Article . 2005 . Peer-reviewed
Data sources: Crossref
DBLP
Article . 2005
Data sources: DBLP
DBLP
Conference object
Data sources: DBLP

The complexity of approximating the entropy

Authors: Tugkan Batu; Sanjoy Dasgupta; Ravi Kumar; Ronitt Rubinfeld

Abstract

We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, linear time in the size of the domain is both necessary and sufficient for approximating the entropy. In the generation oracle model, the algorithm has access only to independent samples from the distribution. In this case, we show that a \(\gamma\)-multiplicative approximation to the entropy can be obtained in \(O(n^{(1+\eta)/\gamma^2} \log n)\) time for distributions with entropy \(\Omega(\gamma/\eta)\), where \(n\) is the size of the domain of the distribution and \(\eta\) is an arbitrarily small positive constant. We show that this model does not permit a multiplicative approximation to the entropy in general. For the class of distributions to which our upper bound applies, we obtain a lower bound of \(\Omega(n^{1/(2\gamma^2)})\). We next consider a combined oracle model in which the algorithm has access to both the generation and the evaluation oracles of the distribution. In this model, significantly greater efficiency can be achieved: we present an algorithm for \(\gamma\)-multiplicative approximation to the entropy that runs in \(O((\gamma^2 \log^2{n})/(h^2 (\gamma-1)^2))\) time for distributions with entropy \(\Omega(h)\); for such distributions, we also show a lower bound of \(\Omega((\log n)/(h(\gamma^2-1)+\gamma^2))\). Finally, we consider two special families of distributions: those in which the probabilities of the elements decrease monotonically with respect to a known ordering of the domain, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
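
To illustrate the combined oracle model, the following is a minimal Python sketch of one natural estimator that uses both kinds of access. It is an illustration under stated assumptions, not the algorithm from the paper, whose details the abstract does not give. It relies on the identity \(H(p) = \mathbb{E}_{x \sim p}[\log_2(1/p(x))]\): draw samples with the generation oracle, look up each sample's probability with the evaluation oracle, and average. The names `estimate_entropy` and `exact_entropy` and the choice of `num_samples` are hypothetical; how many samples suffice for a \(\gamma\)-multiplicative guarantee is what the bounds above address, and this sketch does not attempt to choose that number.

```python
import math
import random


def exact_entropy(probs):
    """Shannon entropy in bits, computed from the full probability array.

    This reads every entry, so it takes time linear in the domain size.
    """
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


def estimate_entropy(probs, num_samples, rng):
    """Sample-average estimate of the entropy using both oracles.

    Assumption: `probs` stands in for both oracles. Sampling from it plays
    the role of the generation oracle, and indexing into it plays the role
    of the evaluation oracle.
    """
    domain = range(len(probs))
    # Generation oracle: independent samples from the distribution.
    samples = rng.choices(domain, weights=probs, k=num_samples)
    # Evaluation oracle: look up p(x) for each sample and average log2(1/p(x)).
    return sum(math.log2(1.0 / probs[x]) for x in samples) / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    skewed = [0.5, 0.25, 0.125, 0.125]
    print("exact:   ", exact_entropy(skewed))
    print("estimate:", estimate_entropy(skewed, 20000, rng))
```

The estimator only touches the probabilities of the sampled elements, so its cost depends on the number of samples rather than on the domain size. Its accuracy relative to the true entropy degrades when the entropy is small, which is consistent with the abstract's running times depending on the entropy lower bound \(h\).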

Keywords

sublinear algorithms, Measures of information, entropy, entropy estimation, properties of distributions, property testing, Nonparametric inference, Sampling theory in information and communication theory, Approximation algorithms
