
We discuss algorithms for estimating the Shannon entropy h of finite symbol sequences with long-range correlations. In particular, we consider algorithms that estimate h from the code lengths produced by a compression algorithm. Our interest is in describing how these estimates converge with sequence length, assuming no limits on the space and time complexity of the compression algorithms. We propose a scaling law for extrapolation from finite sample lengths and apply it to sequences from dynamical systems in non-trivial chaotic regimes, to a 1-D cellular automaton, and to written English texts.
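As a rough illustration of estimating h from compression code lengths, the sketch below divides the compressed size in bits by the number of symbols. It uses Python's standard `zlib` (DEFLATE) purely as a stand-in compressor, which is an assumption of this example and not the algorithm studied in the paper; the function names `compression_entropy_estimate` and `estimates_by_length` are likewise hypothetical. Because a practical compressor has bounded memory and fixed overhead, the values it gives are only crude upper-bound-style estimates.

```python
import random
import zlib


def compression_entropy_estimate(symbols: bytes) -> float:
    """Estimate entropy in bits per symbol from the zlib code length.

    Assumption: each symbol is stored as one byte, and zlib stands in
    for the (unspecified) compression algorithm.
    """
    if not symbols:
        raise ValueError("need at least one symbol")
    compressed = zlib.compress(symbols, 9)  # level 9: strongest setting
    return 8 * len(compressed) / len(symbols)


def estimates_by_length(symbols: bytes, lengths):
    """Return (N, estimate) pairs for prefixes of length N.

    Lengths longer than the sequence are skipped. Plotting these pairs
    shows how the estimate changes with sequence length.
    """
    return [
        (n, compression_entropy_estimate(symbols[:n]))
        for n in lengths
        if 0 < n <= len(symbols)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Pseudo-random binary sequence: the true entropy is 1 bit per symbol.
    fair = bytes(rng.choice(b"01") for _ in range(20000))
    for n, h_n in estimates_by_length(fair, [500, 2000, 8000, 20000]):
        print(n, round(h_n, 3))
```

Running the script prints the estimate for growing prefixes of a pseudo-random binary sequence, giving a simple view of how a finite-length estimate depends on N before any extrapolation is attempted.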
FOS: Computer and information sciences, Measures of information, entropy, Computer Science - Computation and Language, Statistical Mechanics (cond-mat.stat-mech), Computer Science - Information Theory, Information Theory (cs.IT), FOS: Physical sciences, Machine Learning (stat.ML), Protein sequences, DNA sequences, Applications of dynamical systems, Statistics - Machine Learning, Physics - Data Analysis, Statistics and Probability, Dynamical aspects of cellular automata, Computation and Language (cs.CL), Condensed Matter - Statistical Mechanics, Data Analysis, Statistics and Probability (physics.data-an)
