
arXiv: 1707.04285
Abstract: A set of data with positive values follows a Pareto distribution if the log–log plot of value versus rank is approximately a straight line. A Pareto distribution satisfies Zipf’s law if the log–log plot has a slope of $-1$. Since many types of ranked data follow Zipf’s law, it is considered a form of universality. We propose a mathematical explanation for this phenomenon based on Atlas models and first-order models, systems of strictly positive continuous semimartingales with parameters that depend only on rank. We show that the stationary distribution of an Atlas model will follow Zipf’s law if and only if two natural conditions, conservation and completeness, are satisfied. Since Atlas models and first-order models can be constructed to approximate systems of time-dependent rank-based data, our results can explain the universality of Zipf’s law for such systems. However, ranked data generated by other means may follow non-Zipfian Pareto distributions. Hence, our results explain why Zipf’s law holds for word frequency, firm size, household wealth, and city size, while it does not hold for earthquake magnitude, cumulative book sales, and the intensity of wars, all of which follow non-Zipfian Pareto distributions.
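The log–log criterion in the abstract can be illustrated with a short numerical check. The sketch below is not taken from the paper: the function name `loglog_slope` and the two synthetic data sets are hypothetical, and an ordinary least-squares fit over all ranks is assumed as the slope estimator. It shows a slope near $-1$ for Zipfian data and a different slope for a non-Zipfian Pareto set.

```python
import numpy as np

def loglog_slope(values):
    """Least-squares slope of log(value) against log(rank), with rank 1 = largest value.

    Illustrative helper only (hypothetical name, not from the paper).
    """
    v = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    if v.size < 2 or np.any(v <= 0):
        raise ValueError("need at least two strictly positive values")
    ranks = np.arange(1, v.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(v), 1)
    return slope

# Synthetic ranked data (assumed for illustration, not real measurements).
ranks = np.arange(1, 1001)
zipf_like = 1000.0 / ranks            # value proportional to rank**-1
pareto_like = 1000.0 / ranks ** 0.5   # value proportional to rank**-0.5

print(loglog_slope(zipf_like))    # close to -1: consistent with Zipf's law
print(loglog_slope(pareto_like))  # close to -0.5: Pareto but not Zipfian
```

With real data the fit is usually restricted to a range of ranks where the plot is approximately linear; that choice is left out of this sketch.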
Physics - Physics and Society, General Economics (econ.GN), FOS: Physical sciences, Applications of stochastic analysis (to PDEs, etc.), Pareto distribution, Physics and Society (physics.soc-ph), Mathematical geography and demography, Atlas model, FOS: Economics and business, Zipf's law, first-order model, Statistical methods; risk measures, Economics - General Economics
| selected citations | 4 |
| popularity | Top 10% |
| influence | Average |
| impulse | Average |
