
We analyze the occurrence frequencies of over 15 million words recorded in millions of books published during the past two centuries in seven different languages. For all languages and chronological subsets of the data we confirm that two scaling regimes characterize the word frequency distributions, with only the more common words obeying the classic Zipf law. Using corpora of unprecedented size, we test the allometric scaling relation between corpus size and vocabulary size in growing languages and demonstrate a decreasing marginal need for new words, a feature that is likely related to the underlying correlations between words. We calculate the annual growth fluctuations of word use, which decrease as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This "cooling pattern" forms the basis of a third statistical regularity, which, unlike Zipf's law and Heaps' law, is dynamical in nature.
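The two quantities the abstract relies on can be illustrated with a short sketch. It assumes Heaps' law in the form V ≈ K·N^b, where N is the corpus size (total word count) and V the vocabulary size (distinct words), so that b < 1 signals a decreasing marginal need for new words. It also assumes annual growth is measured as the log ratio r(t) = ln(f(t+1)/f(t)) of a word's yearly counts, whose standard deviation serves as the "fluctuation". The function names are hypothetical, and an ordinary least-squares fit on log-log axes is used for simplicity. This is not the authors' exact estimation procedure.

```python
import math


def heaps_exponent(corpus_sizes, vocab_sizes):
    """Estimate b in V ~ K * N**b by least squares on log V versus log N."""
    xs = [math.log(n) for n in corpus_sizes]
    ys = [math.log(v) for v in vocab_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope of the regression line through (log N, log V).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def growth_rate_std(yearly_counts):
    """Population standard deviation of log growth rates ln(f(t+1) / f(t))."""
    rates = [math.log(b / a) for a, b in zip(yearly_counts, yearly_counts[1:])]
    mean_r = sum(rates) / len(rates)
    return math.sqrt(sum((r - mean_r) ** 2 for r in rates) / len(rates))


if __name__ == "__main__":
    # Synthetic corpus obeying V = 3 * N**0.5 exactly (illustrative data only).
    sizes = [10 ** k for k in range(1, 6)]
    vocab = [3 * n ** 0.5 for n in sizes]
    print(round(heaps_exponent(sizes, vocab), 6))  # slope of the synthetic data
    print(growth_rate_std([100, 200, 100]))        # rates +ln2, -ln2
```

In this framing, a "cooling" language would show `growth_rate_std` values that shrink for words measured in later, larger corpora. The data above are synthetic and only exercise the arithmetic.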
9 two-column pages, 7 figures; accepted for publication in Scientific Reports
FOS: Computer and information sciences, Physics - Physics and Society, Urban-growth patterns, Population, FOS: Physical sciences, physics of social systems, Physics and Society (physics.soc-ph), Insights, Statistics - Applications, Multidisciplinary sciences, Article, Centuries, Size, Applications (stat.AP), Cities, Innovation, info:eu-repo/classification/udc/536.91, Condensed Matter - Statistical Mechanics, Computer Science - Computation and Language, Statistical Mechanics (cond-mat.stat-mech), Q Science (General), culturomics, 400, culture, Physical sciences, society, Zipf's law, Biochemistry and cell biology, Long-range correlations, Distributions, quantitative linguistics, Science & technology, Computation and Language (cs.CL), Z665 Library Science. Information Science
| Indicator | Description | Value |
|---|---|---|
| Selected citations | Citations derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 171 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 1% |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 1% |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 1% |
