
Programming languages have evolved rapidly over the past five decades, reflecting broader shifts in software development practices and technological advances. Early on, entities like the U.S. Department of Defense recognized the challenges posed by the diversity of programming languages, leading to initiatives such as the Ada programming language. Since then, indexes like TIOBE, RedMonk, and Open Hub have attempted to track language popularity, though their metrics provide only a snapshot view, and most of them do not make their data available. We show that Software Heritage, the largest public archive of source code, now makes it possible, and easy, to track programming language trends in a comprehensive, transparent, and reproducible manner through its unified dataset, which includes over 20 billion source files and 4 billion commits. As a result of our study, we have created a dataset and pipeline for analyzing five decades of programming language trends by measuring programming activity as seen in the Software Heritage archive, confirming trends in language adoption, shifts in popularity, and significant transitions linked to technological changes. The comparison with the existing indexes shows rather good alignment for the top positions in the rankings, but differences emerge further down, as programmer activity and language popularity are not necessarily aligned. To facilitate further research on programming language evolution, we publish the whole software pipeline as open source and make the full dataset available; it will be updated biannually.
[INFO] Computer Science [cs]
