
AbstractThe number of publications in biomedicine and life sciences has rapidly grown over the last decades, with over 1.5 million papers now being published every year. This makes it difficult to keep track of new scientific works and to have an overview of the evolution of the field as a whole. Here we present a 2D map of the entire corpus of biomedical literature, and argue that it provides a unique and useful overview of the life sciences research. We based our atlas on the abstract texts of 21 million English articles from the PubMed database. To embed the abstracts into 2D, we used the large language model PubMedBERT, combined witht-SNE tailored to handle samples of our size. We used our atlas to study the emergence of the Covid-19 literature, the evolution of the neuroscience discipline, the uptake of machine learning, the distribution of gender imbalance in academic authorship, and the distribution of retracted paper mill articles. Furthermore, we present an interactive web version of our atlas that allows easy exploration and will enable further insights and facilitate future research.
Article
Article
| citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 34 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 1% |
