This data set contains an algorithmic classification of research publications based on data from OpenAlex. The classification is based on the OpenAlex snapshot released on November 21, 2023. To build the classification, we used the so-called extended direct citation approach in combination with the Leiden algorithm. The source code of the software used to build the classification is available here.

The classification covers the 71 million journal articles, proceedings papers, preprints, and book chapters in OpenAlex that were published between 2000 and 2023 and that are connected to each other by citation links. Based on 1715 million citation links, we built a three-level hierarchical classification. Each publication was assigned to a cluster at each of the three levels. Clusters consist of publications that are relatively strongly connected by citation links and that can therefore be expected to be topically related. At each level, a publication was assigned to exactly one cluster, so clusters do not overlap. The classification consists of 4521 micro clusters at the lowest (most granular) level, 917 meso clusters at the middle level, and 20 macro clusters at the highest (least granular) level.

We also algorithmically linked each cluster in the classification to one or more of the following five broad main fields: biomedical and health sciences, life and earth sciences, mathematics and computer science, physical sciences and engineering, and social sciences and humanities.

We used the Updated GPT 3.5 Turbo large language model, developed by OpenAI, to label the 4521 micro clusters at the lowest level of the classification. The source code of the labeling software can be found here. See this blog post for more information about the classification.

The classification, including the labels of the micro clusters, is available in the following tab-delimited files. The columns of each file are listed after the file name.
clustering.tsv: work_id, doi, macro_cluster_id, meso_cluster_id, micro_cluster_id
main_field.tsv: main_field_id, main_field
macro_cluster.tsv: macro_cluster_id, macro_cluster, n_works
macro_cluster_main_field.tsv: macro_cluster_id, main_field_seq, main_field_id, weight, is_primary_main_field
meso_cluster.tsv: meso_cluster_id, meso_cluster, parent_macro_cluster_id, n_works
meso_cluster_main_field.tsv: meso_cluster_id, main_field_seq, main_field_id, weight, is_primary_main_field
meso_cluster_source.tsv: meso_cluster_id, source_seq, source_id, n_works
micro_cluster.tsv: micro_cluster_id, micro_cluster, short_label, long_label, keywords, summary, wikipedia_url, parent_macro_cluster_id, parent_meso_cluster_id, n_works
micro_cluster_main_field.tsv: micro_cluster_id, main_field_seq, main_field_id, weight, is_primary_main_field
micro_cluster_keyword.tsv: micro_cluster_id, keyword_seq, keyword
micro_cluster_source.tsv: micro_cluster_id, source_seq, source_id, n_works
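
The files share identifier columns, so the label of a publication's micro cluster can be looked up by joining clustering.tsv with micro_cluster.tsv on micro_cluster_id. The following is a minimal sketch of that join using only the Python standard library. The sample rows, identifier values, and labels are invented for illustration, the micro cluster sample carries only a subset of the listed columns, and the helper names `read_tsv` and `label_works` are hypothetical. The sketch also assumes that each file starts with a header row holding the column names listed above.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-delimited file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def label_works(clustering_rows, micro_cluster_rows):
    """Attach the micro cluster short_label to each work via micro_cluster_id."""
    labels = {
        row["micro_cluster_id"]: row["short_label"]
        for row in micro_cluster_rows
    }
    return [
        {
            "work_id": row["work_id"],
            "micro_cluster_id": row["micro_cluster_id"],
            # None when the cluster id has no entry in the label table
            "short_label": labels.get(row["micro_cluster_id"]),
        }
        for row in clustering_rows
    ]


# Invented sample data standing in for clustering.tsv (assumption).
CLUSTERING_SAMPLE = (
    "work_id\tdoi\tmacro_cluster_id\tmeso_cluster_id\tmicro_cluster_id\n"
    "W1\t10.0000/example.1\t1\t5\t10\n"
    "W2\t10.0000/example.2\t1\t5\t11\n"
    "W3\t10.0000/example.3\t2\t7\t99\n"
)

# Invented sample data standing in for micro_cluster.tsv (subset of columns).
MICRO_CLUSTER_SAMPLE = (
    "micro_cluster_id\tshort_label\tn_works\n"
    "10\tExample label A\t2500\n"
    "11\tExample label B\t1800\n"
)

labelled = label_works(
    read_tsv(io.StringIO(CLUSTERING_SAMPLE)),
    read_tsv(io.StringIO(MICRO_CLUSTER_SAMPLE)),
)

for row in labelled:
    print(row["work_id"], row["micro_cluster_id"], row["short_label"])
```

To apply the sketch to the actual files, the `io.StringIO` objects can be replaced with file handles such as `open("clustering.tsv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption. Because clustering.tsv covers about 71 million publications, iterating over the `csv.DictReader` row by row is likely preferable to loading the whole file into a list.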
publication, classification, bibliometrics, cluster, scientometrics, OpenAlex
citations: 0. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.