
This benchmark dataset supports semantic shift detection in disability-related corpora. It includes title and abstract text collected from PubMed and ArXiv, annotation sets produced by domain experts and by LLMs, and extracted knowledge graphs (KGs) in the form of Wikidata entity claims. The PubMed corpus covers the period from the 1900s to 2023, and the ArXiv corpus covers the period from the 1990s to 2023. The corpus was filtered using 16 disability-related target words. In the annotation sets, '1' indicates that a semantic shift occurred for a target word, and '0' indicates that no shift occurred. The LLM-based annotation sets, produced with the Llama2 and GPT-4 models, also include the text the models generated. '7b' refers to the parameter size of the Llama2 model (7 billion parameters). Graph_data.zip contains the Wikidata entity claims.
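Because the annotation sets use binary labels per target word, one simple way to compare an expert set against an LLM set is the fraction of shared target words on which the two agree. The sketch below is illustrative only: the file layout of the annotation sets is not described here, so the labels are held in plain dictionaries, and the function name `agreement` and the placeholder words and label values are hypothetical rather than taken from the dataset.

```python
def agreement(expert, llm):
    """Return the fraction of shared target words with identical labels.

    Both arguments map a target word to 1 (semantic shift occurred)
    or 0 (no semantic shift).
    """
    shared = sorted(set(expert) & set(llm))
    if not shared:
        raise ValueError("the two annotation sets share no target words")
    matches = sum(expert[word] == llm[word] for word in shared)
    return matches / len(shared)


# Hypothetical labels for illustration; not values from the dataset.
expert_labels = {"word_a": 1, "word_b": 0, "word_c": 1, "word_d": 0}
llm_labels = {"word_a": 1, "word_b": 1, "word_c": 1, "word_d": 0}

print(agreement(expert_labels, llm_labels))
```

Simple agreement does not correct for chance; a chance-corrected statistic may be preferable when the label distribution is skewed.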
Semantic shift detection, Historical corpus, Historical semantic drift detection, NLP
