
The OAMEDXMLC dataset comprises 869,402 scientific documents (publications related to surgery). It includes labeled, annotated data such as surgery categories, the domains related to each document, authors, year of publication, and references to other documents. With these annotations, the dataset can be used to train models for tasks such as:

- Document tagging or classification among a large number of categories (extreme multi-label classification, or XMLC)
- Author prediction
- Year-of-publication prediction
- Reference/link prediction

Note that this dataset is an extension of the OAXMLC dataset: https://zenodo.org/records/15309916

Importantly, this dataset is equipped with two independent taxonomies and sets of labels, which opens up multiple possibilities, including:

- Principled investigation of the influence of taxonomies on XMLC algorithms
- Transfer learning in XMLC (from one taxonomy to the other)

Each taxonomy is provided both in Turtle/SKOS format and in JSON/TXT format for easier XMLC usage.

The dataset was built with data from the [OpenAlex](https://openalex.org/) open catalog. More details can be found in the README.md file as well as in the original dataset: https://zenodo.org/records/15309916
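Because every label belongs to a taxonomy, a common preprocessing step in XMLC is to propagate each document's labels up to their ancestors. The sketch below illustrates this under a loudly stated assumption: it supposes a hypothetical JSON layout in which each label maps to the list of its parent labels. The actual JSON/TXT files may be organised differently (see the README.md), and the category names used here are invented for illustration only.

```python
import json

# HYPOTHETICAL taxonomy layout: each label maps to the list of its parent labels.
# The real JSON/TXT files shipped with the dataset may use a different structure;
# the category names below are illustrative, not taken from the dataset.
TAXONOMY_JSON = """
{
  "Surgery": [],
  "Cardiac surgery": ["Surgery"],
  "Valve repair": ["Cardiac surgery"],
  "Orthopedic surgery": ["Surgery"]
}
"""


def load_parents(text: str) -> dict[str, list[str]]:
    """Parse the assumed label -> parents mapping from a JSON string."""
    return json.loads(text)


def ancestors(label: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every ancestor of `label` by walking parent links iteratively."""
    seen: set[str] = set()
    stack = list(parents.get(label, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def expand_labels(labels: list[str], parents: dict[str, list[str]]) -> set[str]:
    """Add all taxonomy ancestors to a document's label set."""
    expanded = set(labels)
    for label in labels:
        expanded |= ancestors(label, parents)
    return expanded


if __name__ == "__main__":
    parents = load_parents(TAXONOMY_JSON)
    print(sorted(expand_labels(["Valve repair"], parents)))
    # prints ['Cardiac surgery', 'Surgery', 'Valve repair']
```

Since the dataset ships two independent taxonomies, the same documents can be expanded against each one separately, which is one way to compare how taxonomy structure affects an XMLC model.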
Keywords: ontology, extreme multi-label classification
