This dataset supports training and evaluating methods for distinguishing focus entities from background entities in scientific literature. A focus entity is an entity that is actively researched in a publication, while a background entity is one that is discussed in the publication but is not its main subject. The entities of interest in this dataset are microbial pathogens.

The dataset was generated automatically, using the MeSH indexing of MEDLINE as the reference. Entities were annotated with a dictionary approach, and the MeSH indexing of the MEDLINE citation linked to each publication was then used to label the entity as a focus or a background entity.

There are two main types of datasets: one generated from MEDLINE citations (files medline.*) and one generated from full-text articles in PubMed Central (PMC) (files pmc.*). The MEDLINE citation dataset contains data from over 1M citations, while the PMC dataset contains data from over 100k publications, which are a subset of the MEDLINE dataset. Each dataset is split into the training and test sets that we used in our research. The full-text datasets are further split into two variants: one with explicit separation between sections, and one in which the entire article appears as a single text string, with the section name at the beginning of each section.

All fields within the files are separated by the pipe character "|". In each row, the pathogen of interest has been replaced by the text @PATHOGEN$, and the same row may contain several mentions of the pathogen.
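As an illustration of the file format described above, the following minimal Python sketch splits a row on the pipe character and counts the @PATHOGEN$ mentions it contains. The column order and the sample row are assumptions made for the example only; the description does not specify the columns, so check the actual files before relying on field positions.

```python
PLACEHOLDER = "@PATHOGEN$"  # marker that replaces the pathogen of interest


def parse_row(line, delimiter="|"):
    """Split one row into its fields on the pipe character."""
    return line.rstrip("\n").split(delimiter)


def count_mentions(fields):
    """Count how many times the pathogen placeholder occurs in a row."""
    return sum(field.count(PLACEHOLDER) for field in fields)


# Hypothetical row: the identifier, label, and text columns are assumed,
# not taken from the dataset documentation.
sample = "12345|focus|@PATHOGEN$ infection in mice. We study @PATHOGEN$ virulence."

fields = parse_row(sample)
print(len(fields), "fields,", count_mentions(fields), "mentions")
```

A plain split is only safe if the text fields never contain a literal "|"; if they do, the number of fields per row will vary and a stricter parser is needed.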
pathogen characterisation, focus entities identification, biomedical literature