LivingNER corpus: recognition and normalization of species

LivingNER corpus - training and validation sets

The LivingNER corpus is a collection of 2000 clinical cases from over 10 different medical areas, annotated with SPECIES mentions that are mapped to the NCBI Taxonomy. It is used for the LivingNER Shared Task on the detection and normalization of species mentions in Spanish medical documents, which will be held as part of IberLEF 2022.

The training set is composed of 1000 clinical cases extracted from miscellaneous medical specialties, including COVID, oncology, infectious diseases, tropical medicine, urology, pediatrics and others.

The files are distributed as follows:
- For subtask 1 (LivingNER-Species NER track), annotations are distributed in a tab-separated values (TSV) file with the following columns:
  - filename: document name
  - mark: mention identifier
  - label: mention type (SPECIES or HUMAN)
  - off0: starting position of the mention in the document
  - off1: ending position of the mention in the document
  - span: textual span
- For subtask 2 (LivingNER-Species Norm track), annotations are distributed in a TSV file with the same columns as subtask 1, plus:
  - isH: whether the span is narrower than the assigned NCBITax code
  - isN: whether the mention corresponds to a nosocomial infection
  - iscomplex: whether the span is assigned a combination of NCBITax codes
  - NCBITax: code of the mention in the NCBI Taxonomy
- For subtask 3 (LivingNER-Clinical IMPACT track), annotations are distributed in a TSV file. In this version of the dataset, the data for this subtask is still pending.

All text files are distributed as plain UTF-8 text files.

Resources: Web, Annotation guidelines, Evaluation library, LivingNER terminology.

For further information, please visit https://temu.bsc.es/livingner/ or email us at encargo-pln-life@bsc.es.
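The TSV layout described for subtasks 1 and 2 can be read with Python's standard `csv` module. The sketch below is illustrative, not the official LivingNER tooling. It rests on several assumptions that are not stated in the dataset description: each file has a header row with the listed column names, off0/off1 are 0-based character offsets with off1 exclusive, the isH/isN/iscomplex flags are written as "True"/"False", and combined NCBITax codes are joined with "|". The helper names (`read_mentions`, `check_offsets`) and the sample document are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mention:
    """One annotated mention; field names follow the TSV columns."""
    filename: str
    mark: str
    label: str                         # SPECIES or HUMAN
    off0: int                          # assumed 0-based start offset
    off1: int                          # assumed exclusive end offset
    span: str
    is_h: Optional[bool] = None        # subtask 2 only
    is_n: Optional[bool] = None        # subtask 2 only
    is_complex: Optional[bool] = None  # subtask 2 only
    ncbi_codes: Optional[List[str]] = None


def _to_bool(value: str) -> bool:
    # Assumption: flags are encoded as "True"/"False" (case-insensitive).
    return value.strip().lower() == "true"


def read_mentions(tsv_text: str) -> List[Mention]:
    """Parse a subtask 1 or subtask 2 TSV file (hypothetical helper)."""
    # QUOTE_NONE so that quote characters inside spans are kept verbatim.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    mentions = []
    for row in reader:
        m = Mention(filename=row["filename"], mark=row["mark"],
                    label=row["label"], off0=int(row["off0"]),
                    off1=int(row["off1"]), span=row["span"])
        # Subtask 2 files carry four extra columns.
        if "NCBITax" in row:
            m.is_h = _to_bool(row["isH"])
            m.is_n = _to_bool(row["isN"])
            m.is_complex = _to_bool(row["iscomplex"])
            # Assumption: combined codes are separated by "|".
            m.ncbi_codes = row["NCBITax"].split("|")
        mentions.append(m)
    return mentions


def check_offsets(mentions: List[Mention],
                  texts: Dict[str, str]) -> List[Mention]:
    """Return the mentions whose span differs from text[off0:off1]."""
    return [m for m in mentions
            if texts[m.filename][m.off0:m.off1] != m.span]


# Made-up example in the subtask 2 layout (not taken from the corpus).
sample_tsv = (
    "filename\tmark\tlabel\toff0\toff1\tspan\tisH\tisN\tiscomplex\tNCBITax\n"
    "caso_demo\tT1\tHUMAN\t0\t8\tPaciente\tFalse\tFalse\tFalse\t9606\n"
    "caso_demo\tT2\tSPECIES\t27\t48\tStaphylococcus aureus"
    "\tFalse\tFalse\tFalse\t1280\n"
)
texts = {"caso_demo": "Paciente con infección por Staphylococcus aureus."}

mentions = read_mentions(sample_tsv)
print([m.ncbi_codes for m in mentions])  # [['9606'], ['1280']]
print(check_offsets(mentions, texts))    # []
```

Checking every span against `text[off0:off1]` is a cheap way to confirm the offset convention on the real files before training a model; if it reports mismatches, the end offset may be inclusive or the text may need different newline handling.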

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).

Related Organizations

Barcelona Supercomputing Center
Spain

Keywords

normalization, NER, gold standard, species, corpus, NLP
