This dataset contains Pointwise Mutual Information (PMI) values for co-occurrence pairs between different mention categories extracted from two distinct clinical datasets: MESINESP2 and the Clinical Case Reports Collection. PMI is a statistical measure of the strength of association between two entities: it compares their observed co-occurrence with the frequency expected if the two were independent. The PMI values are derived from co-occurrences of professions and clinical concepts, with the aim of identifying potential occupational health risks. By sharing these datasets, we aim to support further research into the relationships between professions and clinical entities, enabling the development of more accurate and targeted occupational health risk models.

There is a separate file for each corpus, and each dataset is provided in CSV format for easy access and further analysis.

Data Structure:

- MESINESP2: mesinesp2_co-occurrence_pmi.zip
- Clinical case reports: clinical_cases_co-occurrence_pmi.zip

The repository contains one .zip file per corpus, each containing a .csv file with the co-occurrences between the detected professions and clinical entities. The file has the following columns, in this order:

- span_mention_1: Mention string (original): profession
- normalized_entity_1: Controlled vocabulary entry for this term
- mention1_category: Semantic class (i.e., NER label)
- mention1_freq: Absolute frequency of this mention (entity 1)
- span_mention_2: Mention string (original): entity 2 (disease, symptom, species, etc.)
- normalized_entity_2: Controlled vocabulary entry for this term
- mention2_category: Semantic class (i.e., NER label)
- mention2_freq: Absolute frequency of this mention (entity 2)
- co-occurrence: Number of co-occurrences
- PMID: PMID value

Notes

This resource has been funded by the Spanish National Proyectos I+D+i 2020 AI4ProfHealth project PID2020-119266RA-I00 (PID2020-119266RA-I0/AEI/10.13039/501100011033).

Contact

If you have any questions or suggestions, please contact us at:

- Miguel Rodríguez Ortega ()
- Martin Krallinger ()

Additional resources and corpora

If you are interested, you might want to check out these corpora and resources:

- MEDDOPROF (Corpus of mentions of professions, occupations and working status and normalization, different document collection with some overlapping documents)
- MESINESP-2 (Corpus of manually indexed records with DeCS/MeSH terms comprising scientific literature abstracts, clinical trials, and patent abstracts, different document collection)
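
As a rough illustration of the PMI measure described in the dataset summary, the sketch below computes PMI from raw counts: the frequency of each mention and the number of times the two co-occur. It is not the authors' implementation. The function name `pmi`, the base-2 logarithm, and the `total` parameter (the number of observation units in the corpus, which is not one of the documented CSV columns) are all assumptions made for this example, and the counts in the usage lines are invented.

```python
import math


def pmi(cooccurrence: int, freq_1: int, freq_2: int, total: int) -> float:
    """Pointwise mutual information from raw counts (hypothetical helper).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
              = log2( cooccurrence * total / (freq_1 * freq_2) )

    `total` is the number of observation units (e.g. documents) in the
    corpus. It is an assumed input; the dataset description does not say
    which unit or logarithm base was used.
    """
    if min(cooccurrence, freq_1, freq_2, total) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(cooccurrence * total / (freq_1 * freq_2))


# Invented counts, for illustration only.
# Pair seen together more often than independence predicts: positive PMI.
associated = pmi(cooccurrence=10, freq_1=20, freq_2=50, total=1000)

# Pair seen together exactly as often as independence predicts: PMI of zero.
independent = pmi(cooccurrence=1, freq_1=10, freq_2=100, total=1000)
```

A positive value means the pair co-occurs more often than expected under independence, a value near zero means the pair behaves as if independent, and a negative value means it co-occurs less often than expected.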