The MUSIC-OpRA dataset offers valuable insights into how uncertainty is expressed in scientific literature across domains. Researchers and practitioners can use it to study and analyse the variations of uncertainty expressions in scholarly discourse. The dataset contains sentences extracted from open access articles in a wide range of fields, covering both Science, Technology and Medicine (STM) and the Social Sciences and Humanities (SSH), annotated with respect to uncertainty in science. The sentences are drawn from articles indexed in PubMed, Scopus and Web of Science (WoS). The dataset was produced as part of the ANR InSciM (Modelling Uncertainty in Science) project.

The sentences were annotated by two independent annotators following the annotation guide proposed by Ningrum and Atanassova (2024). The annotators were trained on the annotation guide and on previously annotated sentences in order to guarantee the consistency of the annotations. Each sentence was annotated as expressing or not expressing uncertainty (Uncertainty / No Uncertainty). Sentences expressing uncertainty were then annotated along five dimensions: Reference, Nature, Context, Timeline and Expression.

The dataset is provided in CSV format with the following columns:

- sentence_id: a unique internal identifier for each sentence.
- journal_name: the name of the journal in which the article was published.
- sampling_technique: the sampling method used to select the sentence. Two approaches were employed:
  - CueMapping: sentences were randomly selected based on occurrences of uncertainty cues from pre-defined lists (Bongelli et al., 2019; Chen et al., 2018; Hyland, 1996).
  - Manual: sentences were manually extracted by identifying uncertainty and non-uncertainty expressions in a subset of articles (two randomly selected articles per journal).
- article_title: the title of the article from which the sentence was extracted.
- document_id: the URL where the article is published.
- publication_year: the year the article was published.
- sentence: the text of the sentence.
- uncertainty: '1' if the sentence expresses uncertainty, '0' otherwise.
- reference, nature, context, timeline, expression: annotations of the type of uncertainty according to the annotation framework proposed by Ningrum and Atanassova (2023).

The annotations of each dimension are stored in numeric rather than textual form. The mapping between numeric and textual labels is given in the table below:

Dimension   | 1          | 2            | 3        | 4          | 5
Reference   | Author     | Former       | Both     |            |
Nature      | Epistemic  | Aleatory     | Both     |            |
Context     | Background | Methods      | Res&Disc | Conclusion | Others
Timeline    | Past       | Present      | Future   |            |
Expression  | Quantified | Unquantified |          |            |

For a more comprehensive understanding of the construction of the dataset, including the selection of journals, the sampling procedure and the annotation methodology, see Ningrum and Atanassova (2023) and Ningrum and Atanassova (2024).

References

Bongelli, R., Riccioni, I., Burro, R., & Zuczkowski, A. (2019). Writers' uncertainty in scientific and popular biomedical articles: A comparative analysis of the British Medical Journal and Discover Magazine. PLoS ONE, 14(9). https://doi.org/10.1371/journal.pone.0221933

Chen, C., Song, M., & Heo, G. E. (2018). A scalable and adaptive method for finding semantically equivalent cue words of uncertainty. Journal of Informetrics, 12(1), 158–180. https://doi.org/10.1016/j.joi.2017.12.004

Hyland, K. E. (1996). Talking to the academy: Forms of hedging in science research articles. Written Communication, 13(2), 251–281. https://doi.org/10.1177/0741088396013002004

Ningrum, P. K., & Atanassova, I. (2023). Scientific uncertainty: An annotation framework and corpus study in different disciplines. 19th International Conference of the International Society for Scientometrics and Informetrics (ISSI 2023). https://doi.org/10.5281/zenodo.8306035

Ningrum, P. K., & Atanassova, I. (2024). Annotation of scientific uncertainty using linguistic patterns. Scientometrics. https://doi.org/10.1007/s11192-024-05009-z
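As a minimal sketch of how the numeric annotations described above can be decoded back to their textual labels, the snippet below builds the dimension-to-label mappings from the table and applies them to a row. The sample row and any file name you would pass to pandas are illustrative assumptions, not part of the dataset distribution.

```python
import pandas as pd

# Numeric-to-textual label mappings for the five uncertainty dimensions,
# copied from the mapping table in the dataset description.
LABEL_MAPS = {
    "reference":  {1: "Author", 2: "Former", 3: "Both"},
    "nature":     {1: "Epistemic", 2: "Aleatory", 3: "Both"},
    "context":    {1: "Background", 2: "Methods", 3: "Res&Disc",
                   4: "Conclusion", 5: "Others"},
    "timeline":   {1: "Past", 2: "Present", 3: "Future"},
    "expression": {1: "Quantified", 2: "Unquantified"},
}

def decode_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric dimension codes replaced by labels."""
    out = df.copy()
    for col, mapping in LABEL_MAPS.items():
        if col in out.columns:
            out[col] = out[col].map(mapping)
    return out

# Illustrative in-memory row standing in for a record loaded from the CSV
# (e.g. via pd.read_csv); the sentence text here is invented.
sample = pd.DataFrame([{
    "sentence": "These results may suggest a different mechanism.",
    "uncertainty": 1, "reference": 1, "nature": 1,
    "context": 3, "timeline": 2, "expression": 2,
}])
print(decode_labels(sample)[list(LABEL_MAPS)])
```

In real use, one would replace the in-memory row with the actual CSV file loaded through pd.read_csv, then filter on the uncertainty column before decoding, since only uncertainty-bearing sentences carry the five dimension annotations.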
multidimensional annotation, text classification, uncertainty annotation, research article, interdisciplinary, Uncertainty, text mining, scientific uncertainty, reference, scientometrics