The MUSIC-OpRA dataset offers valuable insights into how uncertainty is expressed in scientific literature across domains. Researchers and practitioners can use it to study and analyse the variations of uncertainty expressions in scholarly discourse. The dataset contains sentences extracted from open access articles in a wide range of fields, covering both Science, Technology and Medicine (STM) and the Social Sciences and Humanities (SSH), annotated with respect to uncertainty in science. The sentences are drawn from articles indexed in PubMed, Scopus and Web of Science (WoS). The dataset was produced as part of the ANR InSciM (Modelling Uncertainty in Science) project.

The sentences were annotated by two independent annotators following the annotation guide proposed by Ningrum and Atanassova (2024). The annotators were trained on the annotation guide and on previously annotated sentences in order to guarantee the consistency of the annotations. Each sentence was annotated as expressing or not expressing uncertainty (Uncertainty / No Uncertainty). Sentences expressing uncertainty were then annotated along five dimensions: Reference, Nature, Context, Timeline and Expression.

The dataset is provided in CSV format with the following columns:

- sentence_id: a unique internal identifier for each sentence.
- journal_name: the name of the journal in which the article was published.
- sampling_technique: the sampling method used to select the sentence. Two approaches were employed:
  - CueMapping: sentences were randomly selected based on occurrences of uncertainty cues from pre-defined lists (Bongelli et al., 2019; Chen et al., 2018; Hyland, 1996).
  - Manual: sentences were manually extracted by identifying uncertainty and non-uncertainty expressions in a subset of articles (two randomly selected articles per journal).
- article_title: the title of the article from which the sentence was extracted.
- document_id: the URL where the article is published.
- publication_year: the year the article was published.
- sentence: the text of the sentence.
- uncertainty: '1' if the sentence expresses uncertainty, '0' otherwise.
- reference, nature, context, timeline, expression: annotations of the type of uncertainty according to the annotation framework proposed by Ningrum and Atanassova (2023).

The annotations of each dimension are stored in numeric rather than textual form. The mapping between numeric and textual labels is given in the table below:

Dimension   | 1          | 2            | 3        | 4          | 5
Reference   | Author     | Former       | Both     |            |
Nature      | Epistemic  | Aleatory     | Both     |            |
Context     | Background | Methods      | Res&Disc | Conclusion | Others
Timeline    | Past       | Present      | Future   |            |
Expression  | Quantified | Unquantified |          |            |

For a more comprehensive understanding of the construction of the dataset, including the selection of journals, the sampling procedure and the annotation methodology, see Ningrum and Atanassova (2023) and Ningrum and Atanassova (2024).

References

Bongelli, R., Riccioni, I., Burro, R., & Zuczkowski, A. (2019). Writers' uncertainty in scientific and popular biomedical articles: A comparative analysis of the British Medical Journal and Discover Magazine. PLoS ONE, 14(9). https://doi.org/10.1371/journal.pone.0221933

Chen, C., Song, M., & Heo, G. E. (2018). A scalable and adaptive method for finding semantically equivalent cue words of uncertainty. Journal of Informetrics, 12(1), 158–180. https://doi.org/10.1016/j.joi.2017.12.004

Hyland, K. E. (1996). Talking to the academy: Forms of hedging in science research articles. Written Communication, 13(2), 251–281. https://doi.org/10.1177/0741088396013002004

Ningrum, P. K., & Atanassova, I. (2023). Scientific uncertainty: An annotation framework and corpus study in different disciplines. 19th International Conference of the International Society for Scientometrics and Informetrics (ISSI 2023). https://doi.org/10.5281/zenodo.8306035

Ningrum, P. K., & Atanassova, I. (2024). Annotation of scientific uncertainty using linguistic patterns. Scientometrics. https://doi.org/10.1007/s11192-024-05009-z
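As a minimal sketch of how the numeric annotations described above can be decoded back to their textual labels, the snippet below builds the dimension-to-label mappings from the table and applies them to a row. The sample row and any file name you would pass to pandas are illustrative assumptions, not part of the dataset distribution.

```python
import pandas as pd

# Numeric-to-textual label mappings for the five uncertainty dimensions,
# copied from the mapping table in the dataset description.
LABEL_MAPS = {
    "reference":  {1: "Author", 2: "Former", 3: "Both"},
    "nature":     {1: "Epistemic", 2: "Aleatory", 3: "Both"},
    "context":    {1: "Background", 2: "Methods", 3: "Res&Disc",
                   4: "Conclusion", 5: "Others"},
    "timeline":   {1: "Past", 2: "Present", 3: "Future"},
    "expression": {1: "Quantified", 2: "Unquantified"},
}

def decode_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric dimension codes replaced by labels."""
    out = df.copy()
    for col, mapping in LABEL_MAPS.items():
        if col in out.columns:
            out[col] = out[col].map(mapping)
    return out

# Illustrative in-memory row standing in for a record loaded from the CSV
# (e.g. via pd.read_csv); the sentence text here is invented.
sample = pd.DataFrame([{
    "sentence": "These results may suggest a different mechanism.",
    "uncertainty": 1, "reference": 1, "nature": 1,
    "context": 3, "timeline": 2, "expression": 2,
}])
print(decode_labels(sample)[list(LABEL_MAPS)])
```

In real use, one would replace the in-memory row with the actual CSV file loaded through pd.read_csv, then filter on the uncertainty column before decoding, since only uncertainty-bearing sentences carry the five dimension annotations.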
multidimensional annotation, text classification, uncertainty annotation, research article, interdisciplinary, Uncertainty, text mining, scientific uncertainty, reference, scientometrics