ZENODO
Research . 2021
License: CC BY
Data sources: Datacite
ZENODO
Other literature type . 2021
License: CC BY
Data sources: ZENODO

Cleaning different types of DOI errors found in cited references on Crossref using automated methods

Authors: Boente, Ricarda; Massari, Arcangelo; Santini, Cristian; Tural, Deniz


References

Boente, R., Massari, A., Santini, C., & Tural, D. (2021a). Classes of errors in DOI names (Data Management Plan) (Version 5). Zenodo. https://doi.org/10.5281/zenodo.4733919

Boente, R., Massari, A., Santini, C., & Tural, D. (2021b). Protocol: Investigating DOIs classes of errors. protocols.io. https://dx.doi.org/10.17504/protocols.io.buuknwuw

Boente, R., Massari, A., Santini, C., & Tural, D. (2021). Classes of errors in DOI names: output dataset (Version v1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4892551

Bostock, M. (2021). D3: Data-Driven Documents. Software Heritage. https://archive.softwareheritage.org/swh:1:dir:35fe697ae5a21e96d9fc01d890b30010e23c16dd

Buchanan, R. A. (2006). Accuracy of cited references: The role of citation databases. College and Research Libraries, 67(4), 292–303. https://doi.org/10.5860/crl.67.4.292

Cioffi, A., Coppini, S., Moretti, A., & Shahidzadeh A.N. (2021, May 3). Investigating missing citations in COCI and publishers involved (Version First). Zenodo. http://doi.org/10.5281/zenodo.4735636

Crossref. (2021). January 2021 Public Data File from Crossref. https://doi.org/10.13003/GU3DQMJVG4

Domanskyi, S., Szedlak, A., Hawkins, N. T., Wang, J., Paternostro, G., & Piermarocchi, C. (2019). bioRxiv 539833. https://doi.org/10.1101/539833

Franceschini, F., Maisano, D., & Mastrogiacomo, L. (2015). Errors in DOI indexing by bibliometric databases. Scientometrics, 102(3), 2181–2186. https://doi.org/10.1007/s11192-014-1503-4

García-Alonso, C. R., Pérez-Naranjo, L. M., & Fernández-Caballero, J. C. (2014). Multiobjective evolutionary algorithms to identify highly autocorrelated areas: the case of spatial distribution in financially compromised farms. Annals of Operations Research, 219, 187–202. https://doi.org/10.1007/s10479-011-0841-3

Heibi, I., Peroni, S., & Shotton, D. (2019). Software review: COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations. Scientometrics, 121(2), 1213–1228. https://doi.org/10.1007/s11192-019-03217-6

International DOI Foundation. (2019). DOI® Handbook. https://doi.org/10.1000/182

Krebs, S. L. (2018). Rhododendron. In: Van Huylenbroeck J. (eds) Ornamental Crops. Handbook of Plant Breeding, vol 11. Springer, Cham. https://doi.org/10.1007/978-3-319-90698-0_26

Massari, A., Santini, C., & Boente, R. (2021). open-sci/2020-2021-grasshoppers-code: Classes of errors in DOI names (Version 1.1.0). Zenodo. https://doi.org/10.5281/zenodo.4723983

Peroni, S. (2021). Citations to invalid DOI-identified entities obtained from processing DOI-to-DOI citations to add in COCI [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4625300

Wang, S., Van Huylenbroeck, J., & Zhang, L.-H. (2020). Adaptability of Rhododendron species to climate and growth conditions at Lushan Botanical Garden. Acta Horticulturae, 1288, 131–138. https://doi.org/10.17660/ActaHortic.2020.1288.20

Xu, S., Hao, L., An, X., Zhai, D., & Pang, H. (2019). Types of DOI errors of cited references in Web of Science with a cleaning method. Scientometrics, 120(3), 1427–1437. https://doi.org/10.1007/s11192-019-03162-4

Zhu, J., Hu, G., & Liu, W. (2019). DOI errors and possible solutions for Web of Science. Scientometrics, 118, 709–718. https://doi.org/10.1007/s11192-018-2980-7

Abstract

Purpose

The purpose of this work is to find an automated process to repair invalid DOI names that were collected by Silvio Peroni while processing data provided by Crossref (2021).

Design / methodology / approach

The data for this research is provided as a CSV list containing more than 1 million invalid cited DOI names. To determine an automated process, the errors that characterize the wrong DOI names in the list first need to be classified. The work concentrates exclusively on factual errors, such as additional or invalid characters, so DOI names that have become valid in the meantime can be removed. The factual errors are then classified as prefix-, suffix- or other-type errors. By investigating and extending existing research in this field, this research classifies regular expressions that can be used to clean the different types of invalid DOI names, for example by deleting additional strings at the end or the beginning. After the cleanup, the cleaned DOI names are checked for validity again.

Findings

This research found automated processes based on regular expressions that correct the factual errors belonging to different subclasses. After applying the proposed algorithm to the dataset, around 16% of the DOI names proved valid. The largest share of those valid DOIs consists of names made valid by cleaning up suffix errors; however, many DOIs also proved valid without cleaning, having been only temporarily invalid.

Research limitations / implications

Checking whether the DOI names are valid consumes either a lot of time or a large amount of RAM, since the check has to be executed both before and after the cleaning. The described methods are therefore applicable only to smaller datasets unless the necessary resources are available. In addition, there will always remain DOI names that cannot be made valid using automated processes. In these cases, it is important to find the publishers responsible for the incorrect references, which is done in a separate related project (Cioffi et al., 2021).

Originality / value

Building on existing research, this study extends and improves regular expressions that target DOI errors, in order to enhance the data quality of the COCI dataset. As the COCI project provides open access to reference lists of scientific works, the whole academic community can profit from this improvement in data quality. In addition, the proposed methods could be the basis for further research in this field, allowing the correction of DOI name errors in other datasets, too.
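The workflow described in the abstract (check validity, clean prefix-, suffix- and other-type errors with regular expressions, then check validity again) can be illustrated with a minimal Python sketch. The patterns below, the function names `clean_doi` and `repair`, and the `is_valid` callback are assumptions made for illustration; they are not the regular expressions or code published by the authors. The validity check is passed in as a function so that the sketch runs without network access; in practice it would presumably query a DOI resolution service.

```python
import re
from typing import Callable, Iterable

# Illustrative patterns only (assumptions, not the authors' expressions).
# Prefix error: anything that precedes the "10.<registrant>/" part.
PREFIX_RE = re.compile(r"^.*?(?=10\.\d{4,9}/)")
# Suffix error: trailing punctuation or a trailing landing-page path segment.
SUFFIX_RE = re.compile(r"(?:/(?:abstract|full|pdf|epdf|summary)|[.,;:]+)$",
                       re.IGNORECASE)
# Other-type error: whitespace inside the DOI name.
WHITESPACE_RE = re.compile(r"\s+")


def clean_doi(raw: str) -> tuple[str, set[str]]:
    """Return the cleaned DOI name and the set of error classes repaired."""
    fixed: set[str] = set()
    doi = raw.strip()

    no_space = WHITESPACE_RE.sub("", doi)
    if no_space != doi:
        fixed.add("other")
        doi = no_space

    no_prefix = PREFIX_RE.sub("", doi, count=1)
    if no_prefix != doi:
        fixed.add("prefix")
        doi = no_prefix

    # Suffix errors can stack (e.g. "/abstract."), so strip until stable.
    while True:
        no_suffix = SUFFIX_RE.sub("", doi)
        if no_suffix == doi:
            break
        fixed.add("suffix")
        doi = no_suffix

    return doi, fixed


def repair(dois: Iterable[str], is_valid: Callable[[str], bool]) -> dict[str, list]:
    """Check validity before and after cleaning and sort DOIs into three bins."""
    report: dict[str, list] = {"already_valid": [], "repaired": [], "still_invalid": []}
    for raw in dois:
        if is_valid(raw):  # valid in the meantime, no cleaning needed
            report["already_valid"].append(raw)
            continue
        cleaned, classes = clean_doi(raw)
        if classes and is_valid(cleaned):
            report["repaired"].append((raw, cleaned, sorted(classes)))
        else:
            report["still_invalid"].append(raw)
    return report


if __name__ == "__main__":
    # Hypothetical stand-in for a real resolver lookup.
    known = {"10.1007/s11192-019-03162-4", "10.1000/182"}
    sample = ["10.1000/182", "doi:10.1007/s11192-019-03162-4", "10.9999/not-a-doi"]
    print(repair(sample, known.__contains__))
```

Injecting `is_valid` keeps the cleaning logic separate from the expensive validity check, which the abstract identifies as the main cost in time or RAM; a cached or batched checker could be swapped in without touching the regular expressions.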

Keywords

Crossref, invalid DOIs, open citations, OpenCitations, COCI
