Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC 0
Data sources: ZENODO

Data Citation Corpus Data File

Abstract

Data file for the fourth release of the Data Citation Corpus, produced by DataCite and Make Data Count as part of an ongoing grant project funded by the Wellcome Trust. Read more about the project.

The data file includes 10,697,745 data citation records (of which 9,682,257 represent unique dataset-publication pairs) in JSON and CSV formats. The JSON file is the version of record. Data is provided in batches of approximately 1 million records each. The publication date and batch number are included in the file name, e.g., 2027-07-27-data-citation-corpus-01-v4.0.json.

The data citations in the file originate from the following sources:

  • DataCite Event Data
  • Chan Zuckerberg Initiative (CZI) Science Knowledge Graph
  • Aligning Science Across Parkinson’s (ASAP)
  • Europe PMC

Each data citation record consists of:

  • A pair of identifiers: an identifier for the dataset (a DOI or an accession number) and the DOI of the publication (journal article or preprint) in which the dataset is cited
  • Metadata for the cited dataset and for the citing publication

The data file includes the following fields:

| Field | Description | Required? |
|---|---|---|
| id | Internal identifier for the citation | Yes |
| created | Date of item's incorporation into the corpus | Yes |
| updated | Date of item's most recent update in corpus | Yes |
| repository | Repository where cited data is stored | No |
| publisher | Publisher for the article citing the data | No |
| journal | Journal for the article citing the data | No |
| title | Title of cited data | No |
| publication | DOI of article where data is cited | Yes |
| dataset | DOI or accession number of cited data | Yes |
| publishedDate | Date when citing article was published | No |
| source | Source where citation was harvested | Yes |
| subjects | Subject information for cited data | No |
| affiliations | Affiliation information for creator of cited data | No |
| funders | Funding information for cited data | No |

Additional documentation about the citations and metadata in the file is available on the Make Data Count website.
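As an illustration of the record structure described above, the following minimal Python sketch checks a record for the fields marked as required and counts distinct dataset-publication pairs. It assumes each JSON batch file holds a top-level array of flat record objects, which is not confirmed by this description; the helper names (`missing_required`, `count_unique_pairs`, `load_batch`), the sample records, and the "datacite" source value are hypothetical.

```python
import json
from typing import Iterable

# Fields marked "Yes" in the Required? column of the field list.
REQUIRED_FIELDS = ("id", "created", "updated", "publication", "dataset", "source")


def missing_required(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in one record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def count_unique_pairs(records: Iterable[dict]) -> int:
    """Count distinct (dataset, publication) identifier pairs."""
    return len({(r.get("dataset"), r.get("publication")) for r in records})


def load_batch(path: str) -> list[dict]:
    """Load one batch file, ASSUMING a top-level JSON array of records."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical records for illustration only; not taken from the corpus.
sample = [
    {"id": "a1", "created": "2025-07-09", "updated": "2025-07-09",
     "publication": "10.1234/article.1", "dataset": "10.5678/data.1",
     "source": "eupmc"},
    {"id": "a2", "created": "2025-07-09", "updated": "2025-07-09",
     "publication": "10.1234/article.1", "dataset": "10.5678/data.1",
     "source": "datacite"},  # assumed source label
    {"id": "a3", "created": "2025-07-09", "updated": "2025-07-09",
     "publication": "10.1234/article.2", "dataset": "GSE00000"},  # no source
]

for record in sample:
    print(record["id"], missing_required(record))
print("unique pairs:", count_unique_pairs(sample))
```

The pair count here compares identifiers as exact strings. The corpus may normalize identifiers (for example, DOI letter case) before counting unique pairs, so this sketch should not be expected to reproduce the published totals.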
Notes on v4.0:

The fourth release of the Data Citation Corpus data file adds new citations from the following sources:

  • 5.2 million data citations from Europe PMC, identified as "eupmc" in the source field. Ingest of these citations was performed 9 July 2025.
  • 139,647 data citations from DataCite Event Data for the period 1 January 2025 through 30 June 2025.

This release also includes the following new metadata enhancements:

  • Affiliation information for cited data from the Gene Expression Omnibus (GEO) repository, reconciled to Research Organization Registry (ROR) IDs where possible.
  • Reconciliation of organization and funder names with the Research Organization Registry (ROR) for new citations from Event Data.
  • Application of Field of Science subject terms to citation records originating from Europe PMC, based on the disciplinary area of the data repository.

Additional details about the above changes, including scripts used to perform the above tasks, are available in GitHub.

Additional enhancements to the corpus are ongoing and will be addressed in the course of subsequent releases. Users are invited to submit feedback via GitHub. For general questions, email info@makedatacount.org.
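Because new Europe PMC citations are identified as "eupmc" in the source field, records can be grouped or filtered by that field. The sketch below is a rough illustration rather than the project's own tooling: it assumes records are available as Python dictionaries with a string-valued `source` key, and the function names, sample records, and the "datacite" label are hypothetical.

```python
from collections import Counter


def count_by_source(records):
    """Tally records by the value of their 'source' field."""
    return Counter(r.get("source", "<missing>") for r in records)


def select_source(records, source):
    """Keep only the records harvested from the given source."""
    return [r for r in records if r.get("source") == source]


# Hypothetical records for illustration only; not taken from the corpus.
records = [
    {"id": "r1", "source": "eupmc"},
    {"id": "r2", "source": "eupmc"},
    {"id": "r3", "source": "datacite"},  # assumed label for another source
]

counts = count_by_source(records)
eupmc_ids = [r["id"] for r in select_source(records, "eupmc")]
print(counts)
print(eupmc_ids)
```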

Keywords

research evaluation, datacite, data metrics, data usage, data citation, open infrastructure, open data, make data count, research data

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average