
Overview

This dataset contains 738,474 matched records linking arXiv preprints to their published counterparts. It is part of the COMET (Collaborative Metadata) initiative and was produced using the matching strategy developed during COMET's pilot phase.

Data Structure

Each record contains the following fields:

- `input_doi`: the DOI of the arXiv preprint (format: `10.48550/arxiv.XXXX.XXXXX`)
- `matched_doi`: the DOI of the corresponding published work in Crossref
- `confidence`: a score between 0 and 1 indicating the reliability of the match
- `matched_doi_type`: the type of the matched publication (`journal-article`, `proceedings-article`, `book-chapter`, or `report`)

File Formats

The dataset is available in two formats:

- CSV: `20250615_arxiv_preprint_matching_results.csv`
- JSON: `20250615_arxiv_preprint_matching_results.json`
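The field list above suggests a simple loading pattern. The following is a minimal Python sketch using only the standard library, assuming the CSV file has a header row with exactly the listed field names and that `confidence` is stored as a decimal number. The helper names (`read_matches`, `filter_by_confidence`, `arxiv_id`) and the 0.9 threshold are hypothetical, chosen for illustration, and the sample rows are invented rather than taken from the dataset.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, TextIO

# Column names as listed in the dataset description (assumed to match the CSV header).
FIELDS = ("input_doi", "matched_doi", "confidence", "matched_doi_type")
# arXiv DOIs share this prefix; the comparison below is case-insensitive.
ARXIV_DOI_PREFIX = "10.48550/arxiv."


def read_matches(handle: TextIO) -> Iterator[dict]:
    """Stream records from a CSV handle, parsing `confidence` as a float."""
    reader = csv.DictReader(handle)
    # Accessing fieldnames reads the header row; fail early if a column is missing.
    missing = set(FIELDS) - set(reader.fieldnames or ())
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        row["confidence"] = float(row["confidence"])
        yield row


def arxiv_id(input_doi: str) -> str:
    """Strip the arXiv DOI prefix, e.g. '10.48550/arxiv.2101.00001' -> '2101.00001'."""
    doi = input_doi.strip()
    if not doi.lower().startswith(ARXIV_DOI_PREFIX):
        raise ValueError(f"not an arXiv DOI: {input_doi!r}")
    return doi[len(ARXIV_DOI_PREFIX):]


def filter_by_confidence(rows: Iterable[dict], threshold: float) -> list[dict]:
    """Keep only matches whose confidence is at least `threshold` (hypothetical cut-off)."""
    return [r for r in rows if r["confidence"] >= threshold]


if __name__ == "__main__":
    # Hypothetical sample rows for illustration only; not taken from the dataset.
    sample = io.StringIO(
        "input_doi,matched_doi,confidence,matched_doi_type\n"
        "10.48550/arxiv.2101.00001,10.1234/example.1,0.97,journal-article\n"
        "10.48550/arxiv.2101.00002,10.1234/example.2,0.42,proceedings-article\n"
    )
    rows = list(read_matches(sample))
    confident = filter_by_confidence(rows, threshold=0.9)
    # Count retained matches per publication type and list their arXiv identifiers.
    print(Counter(r["matched_doi_type"] for r in confident))
    print([arxiv_id(r["input_doi"]) for r in confident])
```

To run it on the real file, open it with `open("20250615_arxiv_preprint_matching_results.csv", newline="", encoding="utf-8")` and pass the handle to `read_matches`; the UTF-8 encoding is an assumption. Because `read_matches` is a generator, the records can be streamed one at a time, although `filter_by_confidence` collects the retained rows into a list.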
Keywords: arXiv, Metadata Matching, COMET, Preprints
| Indicator | Description | Value |
|---|---|---|
| Selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
