ZENODO
Dataset . 2024
License: CC 0
Data sources: Datacite


ArXiv OAI-PMH arXivRaw publication metadata

Authors: Druskat, Stephan;

Abstract

This dataset contains OAI-PMH metadata for all ArXiv publications up until 2024-04-23 in the arXivRaw XML format.

The metadata has been harvested using the metha Go package v0.3.3 [1] on go1.18. Specifically, harvesting was run on a small HPC cluster using the following SLURM script. The script had to be scheduled twice because the connection was reset by the peer (see combined-slurm.out). metha caters for these situations and is able to pick up where it left off with cumulative harvesting.

    #!/bin/bash
    #SBATCH --job-name=metha
    #SBATCH --nodes=1
    #SBATCH --cpus-per-task=1
    #SBATCH --ntasks=1
    #SBATCH --time=10-20:00:00

    module purge

    echo "Installing Go module."
    module add go/go-1.18/go-1.18-gcc-9.4.0-okbjyoy
    echo "Installed Go module: $(go version)."

    echo "Installing metha."
    go install -v github.com/miku/metha/cmd/...@latest
    echo "Installed metha: $(/go/bin/metha-sync -v)"

    echo "Harvesting ArXiv OAI-PMH metadata in format 'arXivRaw' from http://export.arxiv.org/oai2."
    /go/bin/metha-sync -T 5m -base-dir /scratch//arxiv -format "arXivRaw" http://export.arxiv.org/oai2
    # For the second run, '-from' was specified to pick up the harvest where it was left off.
    # /go/bin/metha-sync -from 2020-09-29 -T 5m -base-dir /scratch//arxiv -format "arXivRaw" http://export.arxiv.org/oai2

    echo "Done."
    exit 0

Dataset contents

This deposit of the dataset contains the following files:

  • metha-output-OAI-PMH-arXivRaw-until-2024-03-24.tar.gz: an archive file containing the archive files (gzipped, *.xml.gz) produced by metha, which in turn contain the XML metadata files. The gzipped files contained in the archive are named following the pattern YYYY-MM-DD-.xml.gz, e.g., 2024-03-24-00000001.xml.gz.
  • README.md: This file, containing basic information about the dataset and deposit.
  • combined-slurm.out: The combined SLURM log for the two consecutive SLURM runs that have produced the dataset. Run-specific information has been redacted.
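The nested layout of the harvest archive (a tar.gz archive holding per-date *.xml.gz files) can be inspected without unpacking every inner file to disk. The following is a minimal sketch and not part of the original deposit: the helper names count_records and list_harvest_dates are hypothetical, the directory name "extracted" is an arbitrary choice, and the sketch assumes that each harvested record opens with a literal "<record>" or "<record ...>" tag, as is usual in OAI-PMH responses.

```shell
#!/bin/bash

# Hypothetical helper: count opening <record> tags across all gzipped XML
# files below a directory. Assumes records open with "<record>" or "<record ".
count_records() {
    local dir="$1"
    find "$dir" -type f -name '*.xml.gz' -exec gzip -dc {} + \
        | grep -o '<record[ >]' \
        | wc -l \
        | tr -d ' '
}

# Hypothetical helper: list the distinct dates that appear as the
# YYYY-MM-DD prefix of the gzipped file names, in ascending order.
list_harvest_dates() {
    local dir="$1"
    find "$dir" -type f -name '*.xml.gz' -exec basename {} \; \
        | cut -c1-10 \
        | sort -u
}

# Usage (not run here; requires the downloaded archive):
#   mkdir -p extracted
#   tar -xzf metha-output-OAI-PMH-arXivRaw-until-2024-03-24.tar.gz -C extracted
#   count_records extracted
#   list_harvest_dates extracted | tail -n 1
```

Streaming the inner files through gzip -dc keeps disk usage close to the size of the extracted outer archive, at the cost of decompressing again on every call.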
Reproducibility

As the OAI-PMH metadata is not static but may change at any time, this dataset isn't fully reproducible. However, running the same metha version on the same Go version with the same commands should yield very similar results, although a new harvest will contain newer metadata.

Licenses

All ArXiv OAI-PMH metadata is licensed under CC0-1.0. combined-slurm.out is licensed under CC0-1.0. README.md is licensed under CC0-1.0.

Licenses are documented in a machine-readable manner following the REUSE 3.0 Specification. License deeds are included in this deposit as .txt files named using the respective SPDX license identifiers.

[1] Martin Czygan, Thomas Gersch, ACz-UniBi, Justin Kelly, Gunnar Þór Magnússon, dvglc, & Natanael Arndt. (2024). miku/metha: v0.3.3 (v0.3.3). Zenodo. doi:10.5281/zenodo.10940212.
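Regarding the reproducibility note above: the SLURM script installs metha with @latest, so a later rerun could pick up a newer release than v0.3.3. The sketch below shows one way to pin the tool versions before harvesting. It is an illustration under assumptions, not the procedure used for this deposit: the helper names pinned_install_cmd and go_release_matches are hypothetical, and the sketch assumes that the release is available as the Git tag v0.3.3 and that "go version" prints the release as its third field.

```shell
#!/bin/bash

# Hypothetical helper: print a go install command pinned to a metha release
# tag instead of @latest. The tag name is passed in by the caller.
pinned_install_cmd() {
    local version="$1"
    printf 'go install -v github.com/miku/metha/cmd/...@%s\n' "$version"
}

# Hypothetical helper: succeed only if a "go version" output line reports the
# expected release, e.g. "go version go1.18 linux/amd64" for go1.18.
go_release_matches() {
    local expected="$1" version_line="$2"
    [ "$(printf '%s\n' "$version_line" | awk '{print $3}')" = "$expected" ]
}

# Usage (not run here; requires Go and network access):
#   go_release_matches go1.18 "$(go version)" || echo "Unexpected Go release." >&2
#   eval "$(pinned_install_cmd v0.3.3)"
```

Pinning only fixes the tooling. The harvested metadata itself can still differ between runs, as stated above.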
