
This is a synthetic dataset of random numbers in variable-length, nested data structures, stored in three file formats: ROOT TTree, Parquet, and Avro. There are four levels of depth:

jagged0: not nested; just a flat array of numbers
jagged1: an array of lists of numbers
jagged2: an array of lists of lists of numbers
jagged3: an array of lists of lists of lists of numbers

The TBasket sizes of the TTree files and the row group sizes of the Parquet files were made identical, so that performance can be meaningfully compared. All of the files are compressed with ZLIB level 9.

This dataset was first used in a performance study at CHEP 2019 (presentation page, published proceedings). It has since been used in other studies, such as one at CHEP 2021 (presentation page, published proceedings) and one at ACAT 2022 (presentation page; preprint, to be published). It has become a standard performance benchmark.

The scripts that were used to create this synthetic dataset are in this repository directory, PR #19.

Just one file, zlib9-jagged0.avro, had to be excluded to fit in this Zenodo record, but it is the easiest one to reconstruct from the others.
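
The four nesting levels jagged0 through jagged3 can be illustrated with a minimal Python sketch. This is not the generation script from the repository. The function names (make_jagged, to_columnar), the uniform distribution, the entry count, and the maximum list length are all assumptions chosen for illustration. The offsets-plus-content split shown here is similar in spirit to Arrow-style list arrays and is not claimed to be the on-disk layout of any of the three file formats.

```python
import random


def make_jagged(depth, length, rng, max_len=3):
    """Build `length` entries nested `depth` levels deep (hypothetical helper).

    depth=0 gives a flat list of numbers (like jagged0), depth=1 a list of
    lists (like jagged1), and so on. Inner list lengths are drawn from
    0..max_len, so empty lists can occur.
    """
    if depth == 0:
        return [rng.random() for _ in range(length)]
    return [
        make_jagged(depth - 1, rng.randint(0, max_len), rng, max_len)
        for _ in range(length)
    ]


def to_columnar(data, depth):
    """Split nested lists into one offsets list per level plus flat content."""
    offsets_per_level = []
    for _ in range(depth):
        offsets = [0]
        flattened = []
        for sublist in data:
            flattened.extend(sublist)
            offsets.append(len(flattened))
        offsets_per_level.append(offsets)
        data = flattened  # descend one level
    return offsets_per_level, data


# Assumed sizes for illustration only; the real files are far larger.
rng = random.Random(12345)
datasets = {f"jagged{d}": make_jagged(d, 5, rng) for d in range(4)}
```

Calling to_columnar(datasets["jagged2"], 2) would return two offsets lists and one flat list of numbers, which shows why deeper nesting adds bookkeeping per level without changing the type of the numerical content.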
