
# The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing

This repository accompanies the paper:

Arno Uhlig, Iris Braun, Matthias Wählisch. *The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing.* Proceedings of the ACM Internet Measurement Conference (IMC ’25), October 28–31, 2025, Madison, WI, USA. DOI: [10.1145/3730567.3764480](https://doi.org/10.1145/3730567.3764480)

If you use this dataset, please cite both our paper and the dataset:

```bibtex
@inproceedings{uhlig2025sapdataset,
  author    = {Arno Uhlig and Iris Braun and Matthias W\"ahlisch},
  title     = {The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing},
  booktitle = {Proceedings of the 2025 ACM Internet Measurement Conference (IMC '25)},
  year      = {2025},
  doi       = {10.1145/3730567.3764480}
}

@dataset{zenodo17141306,
  author    = {Arno Uhlig and Iris Braun and Matthias W\"ahlisch},
  title     = {The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17141306},
  url       = {https://doi.org/10.5281/zenodo.17141306}
}
```

If you have questions, please contact:

- Arno Uhlig – arno.uhlig@sap.com
- Iris Braun – iris.braun@tu-dresden.de
- Matthias Wählisch – m.waehlisch@tu-dresden.de

---

## Overview

This repository provides the data and artifacts used in the paper, including:

- Telemetry data from ~1,800 hypervisors and 48,000 VMs over a 30-day observation period
- Resource utilization metrics (CPU, memory, network, storage)
- Scheduling-relevant events (creation, migration, resize, deletion)
- Scripts for preprocessing, analysis, and visualization

The dataset captures real-world enterprise workloads and enables reproducible research on VM placement and scheduling in large-scale cloud environments.

---

## Repository Structure

```
.
├── data/       # Raw, anonymized datasets
├── scripts/    # Analysis and visualization scripts
└── LICENSE     # License file
```

## Getting Started

- Requirements
  - Python ≥ 3.10
  - Python dependencies: see [requirements.txt](./requirements.txt)
  - For large-scale analysis: sufficient memory and storage

## License

This dataset and accompanying material are released under the **Creative Commons Attribution 4.0 International License (CC BY 4.0)**. See [LICENSE](./LICENSE) for details.
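The scheduling events described in the Overview (creation, migration, resize, deletion) lend themselves to simple aggregate analyses. The following is a minimal sketch, not part of the repository's `scripts/`, that tallies events by type and counts migrations per VM. It assumes, hypothetically, that events have already been loaded into records with `vm_id`, `event_type`, and `timestamp` fields and that event types are labeled `"create"`, `"migrate"`, `"resize"`, and `"delete"`; the actual file layout and column names in `data/` may differ, so inspect the files before adapting it.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


# Hypothetical event record: the real schema in data/ may use different field names.
@dataclass(frozen=True)
class SchedulingEvent:
    vm_id: str
    event_type: str  # assumed labels: "create", "migrate", "resize", "delete"
    timestamp: datetime


def count_events_by_type(events: list[SchedulingEvent]) -> Counter[str]:
    """Return how often each event type occurs."""
    return Counter(e.event_type for e in events)


def migrations_per_vm(events: list[SchedulingEvent]) -> dict[str, int]:
    """Count migrations per VM; VMs that never migrated are omitted."""
    counts: Counter[str] = Counter(
        e.vm_id for e in events if e.event_type == "migrate"
    )
    return dict(counts)


if __name__ == "__main__":
    # Made-up sample events for illustration only; not taken from the dataset.
    sample = [
        SchedulingEvent("vm-a", "create", datetime(2025, 1, 1, 8, 0)),
        SchedulingEvent("vm-a", "migrate", datetime(2025, 1, 2, 9, 30)),
        SchedulingEvent("vm-b", "create", datetime(2025, 1, 1, 8, 5)),
        SchedulingEvent("vm-a", "migrate", datetime(2025, 1, 5, 14, 0)),
        SchedulingEvent("vm-b", "resize", datetime(2025, 1, 3, 10, 0)),
        SchedulingEvent("vm-b", "delete", datetime(2025, 1, 20, 17, 45)),
    ]
    print(count_events_by_type(sample))
    print(migrations_per_vm(sample))
```

Keeping the record type separate from the aggregation functions means only the loading step needs to change once the actual schema in `data/` is known.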
**Keywords:** workload management, virtual machines, scheduling, placement
