Powered by OpenAIRE graph
ZENODO
Dataset . 2023
License: CC BY
Data sources: Datacite
Data sources: ZENODO

An Empirical Study of Container Image Configurations and Their Impact on Start Times (Container Image Data)

Authors: Straesser, Martin; Bauer, André; Leppich, Robert; Herbst, Nikolas; Chard, Kyle; Foster, Ian; Kounev, Samuel;

Abstract

Dataset with the container image metadata used for our IEEE/ACM CCGRID 2023 paper "An Empirical Study of Container Image Configurations and Their Impact on Start Times".

Abstract of the paper: A core selling point of application containers is their fast start times compared to other virtualization approaches like virtual machines. Predictable and fast container start times are crucial for improving and guaranteeing the performance of containerized cloud, serverless, and edge applications. While previous work has investigated container starts, there remains a lack of understanding of how start times may vary across container configurations. We address this shortcoming by presenting and analyzing a dataset of approximately 200,000 open-source Docker Hub images featuring different image configurations (e.g., image size and exposed ports). Leveraging this dataset, we investigate the start times of containers in two environments and identify the most influential features. Our experiments show that container start times can vary between hundreds of milliseconds and tens of seconds in the same environment. Moreover, we conclude that no single dominant configuration feature determines a container's start time and that hardware and software parameters must be considered together for an accurate assessment.

Dataset description: Our image dataset contains 200,986 entries with 21 features associated with each container image. In the following, we describe the meaning of each feature. Further information is available in the OCI Image Specification and the Docker Run Documentation. Besides the 20 features grouped in the five categories below, each dataset entry has an image_id, which uniquely identifies the entry.
Features

Metadata features (prefix: meta)
  • meta_repo_digest: The repo digest is a SHA-256 hash used to uniquely identify the image and pull it from Docker Hub
  • meta_architecture: The CPU architecture that the binaries in the image are built to run on
  • meta_os: The name of the operating system that the image is built to run on
  • meta_docker_version: The Docker version used to build this image

I/O stream features (prefix: io)
  • io_attach_stdin: Boolean setting that determines whether the console should be attached to the process stdin stream
  • io_attach_stdout: Boolean setting that determines whether the console should be attached to the process stdout stream
  • io_attach_stderr: Boolean setting that determines whether the console should be attached to the process stderr stream
  • io_tty: Boolean setting that determines whether the console should pretend to be a TTY when attached
  • io_open_std_in: Boolean setting that determines whether the process stdin stream should be kept open even if the console is not attached
  • io_std_in_once: Boolean setting that determines whether the process retrieved input from the stdin stream at least once

Start command features (prefix: cmd)
  • cmd_args: Length of the list of arguments used as the command to execute when the container starts
  • cmd_envvars: Environment variables set by default when the container starts
  • cmd_additional_args: Length of the list of additional arguments to the container's entrypoint

File system features (prefix: fs)
  • fs_volumes: Number of volumes to create/use by default
  • fs_size: Size of this image in bytes
  • fs_virtual_size: Virtual size of this image in bytes (equals size)
  • fs_graph_driver_name: Name of the image's graph driver
  • fs_root_fs_type: Name of the file system type used in the image
  • fs_layers: Number of root file system layers

Networking features (prefix: net)
  • net_ports: Number of ports to expose by default

Dataset acquisition: The dataset was acquired from Docker Hub using a web crawler.
We used substring matches with the Docker Hub Explore function. As search strings, we used all letter combinations of length 1 to 3, meaning that our first search string was 'a' and our last was 'zzz'. We included results from both the 'recently updated' and the 'most popular' selections. This yielded an initial list of 286,294 image names. We then tested whether we could pull and start each of these images once. These tests were conducted from April to June 2022. We removed all images that could not be pulled or could not be started, which left a total of 200,986 valid images. In the following, we describe the error types that we encountered and that led to the removal of the affected image from the dataset:
  • The image manifest was unknown when we tried to download it, meaning that the image had been renamed or deleted since our web crawler ran
  • The entrypoint command required a dependency that was missing in the image, so the container could not be started
  • The image did not specify an entrypoint command and could therefore not be started
  • The image declared an invalid root file system type
  • The image had a malformed root file system
  • The image configuration was incomplete, so not all required data could be obtained

See also our CodeOcean capsule with the processing scripts for our paper: https://doi.org/10.24433/CO.4595026.v2
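The search strings described in the acquisition step (all letter combinations of length 1 to 3, from 'a' to 'zzz') can be enumerated as in the following minimal sketch. It is an illustration only, not the crawler's code: it assumes the alphabet is the 26 lowercase Latin letters and that strings are generated shortest first in lexicographic order, neither of which the description states beyond the first and last string. The function name `search_strings` is hypothetical.

```python
from itertools import product
from string import ascii_lowercase


def search_strings(max_len=3, alphabet=ascii_lowercase):
    """Yield every string over `alphabet` of length 1..max_len, shortest first.

    Assumption: lowercase a-z only; the description does not say whether
    digits or other characters were also used.
    """
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


queries = list(search_strings())

# First and last query, and the total number of queries: 26 + 26**2 + 26**3.
print(queries[0], queries[-1], len(queries))  # a zzz 18278
```

Under these assumptions the enumeration produces 18,278 distinct search strings, each of which would be issued against both result selections.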
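The prefix convention in the feature list above lends itself to a quick consistency check against the stated totals (20 features in five categories plus image_id, 21 in all). The sketch below groups the feature names by prefix. The names are copied from this description; whether the published file uses exactly these names as column headers is an assumption, since the file format is not stated here. The helper `group_by_prefix` is hypothetical.

```python
from collections import defaultdict

# Feature names copied from the description; assumed (not confirmed) to match
# the column headers of the published file.
FEATURES = [
    "meta_repo_digest", "meta_architecture", "meta_os", "meta_docker_version",
    "io_attach_stdin", "io_attach_stdout", "io_attach_stderr", "io_tty",
    "io_open_std_in", "io_std_in_once",
    "cmd_args", "cmd_envvars", "cmd_additional_args",
    "fs_volumes", "fs_size", "fs_virtual_size", "fs_graph_driver_name",
    "fs_root_fs_type", "fs_layers",
    "net_ports",
]
ID_COLUMN = "image_id"


def group_by_prefix(names):
    """Group feature names by the text before their first underscore."""
    groups = defaultdict(list)
    for name in names:
        prefix, _, _ = name.partition("_")
        groups[prefix].append(name)
    return dict(groups)


groups = group_by_prefix(FEATURES)
for prefix, names in groups.items():
    print(f"{prefix}: {len(names)} features")

# 20 category features plus the identifier column.
print("total columns:", len(FEATURES) + 1)  # total columns: 21
```

The same grouping can be applied to the header row of the downloaded file to select all columns of one category at once.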

References

  • Straesser, Martin et al. (2023). An Extensive Analysis of Container Image Configurations and Their Impact on Start Times. In Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing.
  • Straesser, Martin et al. (2023). An Extensive Analysis of Container Image Configurations and Their Impact on Start Times (Supplementary Materials). CodeOcean Capsule. Available online: https://doi.org/10.24433/CO.4595026.v2

Keywords

empirical study, start time, container, docker
