Build and measurements of Linux kernel configurations across different versions

With large-scale, complex configurable systems, it is hard for users to choose the right combination of options (i.e., a configuration) to obtain the desired trade-off between functionality and performance goals such as speed or size. Machine learning can help relate these goals to the system's configuration options and thus predict the effect of options on the outcome, typically after a costly training step. However, many configurable systems evolve at such a rapid pace that it is impractical to retrain a new model from scratch for each new version. Taking the extreme case of the Linux kernel with its ≈ 14,500 configuration options, we investigate how predictions of kernel binary size degrade over successive versions, and how transfer learning can be adapted and applied to mitigate this degradation. We used and are sharing a unique and large dataset consisting of the binary sizes (compressed and non-compressed) of thousands of configurations for different versions of the kernel, spanning three years (4.13, 4.15, 4.20, 5.0, 5.4, 5.7, and 5.8). Overall, the dataset covers around 200K configurations over 10K+ options/features and 6 versions. This dataset was used in the IEEE Transactions on Software Engineering (TSE) article "Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size" (preprint: https://hal.inria.fr/hal-03358817).

The pickle is known to work with at least pandas version 1.4.3.
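As a minimal sketch of how the pickle might be loaded, the snippet below checks the installed pandas version against 1.4.3 before unpickling. The helper names (`pandas_version_tuple`, `load_measurements`) and the file name in the usage note are hypothetical, not part of the dataset. Treating 1.4.3 as a minimum is an assumption based on the note above, so older versions only trigger a warning rather than an error.

```python
import warnings

import pandas as pd

# pandas version that the dataset description reports as working for the pickle.
KNOWN_GOOD_PANDAS = (1, 4, 3)


def pandas_version_tuple(version: str = pd.__version__) -> tuple:
    """Parse 'major.minor.patch' into ints, ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def load_measurements(path: str) -> pd.DataFrame:
    """Load a pickled DataFrame, warning if pandas is older than the known-good version."""
    if pandas_version_tuple() < KNOWN_GOOD_PANDAS:
        warnings.warn(
            f"pandas {pd.__version__} is older than 1.4.3; "
            "the pickle may fail to load (assumption, not verified)."
        )
    # Only unpickle files from a source you trust.
    return pd.read_pickle(path)
```

A call could look like `df = load_measurements("dataset_4.13.pkl")`, where the file name is a placeholder for whichever pickle file the dataset actually ships.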

Related Organizations

University of Rennes 1
France
Pontifical Catholic University of Rio de Janeiro
Brazil
French National Centre for Scientific Research
France

Keywords

Linux kernel, sampling, machine learning, software evolution, variability, software product lines, configurable systems
