With large-scale and complex configurable systems, it is hard for users to choose the right combination of options (i.e., configurations) to obtain the desired trade-off between functionality and performance goals such as speed or size. Machine learning can help relate these goals to the options of a configurable system and thus predict the effect of options on the outcome, typically after a costly training step. However, many configurable systems evolve at such a rapid pace that it is impractical to retrain a new model from scratch for each new version. Taking the extreme case of the Linux kernel with its ≈ 14,500 configuration options, we investigate how predictions of kernel binary size degrade over successive versions, and how transfer learning can be adapted and applied to mitigate this degradation.

We used and are sharing a unique and large dataset consisting of the binary sizes (compressed and non-compressed) of thousands of configurations for different versions of the kernel, spanning three years (4.13, 4.15, 4.20, 5.0, 5.4, 5.7, and 5.8). Overall, the dataset covers around 200K configurations over 10K+ options/features and 6 versions. This dataset has been used in the IEEE Transactions on Software Engineering (TSE) article "Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size" (preprint: https://hal.inria.fr/hal-03358817).
Note: the pickle files are known to load with pandas version 1.4.3.
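As a minimal sketch of how the pickle files might be loaded, the Python snippet below checks the installed pandas version before calling `pandas.read_pickle`. Treating 1.4.3 as a lower bound is an assumption (the note only says that version works), and the helper names `parse_version` and `load_configurations` are hypothetical, not part of the dataset.

```python
import re
import warnings

# Version named in the dataset note; using it as a minimum is an assumption.
TESTED_PANDAS = (1, 4, 3)


def parse_version(text):
    """Turn '1.4.3' or '2.1.0rc1' into a tuple of three ints."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '1.5'
        parts.append(0)
    return tuple(parts)


def load_configurations(path):
    """Load one pickled DataFrame, warning if pandas is older than tested."""
    import pandas as pd  # imported lazily so parse_version works without pandas

    if parse_version(pd.__version__) < TESTED_PANDAS:
        warnings.warn(
            "pandas %s is older than the version the pickle was tested with"
            % pd.__version__
        )
    return pd.read_pickle(path)
```

A call such as `df = load_configurations("dataset_4.13.pkl")` would then return the DataFrame; the file name here is a placeholder, so substitute the actual file shipped with the dataset.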
Linux kernel, sampling, machine learning, software evolution, variability, software product lines, configurable systems