
Many supervised learning approaches that adapt to changes in the data distribution over time (e.g., concept drift) have been developed. Most of them assume that the data arrives already preprocessed or that preprocessing is an integral part of the learning algorithm. In real-world applications, data coming from, e.g., sensor readings is typically noisy and contains missing values and redundant features, and a very large part of the model development effort is devoted to data preprocessing. As data evolves over time, learning models need to adapt to changes automatically. From a practical perspective, automating a predictor makes little sense if preprocessing still requires manual adjustment over time. Nevertheless, adaptation of preprocessing has been largely overlooked in research. In this paper, we introduce and address the problem of adaptive preprocessing. We analyze when and under what circumstances it is beneficial to handle the adaptivity of preprocessing and the adaptivity of the learning model separately. We present three scenarios in which handling adaptive preprocessing separately improves the final prediction accuracy and illustrate them with computational examples. Building on this analysis, we construct a prototype approach for combining adaptive preprocessing with an adaptive predictor online. Our case study with real sensor data from a production process demonstrates that decoupling the adaptivity of preprocessing from that of the predictor contributes to improved prediction accuracy. The developed reference framework and our experimental findings are intended to serve as a starting point for systematic research on adaptive preprocessing mechanisms for adaptive learning with evolving data.
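The abstract does not specify which preprocessing steps or which predictor the prototype uses, so the following is only a minimal sketch, under stated assumptions, of what "decoupling the adaptivity of preprocessing and the predictor" can look like in an online, test-then-train loop. All class and function names (`AdaptivePreprocessor`, `OnlineLinearModel`, `run_prequential`) and the toy data are hypothetical. The assumed preprocessing is mean imputation plus standardization with exponentially weighted statistics (its own forgetting rate), and the assumed predictor is a linear model trained by constant-step SGD (its own learning rate). Neither is claimed to be the authors' method.

```python
import math
from typing import List, Optional, Sequence, Tuple


class AdaptivePreprocessor:
    """Hypothetical online preprocessor: mean imputation + standardization.

    Per-feature exponentially weighted mean/variance; `forgetting` controls how
    fast old data is forgotten, independently of the predictor's learning rate.
    """

    def __init__(self, n_features: int, forgetting: float = 0.01) -> None:
        self.alpha = forgetting
        self.mean = [0.0] * n_features
        self.var = [1.0] * n_features
        self.seen = [False] * n_features

    def transform(self, x: Sequence[Optional[float]]) -> List[float]:
        """Impute missing values (None) with the running mean, then standardize."""
        z = []
        for i, v in enumerate(x):
            value = self.mean[i] if v is None else v
            z.append((value - self.mean[i]) / math.sqrt(self.var[i] + 1e-8))
        return z

    def update(self, x: Sequence[Optional[float]]) -> None:
        """Update running statistics using only the observed (non-missing) values."""
        for i, v in enumerate(x):
            if v is None:
                continue
            if not self.seen[i]:
                self.mean[i] = v  # initialize the mean with the first observation
                self.seen[i] = True
                continue
            diff = v - self.mean[i]
            incr = self.alpha * diff
            self.mean[i] += incr
            self.var[i] = (1.0 - self.alpha) * (self.var[i] + diff * incr)


class OnlineLinearModel:
    """Hypothetical adaptive predictor: linear regression trained by SGD (LMS).

    A constant learning rate keeps the model responsive to drift.
    """

    def __init__(self, n_features: int, lr: float = 0.02) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, z: Sequence[float]) -> float:
        return self.b + sum(wi * zi for wi, zi in zip(self.w, z))

    def update(self, z: Sequence[float], y: float) -> None:
        err = self.predict(z) - y  # gradient of 0.5 * err**2 w.r.t. the prediction
        for i, zi in enumerate(z):
            self.w[i] -= self.lr * err * zi
        self.b -= self.lr * err


def run_prequential(stream, pre, model) -> List[float]:
    """Test-then-train loop; preprocessing and predictor adapt separately."""
    abs_errors = []
    for x, y in stream:
        z = pre.transform(x)  # preprocess with the current statistics
        abs_errors.append(abs(model.predict(z) - y))  # test on the new sample
        model.update(z, y)  # adapt the predictor
        pre.update(x)  # adapt the preprocessing at its own rate
    return abs_errors


# Toy stream (made-up data): y = 3x + 1; from t = 1000 the sensor reading gains
# a +5 offset, a shift that re-centering in the preprocessor can absorb.
toy_stream: List[Tuple[List[Optional[float]], float]] = []
for t in range(2000):
    x_true = float(t % 4)
    x_obs = x_true + (5.0 if t >= 1000 else 0.0)
    toy_stream.append(([x_obs], 3.0 * x_true + 1.0))

errors = run_prequential(toy_stream, AdaptivePreprocessor(1), OnlineLinearModel(1))
```

In this sketch the offset initially produces large errors; once the preprocessor's running mean has moved to the new level, the standardized inputs return to their earlier distribution, so the predictor's weights converge back to their earlier values rather than having to learn a new mapping. Keeping the two forgetting mechanisms separate is what makes this possible.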
