
handle: 11368/3037309
As technology evolves and electronic devices become widespread, the amount of data produced in the form of streams grows enormously. Data streams are online sources of data, meaning that they produce data continuously. This creates the need for fast and reliable methods to analyse and extract information from these sources. Stream mining algorithms exist for this purpose, but the use of supervised machine learning is extremely limited in the stream domain, since it is infeasible to label every data instance submitted for processing. To tackle this problem, our paper proposes the use of active learning techniques for stream mining algorithms, specifically those based on incremental Hoeffding trees. The active learning techniques were implemented to respect the stream mining constraint of low computational cost. We took advantage of the original structure of the incremental tree so that selecting which instances to label adds little to the original computational cost. In other words, the statistics that each incremental tree already maintains in order to grow also support the execution of active learning. Using uncertainty sampling techniques, we were able to drastically reduce the number of labels required at the cost of a very small reduction in accuracy. Particularly with Budget Entropy there was an average negative impact of accuracy about using only of samples labelled.
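The abstract does not spell out how labels are selected, so the following is only a minimal sketch of entropy-based uncertainty sampling under a labelling budget, not the paper's Budget Entropy method. It assumes that the leaf reached by an instance already stores per-class counts (as Hoeffding trees do in order to grow), so the uncertainty estimate reuses existing statistics. The names `normalized_entropy` and `BudgetedEntropySampler`, and the `budget` and `threshold` parameters, are hypothetical.

```python
import math


def normalized_entropy(class_counts):
    """Entropy of a leaf's class distribution, scaled to [0, 1]."""
    total = sum(class_counts)
    k = len(class_counts)
    if total == 0:
        return 1.0  # no evidence yet: treat as maximally uncertain
    if k < 2:
        return 0.0  # a single possible class carries no uncertainty
    h = 0.0
    for c in class_counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h / math.log(k)


class BudgetedEntropySampler:
    """Hypothetical label-request policy (an assumption, not the paper's).

    A label is requested only when the leaf is uncertain enough AND the
    fraction of labelled instances so far is below the budget.
    """

    def __init__(self, budget=0.1, threshold=0.5):
        self.budget = budget        # maximum fraction of instances to label
        self.threshold = threshold  # minimum normalized entropy to query
        self.seen = 0
        self.labelled = 0

    def should_query(self, class_counts):
        self.seen += 1
        within_budget = self.labelled / self.seen < self.budget
        uncertain = normalized_entropy(class_counts) >= self.threshold
        if within_budget and uncertain:
            self.labelled += 1
            return True
        return False
```

In a stream loop, `should_query` would be called once per incoming instance with the class counts of the leaf that instance reaches; the tree is updated only when it returns `True` and the label has been obtained. Each decision costs time proportional to the number of classes, which is in keeping with the low-cost constraint the abstract describes.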
Active learning; Hoeffding trees; Stream mining
