Mining frequent itemsets in a stream

Article, 01 Jan 2014. Netherlands, Belgium. English. Publisher: Elsevier BV. Journal: Information Systems, volume 39, pages 233-255 (ISSN: 0306-4379)

Authors: Toon Calders; Nele Dexters; Joris J. M. Gillis; Bart Goethals

doi: 10.1016/j.is.2012.01.005

handle: 10067/1147870151162165141, 2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/154207, 1942/13632

Abstract

Mining frequent itemsets in a datastream is a difficult problem, as itemsets arrive in rapid succession and storing parts of the stream is typically impossible. Nonetheless, it has many useful applications, e.g., opinion and sentiment analysis from social networks. Current stream mining algorithms are based on approximations. In earlier work, mining frequent items in a stream under the max-frequency measure proved to be effective. In this paper, we extend that work from items to itemsets. First, an optimized incremental algorithm for mining frequent itemsets in a stream is presented. The algorithm maintains a very compact summary of the stream for selected itemsets. Second, we show that further compacting the summary is non-trivial. Third, we establish a connection between the size of a summary and results from number theory. Fourth, we report results of extensive experimentation on both synthetic and real-world datasets, showing the efficiency of the algorithm in terms of both time and space.
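
The abstract names the max-frequency measure without defining it. As a rough illustration only, the sketch below assumes the usual reading of that measure: the highest relative frequency of an itemset over all windows that end at the newest transaction and contain at least a minimum number of transactions. The function name `max_frequency` and the parameter `min_len` are hypothetical, and this brute-force scan keeps the whole stream in memory, so it is not the compact incremental summary the paper describes.

```python
from typing import AbstractSet, Hashable, Sequence


def max_frequency(
    stream: Sequence[AbstractSet[Hashable]],
    itemset: AbstractSet[Hashable],
    min_len: int = 1,
) -> float:
    """Brute-force max-frequency of `itemset` at the end of `stream`.

    Assumed definition (not taken from this record): the maximum, over all
    windows ending at the newest transaction with at least `min_len`
    transactions, of the fraction of transactions containing `itemset`.
    """
    target = frozenset(itemset)
    best = 0.0
    hits = 0
    # Grow the window backwards from the newest transaction.
    for length, transaction in enumerate(reversed(stream), start=1):
        if target <= set(transaction):
            hits += 1
        # Only windows that satisfy the minimum length count.
        if length >= min_len:
            best = max(best, hits / length)
    return best


if __name__ == "__main__":
    demo = [{"a", "b"}, {"b"}, {"a", "b"}, {"a"}]
    print(max_frequency(demo, {"a", "b"}))
    print(max_frequency(demo, {"a"}, min_len=2))
```

The scan costs time linear in the stream length for every query and requires storing every transaction, which is exactly what the abstract says is typically impossible; it is meant only to make the measure concrete.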

Countries

Netherlands, Belgium

Related Organizations

Eindhoven University of Technology (Netherlands)
University of Antwerp (Belgium)
Université Libre de Bruxelles (Belgium)
Technical University Eindhoven (Netherlands)
Hasselt University (Belgium)

Keywords

Computer. Automation; Frequent itemset mining; Datastream; Theory; Algorithm; Experiments; Exact and natural sciences

Impact by BIP!

Selected citations (derived from selected sources): 47
Popularity (current attention, based on the underlying citation network): Top 10%
Influence (overall impact, based on the underlying citation network): Top 10%
Impulse (initial momentum directly after publication, based on the underlying citation network): Top 10%