publication . Article . Other literature type . 2015

Conditional heavy hitters: detecting interesting correlations in data streams

Mirylenka, Katsiaryna; Cormode, Graham; Palpanas, Themis; Srivastava, Divesh;
Open Access
  • Published: 26 Feb 2015 Journal: The VLDB Journal, volume 24, pages 395-414 (issn: 1066-8888, eissn: 0949-877X, Copyright policy)
  • Publisher: Springer Science and Business Media LLC
  • Country: United Kingdom
Abstract
The notion of heavy hitters--items that make up a large fraction of the population--has been successfully used in a variety of applications across sensor and RFID monitoring, network data analysis, event mining, and more. Yet this notion often fails to capture the semantics we desire when we observe data in the form of correlated pairs. Here, we are interested in items that are conditionally frequent: when a particular item is frequent within the context of its parent item. In this work, we introduce and formalize the notion of conditional heavy hitters to identify such items, with applications in network monitoring and Markov chain modeling. We explore the rela...
Subjects
free text keywords: Hardware and Architecture, Information Systems, Data stream mining, Data type, Online algorithm, Database, computer.software_genre, computer, Markov chain, Network monitoring, Computer science, Data mining, Event mining, Network data, Semantics, QA76
Related Organizations
39 references, page 1 of 3

1. Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: ACM SIGMOD International Conference on Management of Data, pp. 207-216 (1993) [OpenAIRE]

2. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. In: ACM Symposium on Theory of Computing, pp. 20-29 (1996)

3. Arasu, A., Manku, G.S.: Approximate counts and quantiles over sliding windows. In: ACM Principles of Database Systems (2004) [OpenAIRE]

4. Baum, L.E., Petrie, T.: Statistical Inference for Probabilistic Functions of Finite State Markov Chains. The Annals of Mathematical Statistics 37(6), 1554-1563 (1966) [OpenAIRE]

5. Boyer, B., Moore, J.: A fast majority vote algorithm. Tech. Rep. ICSCA-CMP-32, Institute for Computer Science, University of Texas (1981)

6. Broder, A., Mitzenmacher, M.: Network applications of bloom filters: A survey. Internet Mathematics 1(4), 485-509 (2005)

7. Budak, C., Georgiou, T., Agrawal, D., El Abbadi, A.: Geoscope: Online detection of geo-correlated information trends in social networks. PVLDB 7(4), 229-240 (2013)

8. Chang, J.H., Lee, W.S.: Finding recent frequent itemsets adaptively over online data streams. In: KDD, pp. 487-492 (2003)

9. Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Procedings of the International Colloquium on Automata, Languages and Programming (ICALP) (2002)

10. Cormode, G., Hadjieleftheriou, M.: Finding frequent items in data streams. In: International Conference on Very Large Data Bases (2008)

11. Cormode, G., Korn, F., Muthukrishnan, S., Srivastava, D.: Finding hierarchical heavy hitters in data streams. In: International Conference on Very Large Data Bases, pp. 464-475 (2003)

12. Cormode, G., Korn, F., Tirthapura, S.: Time decaying aggregates in out-of-order streams. In: ACM Principles of Database Systems (2008) [OpenAIRE]

13. Cormode, G., Muthukrishnan, S.: An improved data stream summary: The count-min sketch and its applications. Journal of Algorithms 55(1), 58-75 (2005) [OpenAIRE]

14. Dallachiesa, M., Nushi, B., Mirylenka, K., Palpanas, T.: Uncertain time-series similarity: Return to the basics. PVLDB 5(11), 1662- 1673 (2012) [OpenAIRE]

15. Dallachiesa, M., Palpanas, T.: Identifying streaming frequent items in ad hoc time windows. Data Knowl. Eng. 87, 66-90 (2013) [OpenAIRE]

39 references, page 1 of 3
Powered by OpenAIRE Research Graph
Any information missing or wrong?Report an Issue