Name: Knowledge discovery in data warehouses
Creator: Palpanas, Themistoklis
Keywords: 13. Climate action, 9. Industry and infrastructure, 0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology, 14. Life underwater, 7. Clean energy

Publication: Article, 01 Sep 2000, English. Publisher: Association for Computing Machinery (ACM). Journal: ACM SIGMOD Record, volume 29, pages 88-100 (issn: 0163-5808)

Authors: Palpanas, Themistoklis

doi: 10.1145/362084.362142

handle: 11572/74618

Abstract

As the size of data warehouses increases to several hundreds of gigabytes or terabytes, the need for methods and tools that will automate the process of knowledge extraction, or guide the user to subsets of the dataset that are of particular interest, is becoming prominent. In this survey paper we explore the problem of identifying and extracting interesting knowledge from large collections of data residing in data warehouses by using data mining techniques. Such techniques have the ability to identify patterns and build succinct models to describe the data. These models can also be used to achieve summarization and approximation. We review the associated work in the OLAP, data mining, and approximate query answering literature. We discuss the need for the traditional data mining techniques to adapt to and accommodate the specific characteristics of OLAP systems. We also examine the notion of interestingness of data as a tool to guide the analysis process. We describe methods that have been proposed in the literature for determining what is interesting to the user and what is not, and how these approaches can be incorporated into the data mining algorithms.

Related Organizations

University of Toronto
Canada
University of Trento
Italy

Impact by BIP!

selected citations: 17. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.