Statistical properties of transactional databases

Publication type: Article, Conference object; Date: 14 Mar 2004; Country: Italy; Publisher: ACM; Journal: Proceedings of the 2004 ACM symposium on Applied computing

Authors: P. Palmerini; Salvatore Orlando; R. Perego

doi: 10.1145/967900.968009

handle: 20.500.14243/57569, 10278/8690

Abstract

Most of the complexity of common data mining tasks is due to the unknown amount of information contained in the data being mined. The more patterns and correlations such data contain, the more resources are needed to extract them. This is confirmed by the fact that, in general, no single algorithm is best for a given data mining task on every possible kind of input dataset. Rather, to achieve good performance, strategies and optimizations have to be adopted according to dataset-specific characteristics. For example, one typical distinction in transactional databases is between sparse and dense datasets. In this paper we consider Frequent Set Counting as a case study for data mining algorithms. We propose a statistical analysis of the properties of transactional datasets that allows for a characterization of dataset complexity. We show how such a characterization can be used in many fields, from performance prediction to optimization.
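To make the sparse/dense distinction concrete, the following is a minimal Python sketch. It is not the statistical analysis proposed in the paper, which the abstract does not detail. It assumes a simple density measure (the filled fraction of the item-by-transaction matrix) and a brute-force frequent set counter; the function names `density` and `frequent_itemsets`, the `max_size` cap, and the toy datasets are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def density(transactions):
    """Filled fraction of the item-by-transaction matrix (assumed measure)."""
    items = set().union(*transactions)
    total = sum(len(t) for t in transactions)
    return total / (len(transactions) * len(items))


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force count of itemsets up to max_size with support >= min_support."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            # Sorting gives each itemset one canonical tuple representation.
            for itemset in combinations(sorted(t), k):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Toy datasets (hypothetical): few shared items versus heavily shared items.
sparse = [{"a", "b"}, {"c", "d"}, {"e", "f"}, {"a", "g"}]
dense = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"}]

for name, data in (("sparse", sparse), ("dense", dense)):
    found = frequent_itemsets(data, min_support=2)
    print(name, round(density(data), 3), len(found))
```

With a minimum support of 2, the sparse toy dataset yields a single frequent itemset while the dense one yields seven, which illustrates why the amount of work in Frequent Set Counting depends so strongly on the input. Exhaustive enumeration is only viable for tiny inputs like these.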

Country

Italy

Related Organizations

National Research Council, Italy
National Research Council, Sri Lanka
Ca' Foscari University of Venice, Italy
Institute of Information Science and Technologies "A. Faedo", Italy

Keywords

Data mining

Impact by BIP!

- selected citations: 12. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.