Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ Publications Open Re...arrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
DOAJ
Article . 2017
Data sources: DOAJ
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
arXiv.org e-Print Archive
Other literature type . Preprint . 2018
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Article . 2017
License: CC BY
Data sources: Datacite
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
Journal of Big Data
Other literature type . Article . 2017 . Peer-reviewed
License: CC BY
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
Journal of Big Data
Article
License: CC BY
Data sources: UnpayWall
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Article . 2017
License: CC BY
Data sources: ZENODO
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
Journal of Big Data
Article . 2017
Data sources: DOAJ-Articles
https://doi.org/10.48550/arxiv...
Article . 2018
License: CC BY
Data sources: Datacite
versions View all 11 versions
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

Scaling associative classification for very large datasets

Authors: Venturini, Luca; Baralis, Elena; Garza, Paolo;

Scaling associative classification for very large datasets

Abstract

Abstract Supervised learning algorithms are nowadays successfully scaling up to datasets that are very large in volume, leveraging the potential of in-memory cluster-computing Big Data frameworks. Still, massive datasets with a number of large-domain categorical features are a difficult challenge for any classifier. Most off-the-shelf solutions cannot cope with this problem. In this work we introduce DAC, a Distributed Associative Classifier. DAC exploits ensemble learning to distribute the training of an associative classifier among parallel workers and improve the final quality of the model. Furthermore, it adopts several novel techniques to reach high scalability without sacrificing quality, among which a preventive pruning of classification rules in the extraction phase based on Gini impurity. We ran experiments on Apache Spark, on a real large-scale dataset with more than 4 billion records and 800 million distinct categories. The results showed that DAC improves on a state-of-the-art solution in both prediction quality and execution time. Since the generated model is human-readable, it can not only classify new records, but also allow understanding both the logic behind the prediction and the properties of the model, becoming a useful aid for decision makers.

Country
Italy
Related Organizations
Subjects by Vocabulary

Library of Congress Subject Headings: lcsh:Computer engineering. Computer hardware lcsh:TK7885-7895 lcsh:QA75.5-76.95 lcsh:T58.5-58.64 lcsh:Information technology lcsh:Electronic computers. Computer science

Microsoft Academic Graph classification: Computer science Machine learning computer.software_genre Scaling Categorical variable Associative property business.industry Ensemble learning Artificial intelligence business computer Classifier (UML)

Keywords

FOS: Computer and information sciences, Big Data, Computer Science - Machine Learning, Computer engineering. Computer hardware, Information Systems and Management, Computer Science - Artificial Intelligence, Computer Networks and Communications, Associative classification, Machine Learning (stat.ML), Information technology, Machine Learning (cs.LG), TK7885-7895, Statistics - Machine Learning, Machine learning, Apache Spark; Associative classification; Big Data; Machine learning, Apache Spark, QA75.5-76.95, T58.5-58.64, Computer Science - Learning, Artificial Intelligence (cs.AI), Computer Science - Distributed, Parallel, and Cluster Computing, Hardware and Architecture, Electronic computers. Computer science, Distributed, Parallel, and Cluster Computing (cs.DC), Information Systems

30 references, page 1 of 3

1. El Houby EM, Hassan MS. Comparison between Associative Classification and Decision Tree for HCV Treatment Response Prediction. World Academy of Science, Engineering and Technology, International Journal of Medical, Health, Biomedical, Bioengineering and Pharmaceutical Engineering. 2013;7(11):714-718.

2. Apiletti D, Baralis E, Cerquitelli T, Garza P, Pulvirenti F, Venturini L. Frequent Itemsets Mining for Big Data: A Comparative Analysis. Big Data Research. 2017;.

3. Thabtah F. A review of associative classification mining. The Knowledge Engineering Review. 2007;22(01):37-65. [OpenAIRE]

4. Bechini A, Marcelloni F, Segatori A. A MapReduce solution for associative classification of big data. Information Sciences. 2016;332:33-55. [OpenAIRE]

5. Venturini L, Garza P, Apiletti D. BAC: A bagged associative classifier for big data frameworks. In: East European Conference on Advances in Databases and Information Systems. Springer; 2016. p. 137-146. [OpenAIRE]

6. Liu B, Hsu W, Ma Y. Integrating classification and association rule mining. In: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining. AAAI Press; 1998. p. 80-86.

7. Breiman L. Some properties of splitting criteria. Machine Learning. 1996;24(1):41-47.

8. Li W, Han J, Pei J. CMAR: Accurate and efficient classification based on multiple class-association rules. In: Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on. IEEE; 2001. p. 369-376.

9. Baralis E, Chiusano S, Garza P. A lazy approach to associative classification. IEEE Transactions on Knowledge and Data Engineering. 2008;20(2):156-171. [OpenAIRE]

10. Meng X, Bradley J, Yavuz B, Sparks E, Venkataraman S, Liu D, et al. MLlib: Machine Learning in Apache Spark. J Mach Learn Res. 2016 Jan;17(1):1235-1241.

  • BIP!
    Impact byBIP!
    citations
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    10
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Top 10%
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Top 10%
    OpenAIRE UsageCounts
    Usage byUsageCounts
    visibility views 126
    download downloads 70
  • citations
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    10
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Top 10%
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Top 10%
    Powered byBIP!BIP!
  • 126
    views
    70
    downloads
    Powered byOpenAIRE UsageCounts
Powered by OpenAIRE graph
Found an issue? Give us feedback
visibility
download
citations
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
views
OpenAIRE UsageCountsViews provided by UsageCounts
downloads
OpenAIRE UsageCountsDownloads provided by UsageCounts
10
Top 10%
Average
Top 10%
126
70
Green
gold