AutoML to Date and Beyond: Challenges and Opportunities

Publication type: Article, Preprint
Date: 04 Oct 2021 (embargo end date: 01 Jan 2020)
Language: English
Publisher: Association for Computing Machinery (ACM)
Journal: ACM Computing Surveys, volume 54, pages 1-36 (ISSN: 0360-0300, eISSN: 1557-7341)

Authors: Shubhra Kanti Karmaker Santu; Md. Mahadi Hassan; Micah J. Smith; Lei Xu; Chengxiang Zhai; Kalyan Veeramachaneni

doi: 10.1145/3470918, 10.48550/arxiv.2010.10777

arXiv: 2010.10777

Abstract

As big data becomes ubiquitous across domains, and more and more stakeholders aspire to make the most of their data, demand for machine learning tools has spurred researchers to explore the possibilities of automated machine learning (AutoML). AutoML tools aim to make machine learning accessible for non-machine learning experts (domain experts), to improve the efficiency of machine learning, and to accelerate machine learning research. But although automation and efficiency are among AutoML’s main selling points, the process still requires human involvement at a number of vital steps, including understanding the attributes of domain-specific data, defining prediction problems, creating a suitable training dataset, and selecting a promising machine learning technique. These steps often require a prolonged back-and-forth that makes this process inefficient for domain experts and data scientists alike and keeps so-called AutoML systems from being truly automatic. In this review article, we introduce a new classification system for AutoML systems, using a seven-tiered schematic to distinguish these systems based on their level of autonomy. We begin by describing what an end-to-end machine learning pipeline actually looks like, and which subtasks of the machine learning pipeline have been automated so far. We highlight those subtasks that are still done manually—generally by a data scientist—and explain how this limits domain experts’ access to machine learning. Next, we introduce our novel level-based taxonomy for AutoML systems and define each level according to the scope of automation support provided. Finally, we lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline and discussing important challenges that stand in the way of this ambitious goal.

Related Organizations

University of Illinois at Urbana–Champaign, United States
Auburn University, United States
Massachusetts Institute of Technology, United States
Auburn University System, United States

Keywords

FOS: Computer and information sciences, I.2, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)

Related research products (10 of 29 shown)

- test-cube: software on GitHub (IsRelatedTo)
- adatune: software on GitHub (IsRelatedTo)
- feature-engineering-and-feature-selection: software on GitHub (IsRelatedTo)
- drake: software on GitHub (IsRelatedTo)
- hyperband: software on GitHub (IsRelatedTo)
- PocketFlow: software on GitHub (IsRelatedTo)
- Feature-Selection: software on GitHub (IsRelatedTo)
- TransmogrifAI: software on GitHub (IsRelatedTo)
- ExploreKit: software on GitHub (IsRelatedTo)
- optuna: software on GitHub (IsRelatedTo)

Impact by BIP!

- Selected citations: 200. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 0.1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 0.1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open access routes: green, bronze

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
