Is Pre-training Truly Better Than Meta-Learning?

Name: Is Pre-training Truly Better Than Meta-Learning?
Keywords: Machine Learning, FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Vision and Pattern Recognition, Computation and Language, Computation and Language (cs.CL), Machine Learning (cs.LG)

Brando Miranda; Patrick Yu; Saumya Goyal; Yu-Xiong Wang; Sanmi Koyejo

arXiv.org e-Print Archive

Preprint . 2023

Data sources: arXiv.org e-Print Archive

https://dx.doi.org/10.48550/ar...

Article . 2023

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

DBLP

Article

Data sources: DBLP

Publication: Article, Preprint, 01 Jan 2023
Embargo end date: 01 Jan 2023
Publisher: arXiv
Journal: CoRR, volume abs/2306.13841

doi: 10.48550/arxiv.2306.13841

arXiv: 2306.13841

Abstract

In the context of few-shot learning, it is currently believed that a fixed pre-trained (PT) model, with only its final layer fine-tuned during evaluation, outperforms standard meta-learning algorithms. We re-evaluate this claim through an in-depth empirical examination of an extensive set of formally diverse datasets, comparing PT to Model-Agnostic Meta-Learning (MAML). Unlike previous work, we emphasize a fair comparison by using the same architecture and the same optimizer, and by training all models to convergence. Crucially, we use a more rigorous statistical tool, the effect size (Cohen's d), to determine the practical significance of the difference between a model trained with PT and one trained with MAML. We then use a previously proposed metric, the diversity coefficient, to compute the average formal diversity of a dataset. Using this analysis, we demonstrate the following: (1) when the formal diversity of a dataset is low, PT beats MAML on average, and (2) when the formal diversity is high, MAML beats PT on average. The caveat is that the magnitude of the average difference between PT and MAML, measured by the effect size, is low according to classical statistical thresholds: less than 0.2. Nevertheless, this observation is contrary to the currently held belief that a pre-trained model is always better than a meta-learned model. Our extensive experiments consider 21 few-shot learning benchmarks, including the large-scale few-shot learning dataset Meta-Dataset. We also show no significant difference between a MAML model and a PT model with GPT-2 on OpenWebText. We therefore conclude that a pre-trained model does not always beat a meta-learned model and that the formal diversity of a dataset is a driving factor.
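
The effect-size comparison described in the abstract can be illustrated with a minimal sketch. It assumes the common pooled-standard-deviation form of Cohen's d; the abstract does not state which variant the authors use. The function names, the 0.2 cutoff helper, and the accuracy values are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed variant).

    Positive values mean group_a has the higher mean.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two samples")
    # Pool the two unbiased sample variances, weighted by degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def below_small_effect(d, threshold=0.2):
    """True if |d| is under the classical 'small effect' cutoff of 0.2."""
    return abs(d) < threshold


# Hypothetical per-task accuracies, for illustration only.
maml_acc = [0.61, 0.58, 0.64, 0.60, 0.59]
pt_acc = [0.60, 0.59, 0.63, 0.61, 0.58]
d = cohens_d(maml_acc, pt_acc)
print(f"d = {d:.3f}, below 0.2: {below_small_effect(d)}")
```

Unlike a p-value, d is expressed in units of the pooled standard deviation, so it stays interpretable when many evaluation tasks make even tiny accuracy gaps statistically significant.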

Related Organizations

Department of Computer Science, Spain
University of Illinois at Urbana-Champaign, United States

Related research (IsRelatedTo): pytorch-meta software on GitHub

Impact by BIP!

- selected citations: 0
- popularity: Average
- influence: Average
- impulse: Average


Related to Research communities

Knowmad Institut
