Understanding transfer learning and gradient-based meta-learning techniques

Publication: Article, Preprint, 12 Dec 2023

Embargo end date: 01 Jan 2023

Language: English

Publisher: Springer Science and Business Media LLC

Journal: Machine Learning, volume 113, pages 4113-4132 (issn: 0885-6125, eissn: 1573-0565)

Authors: Mike Huisman; Aske Plaat; Jan N. van Rijn

doi: 10.1007/s10994-023-06387-w , 10.48550/arxiv.2310.06148

arXiv: 2310.06148


Abstract

Deep neural networks can yield good performance on various tasks but often require large amounts of training data. Meta-learning has received considerable attention as one approach to improving the generalization of these networks from a limited amount of data. Whilst meta-learning techniques have been observed to be successful at this in various scenarios, recent results suggest that when evaluated on tasks from a different data distribution than the one used for training, a baseline that simply finetunes a pre-trained network may be more effective than more complicated meta-learning techniques such as MAML, one of the most popular meta-learning techniques. This is surprising, as the learning behaviour of MAML mimics that of finetuning: both rely on re-using learned features. We investigate the observed performance differences between finetuning, MAML, and another meta-learning technique called Reptile, and show that MAML and Reptile specialize for fast adaptation in low-data regimes whose data distribution is similar to the one used for training. Our findings show that both the output layer and the noisy training conditions induced by data scarcity play important roles in facilitating this specialization for MAML. Lastly, we show that the pre-trained features obtained by the finetuning baseline are more diverse and discriminative than those learned by MAML and Reptile. Due to this lack of diversity and this distribution specialization, MAML and Reptile may fail to generalize to out-of-distribution tasks, whereas finetuning can fall back on the diversity of the learned features.
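To make the idea of "specializing for fast adaptation" concrete, the following is a minimal sketch of a Reptile-style meta-update on toy linear-regression tasks. It is not the authors' code or experimental setup: the linear model, the task generator `make_task`, the helper `sgd_steps`, and all hyperparameter values are assumptions chosen only for illustration.

```python
import numpy as np

def sgd_steps(w, X, y, lr, steps):
    """Run `steps` full-batch gradient steps on mean squared error (linear model)."""
    w = w.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def mse(w, X, y):
    """Mean squared error of the linear model with weights w."""
    return float(np.mean((X @ w - y) ** 2))

def reptile(tasks, dim, meta_lr=0.1, inner_lr=0.05, inner_steps=5, epochs=200):
    """Reptile-style outer loop: adapt to one task, then move the
    initialization a small step towards the adapted weights."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for X, y in tasks:
            adapted = sgd_steps(w, X, y, inner_lr, inner_steps)
            w += meta_lr * (adapted - w)
    return w

def make_task(rng, true_w, n=10):
    """Hypothetical task: noise-free linear regression with n examples."""
    X = rng.normal(size=(n, true_w.size))
    return X, X @ true_w

rng = np.random.default_rng(0)
centre = np.array([2.0, -1.0])  # assumed centre of the task distribution
tasks = [make_task(rng, centre + 0.1 * rng.normal(size=2)) for _ in range(20)]
init = reptile(tasks, dim=2)

# Adapt to an unseen task from the same distribution with only a few steps,
# once from the meta-learned initialization and once from zeros.
X_new, y_new = make_task(rng, centre + 0.1 * rng.normal(size=2))
loss_meta = mse(sgd_steps(init, X_new, y_new, 0.05, 5), X_new, y_new)
loss_scratch = mse(sgd_steps(np.zeros(2), X_new, y_new, 0.05, 5), X_new, y_new)
```

In this toy setting the learned initialization ends up near the centre of the training-task distribution, so a handful of gradient steps suffices on a similar task. The same initialization offers no such head start on a task drawn far from `centre`, which loosely mirrors the in-distribution specialization the abstract describes.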

Related Organizations

Leiden University, Netherlands
UNIVERSITEIT LEIDEN, Netherlands
Leiden University, Leiden Institute of Advanced Computer Science, Netherlands

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)

Related research products

transfer-meta-feature-representations (software on GitHub, IsRelatedTo)
revisiting-learned-optimizers (software on GitHub, IsRelatedTo)
pytorch-meta (software on GitHub, IsRelatedTo)

Impact by BIP!

selected citations: 7. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.