Scaling Laws Do Not Scale

Keywords: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computers and Society, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computers and Society (cs.CY), FOS: Physical sciences, Disordered Systems and Neural Networks (cond-mat.dis-nn), Condensed Matter - Disordered Systems and Neural Networks, Machine Learning (cs.LG)

Type: Article, Preprint

Date: 16 Oct 2024

Embargo end date: 01 Jan 2023

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Journal: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, volume 7, pages 341-357 (eISSN: 3065-8365)

Authors: Diaz, Fernando; Madaio, Michael

DOI: 10.1609/aies.v7i1.31641, 10.48550/arxiv.2307.03201

arXiv: http://arxiv.org/abs/2307.03201

Abstract

Recent work has advocated for training AI models on ever-larger datasets, arguing that as the size of a dataset increases, the performance of a model trained on that dataset will correspondingly increase (referred to as “scaling laws”). In this paper, we draw on literature from the social sciences and machine learning to critically interrogate these claims. We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output. As the size of datasets used to train large AI models grows and AI systems impact ever larger groups of people, the number of distinct communities represented in training or evaluation datasets grows. It is thus even more likely that communities represented in datasets may have values or preferences not reflected in (or at odds with) the metrics used to evaluate model performance in scaling laws. Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations, threatening the validity of claims that model performance is improving at scale. We end the paper with implications for AI development: that the motivation for scraping ever-larger datasets may be based on fundamentally flawed assumptions about model performance. That is, models may not, in fact, continue to improve as the datasets get larger, at least not for all people or communities impacted by those models. We suggest opportunities for the field to rethink norms and values in AI development, resisting claims for universality of large models, fostering more local, small-scale designs, and other ways to resist the impetus towards scale in AI.

Related Organizations

Google (United States)

Related research: software on GitHub (IsRelatedTo)

Impact by BIP!

Citations: 2. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

