Metadata Ahoy! Charting a reusable path for machine learning

Machine learning (ML) is more popular than ever, but what is needed to best document, curate, and archive ML research outputs? Data curators are largely in uncharted waters as to what extent repositories are able to manage ML objects and components (data, code, parameters, documentation, etc.) in a way that matches researcher needs and uses. But before we can plot a course towards a set of best practices, we must first ask: where are we now? This presentation will provide an overview of a recent research project that assessed how well metadata schema and fields in eight generalist (Figshare, Zenodo, Harvard Dataverse, etc.) and specialist repositories facilitate findability, interoperability, and reusability of ML objects. We will discuss strengths of and opportunities for these repositories, and what generalist repositories can learn from specialist repositories and vice versa. The presentation will also summarize the outputs from this project, all of which are publicly available: a multi-repository metadata field crosswalk, complete metadata exports of nearly 20,000 ML-related items from these repositories, and a user interface and code to query repository APIs and to standardize and analyze metadata exports. We hope the IASSIST community will dive deep into this bounty of (meta)data!

Related Organizations

University of California, San Diego
United States

Fields of Science

social sciences

other social sciences
