Machine-learning classification of astronomical sources: estimating F1-score in the absence of ground truth

Publication: Article, Preprint
Date: 16 Oct 2022
Embargo end date: 01 Jan 2022
Language: English
Publisher: Oxford University Press (OUP)
Journal: Monthly Notices of the Royal Astronomical Society: Letters, volume 517, pages L116-L120 (issn: 1745-3925, eissn: 1745-3933)
Funded by: FCT | Institute of Astrophysics..., FCT | IA, FCT | IdEaS with ALMA +2 projects

Authors: Humphrey, A.; Kuberski, W.; Bialek, J.; Perrakis, N.; Cools, W.; Nuyttens, N.; Elakhrass, H.; +1 Authors

doi: 10.1093/mnrasl/slac120 , 10.48550/arxiv.2209.15112

arXiv: 2209.15112


Abstract

Machine-learning based classifiers have become indispensable in the field of astrophysics, allowing separation of astronomical sources into various classes, with computational efficiency suitable for application to the enormous data volumes that wide-area surveys now typically produce. In the standard supervised classification paradigm, a model is typically trained and validated using data from relatively small areas of sky, before being used to classify sources in other areas of the sky. However, population shifts between the training examples and the sources to be classified can lead to ‘silent’ degradation in model performance, which can be challenging to identify when the ground-truth is not available. In this letter, we present a novel methodology using the nannyml Confidence-Based Performance Estimation (CBPE) method to predict classifier F1-score in the presence of population shifts, but without ground-truth labels. We apply CBPE to the selection of quasars with decision-tree ensemble models, using broad-band photometry, and show that the F1-scores are predicted remarkably well (MAPE $\sim$ 10 per cent; $R^2$ = 0.74–0.92). We discuss potential use-cases in the domain of astronomy, including machine-learning model and/or hyperparameter selection, and evaluation of the suitability of training data sets for a particular classification problem.
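The idea behind confidence-based estimation of F1 can be illustrated with a minimal sketch. It assumes the classifier outputs well-calibrated positive-class probabilities, so that each score can be read as the chance the source truly is a quasar; expected confusion-matrix counts then follow without any labels. This is not the nannyml implementation: the function name `estimate_f1`, the 0.5 threshold, and the sample scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def estimate_f1(probs: Sequence[float], threshold: float = 0.5) -> float:
    """Estimate F1 from calibrated positive-class probabilities alone.

    Assumption: each probability p is calibrated, i.e. among sources
    scored p, a fraction p are true positives.
    """
    exp_tp = exp_fp = exp_fn = 0.0
    for p in probs:
        if p >= threshold:
            # Predicted positive: a true positive with probability p,
            # a false positive with probability 1 - p.
            exp_tp += p
            exp_fp += 1.0 - p
        else:
            # Predicted negative: a missed positive with probability p.
            exp_fn += p
    denom = 2.0 * exp_tp + exp_fp + exp_fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when no positives are expected.
    return 2.0 * exp_tp / denom if denom > 0 else 0.0


# Hypothetical classifier scores for six unlabelled sources.
scores = [0.95, 0.80, 0.60, 0.30, 0.10, 0.05]
print(f"estimated F1 = {estimate_f1(scores):.3f}")
```

The estimate is only as good as the calibration: if a population shift changes the relationship between scores and true class membership, the expected counts (and hence the estimated F1) can be biased.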

Related Organizations

Universidade Lusófona do Porto
Portugal
Institute of Astrophysics and Space Sciences
Portugal
DTX Digital Transformation CoLAB
Portugal
Faculdade de ciencias da Universidade do Porto
Portugal
University of Minho
Portugal


Keywords

Astrophysics of Galaxies (astro-ph.GA), FOS: Physical sciences, Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Astrophysics of Galaxies, Instrumentation and Methods for Astrophysics (astro-ph.IM)

2 Research products

HER2-Prediction software on GitHub
IsRelatedTo
nannyml software on GitHub
IsRelatedTo

Impact by BIP!

selected citations: 31. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.