The miniJPAS survey: star-galaxy classification using machine learning

Publication: Article, Preprint, Other literature type
Date: 01 Jan 2021 (embargo end date: 01 Jan 2020)
Country: Spain
Publisher: EDP Sciences
Journal: Astronomy & Astrophysics, volume 645, page A87 (ISSN: 0004-6361, eISSN: 1432-0746)
Funded by: EC | BEHOMO, NSF | Collaborative Research: T..., NSF | Collaborative Research: T... +1 projects

Authors: P. O. Baqui; V. Marra; L. Casarini; R. Angulo; L. A. Díaz-García; C. Hernández-Monteagudo; P. A. A. Lopes; +27 Authors

doi: 10.1051/0004-6361/202038986 , 10.48550/arxiv.2007.07622

arXiv: 2007.07622

handle: 10261/239116 , 20.500.12666/517


Abstract

Context. Future astrophysical surveys such as J-PAS will produce very large datasets, the so-called “big data”, which will require the deployment of accurate and efficient machine-learning (ML) methods. In this work, we analyze the miniJPAS survey, which observed approximately 1 deg² of the AEGIS field with 56 narrow-band filters and 4 ugri broad-band filters. The miniJPAS primary catalog contains approximately 64 000 objects in the r detection band (mag_AB ≲ 24), with forced photometry in all other filters.

Aims. We discuss the classification of miniJPAS sources into extended (galaxies) and point-like (e.g., stars) objects, a step required for the subsequent scientific analyses. We aim to develop an ML classifier that is complementary to traditional tools based on explicit modeling. In particular, our goal is to release a value-added catalog with our best classification.

Methods. In order to train and test our classifiers, we cross-matched the miniJPAS dataset with SDSS and HSC-SSP data, whose classification is trustworthy within the intervals 15 ≤ r ≤ 20 and 18.5 ≤ r ≤ 23.5, respectively. We trained and tested six different ML algorithms on the two cross-matched catalogs: K-nearest neighbors, decision trees, random forest (RF), artificial neural networks, extremely randomized trees (ERT), and an ensemble classifier. The last is a hybrid algorithm that combines artificial neural networks and RF with the J-PAS stellar and galactic loci classifier. As input for the ML algorithms we used the magnitudes from the 60 filters together with their errors, with and without the morphological parameters. We also used the mean point spread function in the r detection band for each pointing.

Results. We find that the RF and ERT algorithms perform best in all scenarios. When the full magnitude range of 15 ≤ r ≤ 23.5 is analyzed, we find an area under the curve AUC = 0.957 with RF when photometric information alone is used, and AUC = 0.986 with ERT when photometric and morphological information are used together. When morphological parameters are used, the full width at half maximum is the most important feature. When photometric information is used alone, we observe that broad bands are not necessarily more important than narrow bands, and that errors (the width of the distribution) are as important as the measurements (the central value of the distribution). In other words, it is apparently important to fully characterize the measurement.

Conclusions. ML algorithms can compete with traditional star and galaxy classifiers; they outperform the latter at fainter magnitudes (r ≳ 21). We use our best classifiers, with and without morphology, in order to produce a value-added catalog.
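The AUC figures quoted in the abstract are areas under the ROC curve. As a rough illustration of what that metric measures, the sketch below computes it from class labels and classifier scores using the rank-sum (Mann-Whitney U) identity. This is not the authors' pipeline: the function name `roc_auc`, the convention that galaxies are the positive class, and the toy labels and scores are all assumptions made here for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for the positive class (assumed here to be galaxies), 0 otherwise.
    scores: classifier output, where a higher value means "more likely positive".
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one object of each class")

    # Sort objects by score and assign 1-based ranks, averaging ranks over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1

    # U counts the (positive, negative) pairs ranked correctly; ties count 1/2.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy example (invented values, not survey data): two stars, two galaxies.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Read this way, an AUC of 1 means every galaxy receives a higher score than every star, while 0.5 corresponds to a ranking no better than chance.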

Country

Spain

Related Organizations

University of Michigan–Flint, United States
National Institute for Astrophysics, Italy
Federal University of Rio de Janeiro, Brazil
University of Alabama - USA, United States
Lancaster University, United Kingdom


Keywords

statistics [Stars], Cosmology and Nongalactic Astrophysics (astro-ph.CO), Galaxies: statistics, FOS: Physical sciences, Catalogues, Stars: statistics, statistics [Galaxies], Methods: data analysis, Catalogs, data analysis [Methods], Astrophysics - Instrumentation and Methods for Astrophysics, Instrumentation and Methods for Astrophysics (astro-ph.IM), Astrophysics - Cosmology and Nongalactic Astrophysics

Impact by BIP!

selected citations: 45. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.