
We report a novel machine learning algorithm for automatically detecting and classifying aurora in all-sky images (ASIs) that is trained largely without ground-truth labels. With the addition of a small number of labeled images, we are able to automatically label all of the approximately 700 million images in the Time History of Events and Macroscale Interactions during Substorms (THEMIS) ASI dataset from 2008 to 2022. We use a two-stage approach. In the first stage, we adapt the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) algorithm to learn latent representations of THEMIS all-sky images. In the second stage, we fine-tune a classifier network on the latent representations that our model learns for the manually labeled Oslo Aurora THEMIS (OATH) dataset. We demonstrate that this two-stage approach achieves excellent classification results on data for which no ML classification benchmark currently exists. The outcome of this work will facilitate efficient information retrieval for researchers interested in specific categories of aurora and will enable large-scale statistical studies and machine learning analyses of THEMIS all-sky images that were not previously possible. To demonstrate possible uses of this database, we performed a statistical analysis of the occurrence rates of auroral labels with respect to solar wind parameters, the interplanetary magnetic field vector, and geomagnetic indices. We further investigate the occurrence rates of auroral phenomena in the annotated dataset and their geoeffectiveness using the co-located THEMIS ground magnetometer dataset. This repository holds the scripts necessary to reproduce the experiments in this paper.
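The first stage rests on a contrastive objective. A minimal sketch follows. It assumes the standard NT-Xent (normalized temperature-scaled cross-entropy) loss from the original SimCLR formulation and may not match the authors' specific adaptation. The function scores a batch of N embedding pairs, where each pair comes from two augmented views of the same image. The name `nt_xent_loss`, the `temperature` default of 0.5, and the toy 2-D embeddings are illustrative assumptions. A real pipeline would compute this on GPU tensors produced by an image encoder and projection head.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def nt_xent_loss(z_a: Sequence[Vector], z_b: Sequence[Vector],
                 temperature: float = 0.5) -> float:
    """NT-Xent loss for N positive pairs (z_a[i], z_b[i]).

    z_a[i] and z_b[i] are embeddings of two augmented views of image i;
    every other embedding in the batch acts as a negative.
    (Hypothetical helper for illustration, not the authors' code.)
    """
    if len(z_a) != len(z_b) or len(z_a) == 0:
        raise ValueError("z_a and z_b must be non-empty and equally sized")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    n = len(z_a)
    z = [_normalize(v) for v in list(z_a) + list(z_b)]  # 2N unit vectors
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of sample i
        # Temperature-scaled cosine similarity to every sample except itself.
        sims = {k: _dot(z[i], z[k]) / temperature
                for k in range(2 * n) if k != i}
        # Numerically stable log-sum-exp over the 2N - 1 candidates.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += log_denom - sims[j]  # cross-entropy with the positive as target
    return total / (2 * n)
```

Aligned pairs (for example, `nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])`) produce a lower loss than mismatched pairs. Minimizing the loss therefore pulls views of the same all-sky image together in latent space. In the second stage, a classifier is trained on those latent vectors rather than on raw pixels.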
heliophysics, machine learning, aurora
