<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=undefined&type=result"></script>');
-->
</script>
The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create MMEarth, a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is shown on several downstream tasks including image classification and semantic segmentation. We find that pretraining with multi-modal pretext tasks notably improves the linear probing performance compared to pretraining on optical satellite images only. This also leads to better label efficiency and parameter efficiency which are crucial aspects in global scale applications.
Accepted for ECCV 2024. Data and code: https://vishalned.github.io/mmearth Update arXiv v2 (ECCV): 1. Dataset fix: Removed duplicates and corrected ERA5 yearly statistics. 2. Data augmentation fix: Random crops are now aligned. 3. Test metrics fix: Metrics are now overall instead of mini-batch averages, matching GEO-Bench metrics. 4. Pretrained on MMEarth v001 & evaluated on GEO-Bench v1.0
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 4 | |
popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |