MER Opportunity and Spirit Rovers Pancam Images Labeled Data Set

Introduction The data set is based on 3,004 images collected by the Pancam instruments mounted on the Opportunity and Spirit rovers from NASA's Mars Exploration Rovers (MER) mission. We used rotation, skewing, and shearing augmentation methods to increase the total collection to 70,864 (see Image Augmentation section for more information). Based on the MER Data Catalog User Survey [1], we identified 25 classes of both scientific (e.g. soil trench, float rocks, etc.) and engineering (e.g. rover deck, Pancam calibration target, etc.) interests (see Classes section for more information). The 3,004 images were labeled on Zooniverse platform, and each image is allowed to be assigned with multiple labels. The images are either 512 x 512 or 1024 x 1024 pixels in size (see Image Sampling section for more information). Classes There is a total of 25 classes for this data set. See the list below for class names, counts, and percentages (the percentages are computed as count divided by 3,004). Note that the total counts don't sum up to 3,004 and the percentages don't sum up to 1.0 because each image may be assigned with more than one class. Class name, count, percentage of dataset Rover Deck, 222, 7.39% Pancam Calibration Target, 14, 0.47% Arm Hardware, 4, 0.13% Other Hardware, 116, 3.86% Rover Tracks, 301, 10.02% Soil Trench, 34, 1.13% RAT Brushed Target, 17, 0.57% RAT Hole, 30, 1.00% Rock Outcrop, 1915, 63.75% Float Rocks, 860, 28.63% Clasts, 1676, 55.79% Rocks (misc), 249, 8.29% Bright Soil, 122, 4.06% Dunes/Ripples, 1000, 33.29% Rock (Linear Features), 943, 31.39% Rock (Round Features), 219, 7.29% Soil, 2891, 96.24% Astronomy, 12, 0.40% Spherules, 868, 28.89% Distant Vista, 903, 30.23% Sky, 954, 31.76% Close-up Rock, 23, 0.77% Nearby Surface, 2006, 66.78% Rover Parts, 301, 10.02% Artifacts, 28, 0.93% Image Sampling Images in the MER rover Pancam archive are of sizes ranging from 64x64 to 1024x1024 pixels. The largest size, 1024x1024, was by far the most common size in the archive. For the deep learning dataset, we elected to sample only 1024x1024 and 512x512 images as the higher resolution would be beneficial to feature extraction. In order to ensure that the data set is representative of the total image archive of 4.3 million images, we elected to sample via "site code". Each Pancam image has a corresponding two-digit alphanumeric "site code" which is used to track location throughout its mission. Since each "site code" corresponds to a different general location, sampling a fixed proportion of images taken from each site ensure that the data set contained some images from each location. In this way, we could ensure that a model performing well on this dataset would generalize well to the unlabeled archive data as a whole. We randomly sampled 20% of the images at each site within the subset of Pancam data fitting all other image criteria, applying a floor function to non-whole number sample sizes, resulting in a dataset of 3,004 images. Train/validation/test sets split The 3,004 images were split into train, validation, and test data sets. The split was done so that roughly 60, 15, and 25 percent of the 3,004 images would end up as train, validation, and test data sets respectively, while ensuing that images from a given site are not split between train/validaiton/test data sets. This resulted in 1,806 train images, 456 validation images, and 742 test images. Augmentation To augment the images in train and validation data sets (note that images in the test data set were not augmented), three augmentation methods were chosen that best represent transformations that could be realistically seen in Pancam images. The three augmentations methods are rotation, skew, and shear. The augmentation methods were applied with random magnitude, followed by a random horizontal flipping, to create 30 augmented images for each image. Since each transformation is followed by a square crop in order to keep input shape consistent, we had to constrict the magnitude limits of each augmentation to avoid cropping out important features at the edges of input images. Thus, rotations were limited to 15 degrees in either direction, the 3-dimensional skew was limited to 45 degrees in any direction, and shearing was limited to 10 degrees in either direction. Note that augmentation was done only on training and validation images. Directory Contents images: contains all 70,864 images train-set-v1.1.0.txt: label file for the training data set val-set-v1.1.0.txt: label file for the validation data set test-set-v1.1.0.txt: label file for the testing data set Images with relatively short file names (e.g., 1p128287181mrd0000p2303l2m1.img.jpg) are original images, and images with long file names (e.g., 1p128287181mrd0000p2303l2m1.img.jpg_04140167-5781-49bd-a913-6d4d0a61dab1.jpg) are augmented images. The label files are formatted as "Image name, Class1, Class2, ..., ClassN". Reference [1] S.B. Cole, J.C. Aubele, B.A. Cohen, S.M. Milkovich, and S.A. Shields, Identifying Community Needs for a Mars Exploration Rovers (MER), Daata Catalog, 51st Lunar and Planetary Science Conference (LPSC), 2020.

Related Organizations

National Aeronautics and Space Administration
United States
Duke University
United States
Space Science Institute
United States
Jet Propulsion Lab
United States

Keywords

Machine Learning, Image Classification, Mars, Mars Exploration Rovers

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average