Scaled and Translated Image Recognition (STIR)

Paper: Just a Matter of Scale? Reevaluating Scale Equivariance in Convolutional Neural Networks (arXiv:2211.10288, arxiv.org)
Code: taltstidl/scale-equivariant-cnn, the official code for "Just a Matter of Scale? Reevaluating Scale Equivariance in Convolutional Neural Networks" (github.com)

While convolutions are known to be equivariant to (discrete) translations, scaling remains a challenge, and most image recognition networks are not invariant to it. To explore these effects, we created the Scaled and Translated Image Recognition (STIR) dataset. It contains objects of size \(s \in [17, 64]\) pixels, each placed at a random position in a \(64 \times 64\) pixel image.

Using the dataset

Depending on which data you plan to use, download one or more of the following files. Data is stored in compressed .npz format and can be loaded with numpy.load, as described in the NumPy documentation.

File               Description
emoji.npz          Emoji vector icons rendered as white icons on a black background
mnist.npz          Classic MNIST handwritten digits rescaled to varying sizes
trafficsign.npz    Traffic signs from street imagery downscaled to varying sizes
aerial.npz         Objects in aerial imagery downscaled to varying sizes

Each file contains multiple arrays that can be accessed in a dictionary-like fashion. The keys are documented below, where n is the number of classes in a given file and m is the number of instances per class. Both emoji.npz (36 classes, 1 instance) and mnist.npz (10 classes, 50 instances) are black and white, while trafficsign.npz (16 classes, 25 instances) and aerial.npz (9 classes, 25 instances) are in color.

imgs, shape (3, 48, n, m, 64, 64) for black and white, (3, 48, n, m, 64, 64, 3) for color: images grouped into 3 sets (training, validation, testing) and 48 different scales. Values are in the range 0 to 255.
lbls, shape (3, 48, n, m): indices referencing the ground truth labels; see lbldata for descriptive names. Values are in the range 0 to n - 1.
scls, shape (3, 48, n, m): known scales as given by the bounding box size. Values are in the range 17 to 64.
psts, shape (3, 48, n, m, 2): known position of the bounding box. The first value is the distance to the left edge, the second the distance to the top edge.
metadata, shape (6, 2): metadata on title, description, author, license, version and date.
lbldata, shape (n,): descriptive name for each ground truth label.

For use in Python, a dataset class is provided that implements the basic functionality for loading a given split and scale selection, as illustrated in the code below. It shuffles in a consistent manner, so that when the same split, scales and shuffle arguments are passed, the ground truth scales and positions returned by get_latents correspond to the images returned by to_torch. Metadata and label descriptions can be retrieved via metadata and labeldata, respectively.

    from data.dataset import STIRDataset

    dataset = STIRDataset('data/emoji.npz')

    # Obtain images and labels for training
    images, labels = dataset.to_torch(split='train', scales=[32, 64], shuffle=True)

    # Obtain the known scales and positions for the images above
    scales, positions = dataset.get_latents(split='train', scales=[32, 64], shuffle=True)

    # Get metadata and label descriptions
    metadata = dataset.metadata
    label_descriptions = dataset.labeldata

License and Attribution

When using this dataset for your own research, please respect the individual licenses of the original data. These are distributed within the data files' metadata. For attribution in papers, we recommend the following citations.

D. Gandy, J. Otero, E. Emanuel, F. Botsford, J. Lundien, K. Jackson, M. Wilkerson, R. Madole, J. Raphael, T. Chase, G. Taglialatela, B. Talbot, and T. Chase. Font Awesome. https://fontawesome.com/v5/download, Nov. 2022.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, Nov. 1998.
C. Ertler, J. Mislej, T. Ollmann, L. Porzi, G. Neuhold, and Y. Kuang. The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale. In 2020 16th Eur. Conf. Comput. Vision (ECCV), Glasgow, UK, Aug. 2020.
G.-S. Xia, X. Bai, J. Ding, Z. Zhu, S. Belongie, J. Luo, M. Datcu, M. Pelillo, and L. Zhang. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In 2018 IEEE/CVF Conf. Comput. Vision and Pattern Recognition (CVPR), pages 3974–3983, Salt Lake City, UT, USA, June 2018.
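
The arrays described under "Using the dataset" can also be read without the repository's STIRDataset class. The block below is a minimal NumPy sketch of that, not the repository's implementation. The helper names select and crop are hypothetical. It also rests on three assumptions not confirmed by the README:

- The first axis is ordered training, validation, testing.
- The 48 entries of the second axis are the scales 17, 18, ..., 64 in ascending order.
- Each bounding box is a square whose side length is the value in scls.

```python
import numpy as np

# Assumption: axis 0 follows the README's order (training, validation, testing).
SPLITS = {"train": 0, "val": 1, "test": 2}
# Assumption: axis 1 holds the 48 scales 17, 18, ..., 64 in ascending order.
MIN_SCALE, MAX_SCALE = 17, 64


def select(data, split, scales):
    """Flatten one split and a subset of scales into per-sample arrays.

    `data` is any dict-like object with the keys imgs, lbls, scls and psts,
    for example the NpzFile returned by np.load("data/emoji.npz").
    Hypothetical helper, not part of the STIR repository.
    """
    s = SPLITS[split]
    idx = [sc - MIN_SCALE for sc in scales]  # scale value -> index on axis 1
    if any(i < 0 or i > MAX_SCALE - MIN_SCALE for i in idx):
        raise ValueError(f"scales must lie in [{MIN_SCALE}, {MAX_SCALE}]")
    imgs = data["imgs"][s][idx]  # (k, n, m, 64, 64) or (k, n, m, 64, 64, 3)
    lbls = data["lbls"][s][idx]  # (k, n, m)
    scls = data["scls"][s][idx]  # (k, n, m)
    psts = data["psts"][s][idx]  # (k, n, m, 2)
    # Merge the scale, class and instance axes into one sample axis (C order).
    return (
        imgs.reshape(-1, *imgs.shape[3:]),
        lbls.reshape(-1),
        scls.reshape(-1),
        psts.reshape(-1, 2),
    )


def crop(img, size, pos):
    """Cut out one sample's bounding box (hypothetical helper).

    Assumes a square box of side `size` whose top-left corner lies
    pos[0] pixels from the left edge and pos[1] pixels from the top edge.
    Works for both (64, 64) and (64, 64, 3) images.
    """
    x, y, size = int(pos[0]), int(pos[1]), int(size)
    return img[y:y + size, x:x + size]
```

For example, under these assumptions select(np.load("data/emoji.npz"), "train", [32, 64]) would return 2 × 36 × 1 = 72 samples in each array. The sketch does not shuffle. Applying a single permutation, such as one from np.random.default_rng().permutation(72), to all four returned arrays keeps images, labels, scales and positions aligned.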

Related Organizations

University of Erlangen-Nuremberg, Germany
Fraunhofer Institute for Integrated Circuits, Germany
