ZENODO
Dataset . 2022
License: CC BY
Data sources: Datacite


Scaled and Translated Image Recognition (STIR)

Authors: Altstidl, Thomas; Nguyen, An; Schwinn, Leo; Köferl, Franz; Mutschler, Christopher; Eskofier, Björn; Zanca, Dario


Abstract

Paper: [2211.10288] Just a Matter of Scale? Reevaluating Scale Equivariance in Convolutional Neural Networks (arxiv.org)

Code: taltstidl/scale-equivariant-cnn: Official code for "Just a Matter of Scale? Reevaluating Scale Equivariance in Convolutional Neural Networks" (github.com)

While convolutions are known to be equivariant to (discrete) translations, scaling continues to be a challenge, and most image recognition networks are not invariant to it. To explore these effects, we have created the Scaled and Translated Image Recognition (STIR) dataset. This dataset contains objects of size \(s \in [17,64]\), each randomly placed in a \(64 \times 64\) pixel image.

Using the dataset

Depending on which data you are planning to use, download one or more of the following files. Data is stored in compressed .npz format and can be loaded with NumPy (see the numpy.load documentation).

| File | Description |
| --- | --- |
| emoji.npz | Emoji vector icons rendered as white icons on a black background |
| mnist.npz | Classic MNIST handwritten digits rescaled to varying sizes |
| trafficsign.npz | Traffic signs from street imagery downscaled to varying sizes |
| aerial.npz | Objects in aerial imagery downscaled to varying sizes |

Each file contains multiple arrays that can be accessed in a dictionary-like fashion. The keys are documented below, where n is the number of classes for a given file and m is the number of instances for each class. Both emoji.npz (36 classes, 1 instance) and mnist.npz (10 classes, 50 instances) are in black & white, while trafficsign.npz (16 classes, 25 instances) and aerial.npz (9 classes, 25 instances) are in color.

| Key | Shape | Description |
| --- | --- | --- |
| imgs | (3, 48, n, m, 64, 64) black & white, (3, 48, n, m, 64, 64, 3) color | Images grouped into 3 sets (training, validation, testing) and 48 different scales. Values are in the range 0 to 255. |
| lbls | (3, 48, n, m) | Indices referencing ground truth labels. See lbldata for descriptive names. Values are in the range 0 to n - 1. |
| scls | (3, 48, n, m) | Known scales as given by bounding box size. Values are in the range 17 to 64. |
| psts | (3, 48, n, m, 2) | Known position of the bounding box. The first value is the distance to the left edge, the second value the distance to the top edge. |
| metadata | (6, 2) | Metadata on title, description, author, license, version and date. |
| lbldata | (n,) | Descriptive names for each ground truth label. |

For use in Python, a dataset class is provided that implements the basic functionality for loading a given split and scale selection, as illustrated in the code below. It ensures that shuffling is done consistently, so that the ground truth scales and positions matching the shuffled images can be retrieved. Metadata and label descriptions can be retrieved via metadata and labeldata, respectively.

```python
from data.dataset import STIRDataset

dataset = STIRDataset('data/emoji.npz')
# Obtain images and labels for training
images, labels = dataset.to_torch(split='train', scales=[32, 64], shuffle=True)
# Obtain known scales and positions for above
scales, positions = dataset.get_latents(split='train', scales=[32, 64], shuffle=True)
# Get metadata and label descriptions
metadata = dataset.metadata
label_descriptions = dataset.labeldata
```

License and Attribution

When using this dataset for your own research, please respect the individual licenses of the original data. These are distributed within the data files' metadata. For attribution in papers, we recommend the following citations.

- D. Gandy, J. Otero, E. Emanuel, F. Botsford, J. Lundien, K. Jackson, M. Wilkerson, R. Madole, J. Raphael, T. Chase, G. Taglialatela, B. Talbot, and T. Chase. Font Awesome. https://fontawesome.com/v5/download, Nov. 2022.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, Nov. 1998.
- C. Ertler, J. Mislej, T. Ollmann, L. Porzi, G. Neuhold, and Y. Kuang. The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale. In 2020 16th Eur. Conf. Comput. Vision (ECCV), Glasgow, UK, Aug. 2020.
- G.-S. Xia, X. Bai, J. Ding, Z. Zhu, S. Belongie, J. Luo, M. Datcu, M. Pelillo, and L. Zhang. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In 2018 IEEE/CVF Conf. Comput. Vision and Pattern Recognition (CVPR), pages 3974–3983, Salt Lake City, UT, USA, June 2018.
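
For readers who want to work with the raw arrays rather than the provided dataset class, the array layout described under "Using the dataset" can be indexed directly with NumPy. The following is a minimal sketch, not the official loader. It makes three assumptions that the description does not state explicitly: the first axis is ordered training, validation, testing; index k on the second axis holds scale 17 + k; and the bounding box is a square whose side equals the scale. The helpers make_dummy, select and crop_object are hypothetical names introduced for illustration, and the sketch runs on a small synthetic file that mimics the black & white layout so that it needs no download.

```python
import os
import tempfile

import numpy as np

# Assumptions (not confirmed by the dataset description):
SPLITS = {"train": 0, "valid": 1, "test": 2}  # axis 0: training, validation, testing
MIN_SCALE = 17  # axis 1: index k is assumed to hold scale 17 + k (48 scales, 17..64)


def make_dummy(path, n=2, m=3):
    """Write a small synthetic file that mimics the black & white layout."""
    shape = (3, 48, n, m)
    np.savez_compressed(
        path,
        imgs=np.zeros(shape + (64, 64), dtype=np.uint8),
        lbls=np.broadcast_to(np.arange(n)[:, None], shape).copy(),
        scls=np.broadcast_to(np.arange(17, 65)[None, :, None, None], shape).copy(),
        psts=np.zeros(shape + (2,), dtype=np.int64),
    )


def select(path, split, scale):
    """Return flattened images, labels, scales and positions for one split and scale."""
    i, k = SPLITS[split], scale - MIN_SCALE
    with np.load(path) as data:
        imgs, lbls = data["imgs"][i, k], data["lbls"][i, k]
        scls, psts = data["scls"][i, k], data["psts"][i, k]
    n, m = lbls.shape
    # Merge the class and instance axes into a single sample axis.
    return (imgs.reshape((n * m,) + imgs.shape[2:]), lbls.reshape(-1),
            scls.reshape(-1), psts.reshape(n * m, 2))


def crop_object(img, scale, position):
    """Cut out the object, assuming a square box of side `scale` at (left, top)."""
    left, top = int(position[0]), int(position[1])
    return img[top:top + scale, left:left + scale]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dummy.npz")
    make_dummy(path)
    imgs, lbls, scls, psts = select(path, "train", 32)
    patch = crop_object(imgs[0], int(scls[0]), psts[0])
    print(imgs.shape, lbls.tolist(), patch.shape)
```

With a real file such as emoji.npz, select would be called on the downloaded path instead of the synthetic one. It is worth checking the returned scls values against the requested scale before relying on the assumed index mapping.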
