ZENODO
Dataset . 2018
License: CC BY
Data sources: Datacite, ZENODO

Synthetically Spoken STAIR

Authors: Havard, William N.; Chevrot, Jean-Pierre; Besacier, Laurent

Abstract

This dataset consists of synthetically spoken captions for the STAIR dataset. Following the same methodology as Chrupała et al. (see their article, dataset, and code), we generated speech for each caption of the STAIR dataset using Google's Text-to-Speech API.

This dataset was used for visually grounded speech experiments (see the article accepted at ICASSP 2019):

@INPROCEEDINGS{8683069,
  author={W. N. {Havard} and J. {Chevrot} and L. {Besacier}},
  booktitle={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Models of Visually Grounded Speech Signal Pay Attention to Nouns: A Bilingual Experiment on English and Japanese},
  year={2019},
  volume={},
  number={},
  pages={8618-8622},
  keywords={information retrieval;natural language processing;neural nets;speech processing;word processing;artificial neural attention;human attention;monolingual models;part-of-speech tags;nouns;neural models;visually grounded speech signal;English language;Japanese language;word endings;cross-lingual speech-to-speech retrieval;grounded language learning;attention mechanism;cross-lingual speech retrieval;recurrent neural networks.},
  doi={10.1109/ICASSP.2019.8683069},
  ISSN={2379-190X},
  month={May},
}

The dataset comprises the following files:

mp3-stair.tar.gz : MP3 files of each caption in the STAIR dataset. Filenames follow the pattern imageID_captionID, where imageID and captionID are the identifiers provided in the original dataset (see its annotation format).
dataset.mfcc.npy : Numpy array with the MFCC vectors of each caption. MFCCs were extracted using python_speech_features with its default configuration. To find out which caption a set of MFCC vectors belongs to, use the files dataset.words.txt and dataset.ids.txt.
dataset.words.txt : Captions corresponding to each MFCC vector (line number = position in the Numpy array, starting from 0).
dataset.ids.txt : IDs of the captions (imageID_captionID) corresponding to each MFCC vector (line number = position in the Numpy array, starting from 0).

Splits

test
  test.txt : captions comprising the test split
  test_ids.txt : IDs of the captions in the test split
  test_tagged.txt : tagged version of the test split
  test-alignments.json.zip : forced alignments of all the captions in the test split (a dictionary whose keys are the caption IDs in the STAIR dataset). Due to an unknown error during upload, the JSON file had to be zipped.

train
  train.txt : captions comprising the train split
  train_ids.txt : IDs of the captions in the train split
  train_tagged.txt : tagged version of the train split

val
  val.txt : captions comprising the val split
  val_ids.txt : IDs of the captions in the val split
  val_tagged.txt : tagged version of the val split
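The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' code: the function names (read_lines, split_caption_id, build_index, load_alignments, load_mfcc) are hypothetical, the text files are assumed to be UTF-8 encoded, and the zipped alignment archive is assumed to contain a single .json member. The use of allow_pickle=True when loading the MFCC array is also an assumption, made because captions of different lengths may be stored as an object array.

```python
import json
import zipfile


def read_lines(path):
    """Return the lines of a text file (assumed UTF-8) without newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def split_caption_id(identifier):
    """Split 'imageID_captionID' into its two parts, kept as strings."""
    image_id, caption_id = identifier.strip().split("_", 1)
    return image_id, caption_id


def build_index(ids_path, words_path):
    """Pair each zero-based row position with its ID and caption text.

    Entry i of the result describes row i of dataset.mfcc.npy, since
    line number = position in the Numpy array, starting from 0.
    """
    ids = read_lines(ids_path)
    words = read_lines(words_path)
    if len(ids) != len(words):
        raise ValueError("ids and words files differ in length")
    index = []
    for row, (identifier, caption) in enumerate(zip(ids, words)):
        image_id, caption_id = split_caption_id(identifier)
        index.append({"row": row, "image_id": image_id,
                      "caption_id": caption_id, "caption": caption})
    return index


def load_alignments(zip_path):
    """Load the first .json member of the zipped alignment file.

    Assumption: the archive holds one JSON dictionary keyed by caption ID.
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = [n for n in archive.namelist() if n.endswith(".json")]
        if not names:
            raise FileNotFoundError("no .json member in archive")
        with archive.open(names[0]) as member:
            return json.load(member)


def load_mfcc(path):
    """Load the MFCC array (requires NumPy).

    Assumption: allow_pickle=True is needed if the array stores one
    variable-length MFCC matrix per caption as Python objects.
    """
    import numpy as np
    return np.load(path, allow_pickle=True)
```

With the files in the working directory, build_index("dataset.ids.txt", "dataset.words.txt") gives one record per caption, and the "row" field of a record can be used to index the array returned by load_mfcc("dataset.mfcc.npy").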

Keywords

speech, stair, visually grounded speech, mscoco
