DeepPredSpeech: Computational Models of Predictive Speech Coding Based on Deep Learning

This dataset contains all data, source code, pre-trained computational predictive models and experimental results related to: Hueber T., Tatulli E., Girin L., Schwartz J.-L., "How predictive can be predictions in the neurocognitive processing of auditory and audiovisual speech? A deep learning study." (bioRxiv preprint, https://doi.org/10.1101/471581).

Raw data are extracted from the publicly available NTCD-TIMIT database (10.5281/zenodo.260228). The contents are organized as follows:

Audio recordings are available in the audio_clean/ directory.
Post-processed lip image sequences are available in the lips_roi/ directory (67x67 pixels, 8 bits, obtained by lossless inverse 2D DCT from the DCT features available in the original NTCD-TIMIT repository).
Phonetic segmentation (extracted from the original NTCD-TIMIT Zenodo repository) is available in the HTK MLF file volunteer_labelfiles.mlf.
Audio features (MFCC-spectrogram and log-spectrogram) are available in the mfcc_16k/ and fft_16k/ directories.
Models (audio-only, video-only and audiovisual, based on deep feed-forward neural networks and/or convolutional neural networks, in .h5 format, trained with the Keras 2.0 toolkit) and data normalization parameters (in .dat scikit-learn format) are available in the models_mfcc/ and models_logspectro/ directories.
Predicted and target (ground truth) MFCC-spectrograms (resp. log-spectrograms) for the test database (1909 sentences), for the different values of \(\tau_p\) or \(\tau_f\), are available in the pred_testdb_mfccspectro/ (resp. pred_testdb_logspectro/) directory.
Source code for extracting audio features, training and evaluating the models is available on GitHub: https://github.com/thueber/DeepPredSpeech/

All directories have been zipped before upload.

Feel free to contact me for more details.
Thomas Hueber, Ph. D., CNRS research fellow, GIPSA-lab, Grenoble, France, thomas.hueber@gipsa-lab.fr
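For readers who want a feel for how the predicted and target feature files relate to \(\tau_p\) and \(\tau_f\), the following is a minimal sketch of how (input, target) pairs for a predictive model could be built from a sequence of feature frames. It assumes that \(\tau_p\) is the number of past frames given as context and \(\tau_f\) is the prediction horizon in frames; neither meaning is defined on this page, and the exact framing used in the GitHub source code may differ. The function name make_prediction_pairs and the toy data are hypothetical and are not part of the dataset.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def make_prediction_pairs(
    frames: Sequence[Frame], tau_p: int, tau_f: int
) -> List[Tuple[List[float], List[float]]]:
    """Build (context, target) pairs from a sequence of feature frames.

    Assumption (not taken from the dataset description): tau_p is the number
    of past frames added to the current frame as context, and tau_f is the
    number of frames ahead at which the target frame is taken.
    """
    if tau_p < 0 or tau_f < 1:
        raise ValueError("tau_p must be >= 0 and tau_f must be >= 1")

    pairs = []
    # t is the index of the current frame; it needs tau_p frames before it
    # and tau_f frames after it.
    for t in range(tau_p, len(frames) - tau_f):
        # Concatenate frames t - tau_p .. t into one flat input vector.
        context = [value for frame in frames[t - tau_p : t + 1] for value in frame]
        # The target is the single frame located tau_f steps in the future.
        target = list(frames[t + tau_f])
        pairs.append((context, target))
    return pairs


# Toy sequence of five frames with two coefficients each (made-up values).
toy_frames = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]
toy_pairs = make_prediction_pairs(toy_frames, tau_p=1, tau_f=2)
```

With real data, each frame would be one row of an MFCC-spectrogram or log-spectrogram from mfcc_16k/ or fft_16k/, and the targets would correspond to the ground-truth frames stored next to the predictions.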

Related Organizations

French National Centre for Scientific Research
France
Grenoble Alpes University
France
CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE
France
Grenoble Images Parole Signal Automatique
France
Grenoble INP - UGA
France

Keywords

deep learning, computational model, multimodal, audiovisual, speech, predictive coding
