This dataset contains speech from Finnish parliament plenary sessions in 2008-2020, segmented and aligned for speech recognition training. In total, the training set has:

- 1.4 million samples
- 3100 hours of audio
- 460 speakers
- over 19 million word tokens

Additionally, the upload contains a 5-hour development set and a 5-hour evaluation set, described in publication 10.21437/Interspeech.2017-1115.

Due to the size of the training set (~300 GB) and the Zenodo upload limit (50 GB), only the development and evaluation sets are published on Zenodo. The rest of the data is available at: http://urn.fi/urn:nbn:fi:lb-2021051903

The training set comes in two parts:

- The 2008-2016 set, originally described in publication 10.21437/Interspeech.2017-1115. This set includes a list of samples from sessions in 2008-2014 that can be combined with the 2015-2020 set to form the 3100-hour training set.
- A new 2015-2020 set.

All audio samples are single-channel, 16 kHz, 16-bit wav files. Each wav file has a corresponding transcript in a .trn text file. The data is machine-extracted, so small inaccuracies remain in the training set transcripts, and a few Swedish samples may be present. The development and evaluation sets have been corrected by hand.

The licenses can be viewed at:

- http://urn.fi/urn:nbn:fi:lb-2019112822 (audio)
- http://urn.fi/urn:nbn:fi:lb-2019112823 (text)

The code used in extraction is available at:

- https://github.com/aalto-speech/finnish-parliament-scripts (2008-2014, dev and eval sets)
- https://github.com/aalto-speech/fi-parliament-tools (2015-2020 set)
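The audio format stated above (single-channel, 16 kHz, 16-bit wav, each with a .trn transcript) can be sanity-checked after download. The following is a minimal sketch using only the Python standard library. The helper name `check_sample` is hypothetical, and two things are assumptions not confirmed by the dataset description: that each .trn file sits next to its wav file with the same base name, and that transcripts are UTF-8 encoded.

```python
import wave
from pathlib import Path


def check_sample(wav_path):
    """Return (format_ok, transcript) for one wav/.trn pair.

    format_ok is True when the wav is single-channel, 16 kHz, 16-bit.
    transcript is the .trn text, or None when no .trn file is found.
    """
    wav_path = Path(wav_path)
    with wave.open(str(wav_path), "rb") as wav:
        format_ok = (
            wav.getnchannels() == 1          # single-channel
            and wav.getframerate() == 16000  # 16 kHz
            and wav.getsampwidth() == 2      # 2 bytes per sample = 16-bit
        )

    # Assumption: the transcript shares the wav file's directory and stem.
    trn_path = wav_path.with_suffix(".trn")
    if trn_path.exists():
        # Assumption: transcripts are UTF-8 encoded.
        transcript = trn_path.read_text(encoding="utf-8").strip()
    else:
        transcript = None
    return format_ok, transcript
```

Calling `check_sample` on every wav file in a downloaded directory and collecting the paths where the first element is `False` or the second is `None` gives a quick list of samples to inspect before training.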
speech recognition, speaker diarization, Finnish parliament
