
A dataset of embeddings from the voxlingua107-xls-r-300m-wav2vec language identification model (Alumäe & Kukk, 2022), extracted from utterances in the Common Voice 16.1 dataset (Ardila et al., 2020). The dataset is used in "Neighbors and relatives: How do speech embeddings reflect linguistic connections across the world?". The preprint is available at https://arxiv.org/abs/2506.08564 and the code at https://github.com/TuukkaOT/speech_embedding_analyzer.
linguistic geography, linguistic diversity, speech embeddings, linguistic phylogeny, speech processing
