
This dataset contains the subs2vec embeddings for English, as presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora from the OpenSubtitles 2018 dataset (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles). They represent semantic vector spaces derived from naturalistic language use in films and television.

For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:

- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)

Each file corresponds to a unique configuration: a combination of dimension and window size, together with the model variant indicated in the file name (cbow or sg in the files listed below). Each file contains the vocabulary for that language in column 1, followed by the embedding values in columns 2 through d + 1, where d is the vector dimensionality.

If you use this dataset, please cite:

- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: this Zenodo dataset (using the DOI provided here)

sha256sums:

en_300_6_cbow_wxd.csv.bz2  72a94830d81ebbe28e7fa78465e02ad2bd7771ef5414f8f30f6de94565050167
en_300_6_sg_wxd.csv.bz2    b0d7db822f181a124758e55b0c33b47e0a249c37f8a778806f6166a5baf96cb3
en_500_1_cbow_wxd.csv.bz2  4196d84670045dc3cb65195f4045e543c78c1c35d28530b6c24e7711b8cbf23b
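The published sha256sums can be used to check that a download is complete and uncorrupted. The Python sketch below (standard library only) recomputes a file's SHA-256 digest and compares it with the value listed for that file name. The names sha256_of and verify are hypothetical helpers for illustration, not part of any released tooling; on Linux, the sha256sum command gives the same digest.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the sha256sums list of this record.
EXPECTED_SHA256 = {
    "en_300_6_cbow_wxd.csv.bz2": "72a94830d81ebbe28e7fa78465e02ad2bd7771ef5414f8f30f6de94565050167",
    "en_300_6_sg_wxd.csv.bz2": "b0d7db822f181a124758e55b0c33b47e0a249c37f8a778806f6166a5baf96cb3",
    "en_500_1_cbow_wxd.csv.bz2": "4196d84670045dc3cb65195f4045e543c78c1c35d28530b6c24e7711b8cbf23b",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest matches the published checksum for its file name."""
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no published checksum for {name}")
    return sha256_of(path) == expected[name]
```

For example, verify("en_300_6_cbow_wxd.csv.bz2") returns True for an intact download of that file, and False if the compressed file was truncated or altered.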
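As a sketch of how one of these files might be read, the Python function below streams a .csv.bz2 file and builds a dictionary from each word (column 1) to its vector (the remaining columns), following the layout described for this dataset. Several details are assumptions not stated in the description: that fields are comma-separated with standard CSV quoting, that the text is UTF-8, and whether a header row is present (has_header defaults to False). Inspect the first lines of a file before relying on these. The names load_embeddings and cosine are hypothetical helpers for illustration.

```python
import bz2
import csv
import math


def load_embeddings(path, has_header=False):
    """Load a .csv.bz2 embedding file into {word: [float, ...]}.

    Assumes comma-separated values (UTF-8) with the word in column 1 and
    the vector components in columns 2..d+1. Whether a header row exists
    is an assumption; pass has_header=True to skip the first row.
    """
    vectors = {}
    dim = None
    # bz2.open in text mode decompresses on the fly; newline="" is what csv expects.
    with bz2.open(path, "rt", encoding="utf-8", newline="") as fh:
        reader = csv.reader(fh)
        if has_header:
            next(reader, None)
        for row in reader:
            if not row:
                continue  # skip blank lines
            word, values = row[0], [float(x) for x in row[1:]]
            if dim is None:
                dim = len(values)  # the first data row fixes the dimensionality
            elif len(values) != dim:
                raise ValueError(f"inconsistent dimensionality for {word!r}")
            vectors[word] = values
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Usage might look like vecs = load_embeddings("en_300_6_cbow_wxd.csv.bz2") followed by cosine(vecs["cat"], vecs["dog"]), provided both words are in the vocabulary. Plain Python lists keep the sketch dependency-free; for the full English vocabulary, storing the vectors in a NumPy array would use considerably less memory.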
