
This dataset contains the subs2vec embeddings for German, as presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora and represent semantic vector spaces derived from naturalistic language use in films and television from the OpenSubtitles 2018 datasets: https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles.

For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:

- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)

Each file corresponds to a unique configuration (dimension × window size). Each file contains the vocabulary for that language (column 1) and then the embedding values (columns 2 through dimension size + 1).

If you use this dataset, please cite:

- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)

Some files were split into parts due to limitations with Zenodo. If you’ve downloaded files named like file.bz2.part_000, file.bz2.part_001, etc., you’ll need to recombine them before use. Please download and use the README file, which explains how to download, recombine, and verify the split files (for Linux, Mac, and Windows). Windows users will need to download our helper file FileChunker.ps1.

sha256sum hash values for verification:

4047c670d6556c3f7548bbb27ee63a2243588d6bd440a22e3a3140682c8f0a09  de_500_3_sg_wxd.csv.bz2
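The recombine, verify, and load steps described above can be sketched in Python. This is only an illustration and not the procedure from the README, which remains the authoritative reference. The function names (`recombine_parts`, `sha256_of_file`, `load_embeddings`) are hypothetical, and the sketch assumes that the parts are plain byte-wise chunks of the original file, that the decompressed files are comma-separated UTF-8 text, and that any header row is skipped only when requested via `skip_header`.

```python
import bz2
import csv
import hashlib
import shutil
from pathlib import Path


def recombine_parts(target):
    """Concatenate target.part_000, target.part_001, ... into target.

    Assumes the parts are byte-wise chunks with zero-padded suffixes,
    so sorting the names as strings gives the correct order.
    """
    target = Path(target)
    parts = sorted(target.parent.glob(target.name + ".part_*"))
    if not parts:
        raise FileNotFoundError(f"no parts found for {target}")
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_embeddings(path, limit=None, skip_header=False):
    """Read a .csv.bz2 file into a {word: [floats]} dictionary.

    Column 1 is taken as the word and the remaining columns as the
    vector. The comma delimiter and the header handling are assumptions.
    """
    vectors = {}
    with bz2.open(path, "rt", encoding="utf-8", newline="") as fh:
        reader = csv.reader(fh)
        if skip_header:
            next(reader, None)
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or malformed lines
            vectors[row[0]] = [float(value) for value in row[1:]]
            if limit is not None and len(vectors) >= limit:
                break
    return vectors
```

For example, after calling `recombine_parts("file.bz2")` on downloaded parts, the result of `sha256_of_file(...)` can be compared with the hash listed for that file; for `de_500_3_sg_wxd.csv.bz2` the listed value is the one given above. Passing a small `limit` to `load_embeddings` is a cheap way to inspect the first rows and check the delimiter and header assumptions before loading a whole file into memory.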
