Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite
Versions: 2

German word2vec embeddings trained on OpenSubtitles Part 3

Authors: Grim, Philip; Buchanan, Erin

Abstract

This dataset contains the subs2vec embeddings for German, as presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora and represent semantic vector spaces derived from naturalistic language use in films and television, drawn from the OpenSubtitles 2018 datasets: https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles.

For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
  • Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
  • Window size: varying context windows (e.g., 2, 5, 10, …)

Each file corresponds to a unique configuration (dimension × window size). Each file contains the vocabulary for that language (column 1), followed by the embedding values (columns 2 through dimension size + 1).

If you use this dataset, please cite:
  • Manuscript: https://doi.org/10.5281/zenodo.17243812
  • Data: this Zenodo dataset (using the DOI provided here)

Some files were split into parts due to limitations with Zenodo. If you have downloaded files named like file.bz2.part_000, file.bz2.part_001, etc., you will need to recombine them before use. Please download and use the README file, which explains how to download, recombine, and verify the split files (for Linux, Mac, and Windows). Windows users will need to download our helper file FileChunker.ps1.

sha256sum hash values for verification:
4047c670d6556c3f7548bbb27ee63a2243588d6bd440a22e3a3140682c8f0a09  de_500_3_sg_wxd.csv.bz2
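The recombine, verify, and load steps described in the abstract can be sketched in Python roughly as follows. This is an illustration only, and the README distributed with the dataset remains the authoritative procedure. The sketch assumes that the parts are plain byte-wise splits that concatenate in numeric order, that the decompressed file is comma-delimited, and that it may start with a header row; none of these details are confirmed by the record itself. The function names (recombine_parts, sha256_of, iter_vectors) are hypothetical helpers, not part of the dataset or of FileChunker.ps1.

```python
import bz2
import csv
import hashlib
import shutil
from pathlib import Path


def recombine_parts(part_dir, base_name, out_path):
    """Concatenate base_name.part_000, base_name.part_001, ... into out_path.

    Assumption: the parts are simple byte-wise chunks of the original file.
    """
    prefix = base_name + ".part_"
    # Zero-padded suffixes (000, 001, ...) sort correctly as plain strings.
    parts = sorted(p for p in Path(part_dir).iterdir() if p.name.startswith(prefix))
    if not parts:
        raise FileNotFoundError(f"no files starting with {prefix!r} in {part_dir}")
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return out_path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_vectors(path, has_header=True):
    """Yield (word, vector) pairs from a bz2-compressed CSV file.

    Column 1 is the vocabulary entry; the remaining columns are the
    embedding values. The delimiter and header row are assumptions.
    """
    with bz2.open(path, "rt", encoding="utf-8", newline="") as fh:
        reader = csv.reader(fh)
        if has_header:
            next(reader, None)  # skip the assumed header row
        for row in reader:
            if row:
                yield row[0], [float(value) for value in row[1:]]


# Example usage (left commented out because it needs the downloaded parts):
# name = "de_500_3_sg_wxd.csv.bz2"
# expected = "4047c670d6556c3f7548bbb27ee63a2243588d6bd440a22e3a3140682c8f0a09"
# path = recombine_parts(".", name, name)
# assert sha256_of(path) == expected, "checksum mismatch: re-download the parts"
# word, vector = next(iter_vectors(path))
```

Reading the file row by row, rather than loading it whole, keeps memory use low for large vocabularies. If the files turn out to have no header row, passing has_header=False avoids silently dropping the first word.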

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average