research data . Dataset . 2021

Lemmatized English Word2Vec data

Chiarcos, Christian; Mikolov, Tomas, et al.
Open Access English
  • Published: 06 Jan 2021
  • Publisher: Zenodo
Abstract
This is a version of the original GoogleNews-vectors-negative300 Word2Vec embeddings for English. In addition, we provide the following modified files:
- converted to conventional CSV format (and gzipped)
- subclassified:
  - for the most frequent 1,000,000 words: subclassified according to WordNet parts of speech (ADJ, ADV, NOUN, VERB, OTHER); note that one embedding can be associated with multiple parts of speech
  - for the remaining words:
    - RARE: top 1,000,001 to 2,000,000 words
    - VERY_RARE: top 2,000,001 to 3,000,000 words
- WordNet lemmatization (via NLTK) in separate files (first lemma only)

Note...
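As a rough illustration of how the gzipped CSV files and the frequency classes described in the abstract might be handled, here is a minimal Python sketch. It rests on several assumptions that this record does not confirm: that each row holds the word in the first column followed by the vector components, that the delimiter is a comma, and that rows are ordered by frequency so that a 1-based row number can stand in for the frequency rank. The function names `read_embeddings` and `frequency_class` and the label `POS_CLASSIFIED` are hypothetical; only `RARE` and `VERY_RARE` come from the abstract.

```python
import csv
import gzip
from typing import Iterator, List, Tuple


def read_embeddings(path: str, delimiter: str = ",") -> Iterator[Tuple[str, List[float]]]:
    """Yield (word, vector) pairs from a gzipped CSV file.

    Assumed layout (not confirmed by the dataset record):
    first column = word, remaining columns = float components.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            yield row[0], [float(value) for value in row[1:]]


def frequency_class(rank: int) -> str:
    """Map a 1-based frequency rank to the bucket described in the abstract.

    "POS_CLASSIFIED" is a hypothetical placeholder for the top 1,000,000
    words, which the dataset further splits by WordNet part of speech.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    if rank <= 1_000_000:
        return "POS_CLASSIFIED"
    if rank <= 2_000_000:
        return "RARE"
    if rank <= 3_000_000:
        return "VERY_RARE"
    raise ValueError("rank exceeds the 3,000,000 words covered by the dataset")
```

Under the same assumptions, iterating with `enumerate(read_embeddings(path), start=1)` would pair each word with its presumed rank, which can then be passed to `frequency_class`. Reading lazily through a generator avoids holding all three million vectors in memory at once.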
Subjects
free text keywords: word embeddings, word2vec, English
Funded by
EC| Pret-a-LLOD
Project
Pret-a-LLOD
Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors
  • Funder: European Commission (EC)
  • Project Code: 825182
  • Funding stream: H2020 | RIA
Download from
Zenodo
Dataset . 2021
Provider: Datacite
Zenodo
Dataset . 2021
Provider: Zenodo