research data . Dataset . 2021

Lemmatized English Word2Vec data

Christian Chiarcos; Tomas Mikolov et al.
Open Access . English
  • Published: 01 Jan 2021
  • Publisher: Zenodo
Abstract
This is a version of the original GoogleNews-vectors-negative300 Word2Vec embeddings for English.
In addition, we provide the following modified files:

- converted to conventional CSV format (and gzipped)
- subclassified:
  for the most frequent 1.000.000 words:
    subclassified according to WordNet parts of speech: ADJ, ADV, NOUN, VERB, OTHER
    note that one embedding can be associated with multiple parts of speech
  for the remaining words:
    RARE: top 1.000.001 - 2.000.000 words
    VERY_RARE: top 2.000...
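The gzipped CSV files described in the abstract can be read with the Python standard library alone. The following is a minimal sketch, not the authors' tooling: the helper name `read_embeddings`, the demo file name, and the row layout (word in the first column, vector components in the remaining columns, no header row) are assumptions for illustration and should be checked against the actual files on Zenodo.

```python
import csv
import gzip
import os
import tempfile


def read_embeddings(path, limit=None):
    """Read a gzipped CSV of embeddings into a dict: word -> list of floats.

    Assumed layout (not confirmed by the record): one row per word, the word
    in the first column, the vector components in the remaining columns,
    and no header row.
    """
    vectors = {}
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            vectors[row[0]] = [float(value) for value in row[1:]]
            if limit is not None and len(vectors) >= limit:
                break  # stop early; the full files may be large
    return vectors


if __name__ == "__main__":
    # Build a tiny stand-in file so the sketch runs without the real data.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo_embeddings.csv.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8", newline="") as handle:
        csv.writer(handle).writerows([
            ["cat", "0.1", "0.2", "0.3"],
            ["dog", "0.4", "0.5", "0.6"],
        ])
    for word, vector in read_embeddings(demo_path).items():
        print(word, len(vector))
```

Reading row by row with an optional `limit` keeps memory use under control when only a sample of a file is needed. If the real files carry a header row or use a different delimiter, the `csv.reader` call would need to be adjusted accordingly.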
Subjects
free text keywords: word embeddings, word2vec, English
Funded by
EC| Pret-a-LLOD
Project
Pret-a-LLOD
Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors
  • Funder: European Commission (EC)
  • Project Code: 825182
  • Funding stream: H2020 | RIA
Download from
Zenodo
Provider: Zenodo