Spanish 3B Words Word2Vec Embeddings

Ready-to-use gensim Word2Vec embedding models for the Spanish language. The models were trained with a context window of +/- 5 words, discarding words with fewer than 5 occurrences and producing a 400-dimensional vector for each word. The training text was collected from news, Wikipedia, the Spanish BOE, web crawling and open literary sources, and totals 3,257,329,900 words and 18,852,481,207 characters.

Two types of models are provided: gensim full models (complete_model.zip) and KeyedVectors (keyed_vectors.zip). The differences between them are described at: https://radimrehurek.com/gensim/models/keyedvectors.html

To load the full model use:

    from gensim.models import Word2Vec
    model = Word2Vec.load("complete.model")

To load the KeyedVectors use:

    from gensim.models import KeyedVectors
    word_vectors = KeyedVectors.load('complete.kv', mmap='r')

More information about the models can be found at: https://github.com/aitoralmeida/spanish_word2vec
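
Once the KeyedVectors file is loaded, a typical first step is to query nearest neighbours. The following is a minimal sketch, not part of the original release: the helper names load_spanish_vectors and nearest_words are hypothetical, the query word "rey" is only an illustration, and the file is assumed to be extracted as complete.kv in the working directory.

```python
def load_spanish_vectors(path="complete.kv"):
    """Memory-map the KeyedVectors file read-only (same call as in the description)."""
    # Imported lazily so that the helper below can be used without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load(path, mmap="r")


def nearest_words(word_vectors, word, topn=5):
    """Return up to `topn` (word, similarity) pairs, or [] for out-of-vocabulary words.

    Words seen fewer than 5 times in the training text were discarded,
    so membership is checked before querying.
    """
    if word not in word_vectors:
        return []
    return word_vectors.most_similar(word, topn=topn)


# Example usage (assumes complete.kv has been extracted from keyed_vectors.zip):
#   vectors = load_spanish_vectors()
#   print(vectors.vector_size)  # the description states 400 dimensions
#   for neighbour, score in nearest_words(vectors, "rey"):
#       print(neighbour, round(score, 3))
```

Loading with mmap='r' keeps the vectors on disk and maps them read-only, which can be convenient for a vocabulary of this size; the full model is only needed if training is to be continued.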

Keywords

word embeddings, word2vec, natural language processing, nlp, spanish, gensim
