This dataset contains four types of neural language models trained on a large historical corpus of books in English, published between 1760 and 1900 and comprising ~5.1 billion tokens. The language model architectures include static models (word2vec and fastText) and contextualized models (BERT and Flair). For each architecture, we trained one model instance on the whole dataset. Additionally, we trained separate instances on text published before 1850 for the two static architectures, and four instances on different time slices for BERT. GitHub repository: https://github.com/Living-with-machines/histLM
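As a rough illustration only, the sketch below shows how downloaded model files of these types could be loaded in Python; the file names and paths are assumptions for this example, not the dataset's actual layout (see the GitHub repository for the released scripts and file names).

```python
# Minimal loading sketch, assuming hypothetical local file names for the
# downloaded histLM models; adjust paths to the actual released archives.
from gensim.models import Word2Vec                                 # static word2vec instances
from transformers import AutoTokenizer, AutoModelForMaskedLM       # contextualized BERT instances

# Static model trained on the full 1760-1900 corpus (hypothetical file name).
w2v = Word2Vec.load("histLM_word2vec_1760-1900.model")
print(w2v.wv.most_similar("machine", topn=5))

# Contextualized model: one BERT instance, e.g. a pre-1850 time slice
# (hypothetical directory name).
tokenizer = AutoTokenizer.from_pretrained("histLM_bert_pre1850")
bert = AutoModelForMaskedLM.from_pretrained("histLM_bert_pre1850")
```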
Flair, Deep Learning, Transformers, Natural language processing, word2vec, Neural Network, LSTM, NLP, Neural networks, Language model, BERT, fastText
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources. An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| Views | Provided by UsageCounts. | 156 |
| Downloads | Provided by UsageCounts. | 54 |
