This repository contains the 125 LSTM models analyzed in van Schijndel, Mueller, and Linzen (2019), "Quantity doesn't buy quality syntax with neural language models". Each archive contains 25 models trained on a specific number of training tokens. All models were trained to use the vocabulary in vocab.txt.

The naming convention for each model is:

LSTM_[Hidden Units]_[Training Tokens]_[Training Partition]_[Random Seed]-d[Dropout Rate].pt

Hidden Units: the number of hidden units per layer (there are two layers in each model); one of {100, 200, 400, 800, 1600}
Training Tokens: the number of tokens used to train the model; one of {2m, 10m, 20m, 40m, 80m}
Training Partition: five distinct training partitions were created for each amount of training data; one of {a, b, c, d, e}
Random Seed: the random seed used to train the model*
Dropout Rate: all models used a dropout rate of 0.2

*A scripting bug led to a random seed of 0 for all models trained on fewer than 40 million tokens. This does not substantively affect the analyses, since each model remains distinct in its configuration or training data, so we opted not to retrain the models with unique random seeds, saving time and computational resources.
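As a convenience, here is a minimal Python sketch of how the naming convention above might be unpacked. The parse_model_name helper and the example filename (including its seed value) are illustrative only and are not part of the repository.

```python
import re

# Hypothetical helper (not part of this repository) that unpacks the
# naming convention described above.
NAME_PATTERN = re.compile(
    r"LSTM_(?P<hidden>\d+)_(?P<tokens>\d+m)_(?P<partition>[a-e])"
    r"_(?P<seed>\d+)-d(?P<dropout>[0-9.]+)\.pt"
)

def parse_model_name(filename: str) -> dict:
    """Extract the training configuration encoded in a model filename."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected model filename: {filename}")
    return {
        "hidden_units": int(match.group("hidden")),      # per layer; two layers per model
        "training_tokens": match.group("tokens"),        # one of 2m/10m/20m/40m/80m
        "training_partition": match.group("partition"),  # one of a-e
        "random_seed": int(match.group("seed")),         # 0 for models under 40m tokens
        "dropout_rate": float(match.group("dropout")),   # always 0.2 here
    }

# Example with an illustrative seed value:
print(parse_model_name("LSTM_800_80m_c_7-d0.2.pt"))
```

The checkpoints themselves are standard PyTorch .pt files; depending on how they were serialized, loading one with torch.load may also require the class definition from the original training code to be importable.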
{"references": ["van Schijndel, Mueller, and Linzen (2019) https://www.aclweb.org/anthology/D19-1592/"]}