These are the datasets for the paper "A Systematic Evaluation of Large Language Models of Code" (https://arxiv.org/pdf/2202.13169.pdf). The code is available at: https://github.com/VHellendoorn/Code-LMs

The file "unseen_test_sets.tar.gz" contains test sets of roughly 100 files in each of 12 programming languages. These files are not included in The Pile, so models such as GPT-Neo, GPT-J, and GPT-NeoX were not trained on them. In the paper, we use these test sets to compare a variety of language models of code, including OpenAI's Codex, GPT-J, GPT-Neo, GPT-NeoX-20B, CodeParrot, and our PolyCoder model.

The file "index.zip" contains an index of the training-set file paths and commit SHAs. The other files, such as "2-7B-150K.tar", are trained model checkpoints, as explained at https://github.com/VHellendoorn/Code-LMs
https://arxiv.org/abs/2202.13169
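A quick way to sanity-check the downloaded test-set archive is to count the files it contains without fully extracting it. The sketch below assumes the archive groups files under one top-level directory per language; the actual internal layout of "unseen_test_sets.tar.gz" may differ, so treat the directory-based grouping as an assumption.

```python
# Sketch: count regular files per top-level directory in a .tar.gz archive.
# Assumption: the archive lays out files as <language>/<file>; verify against
# the real "unseen_test_sets.tar.gz" before relying on the grouping.
import tarfile
from collections import Counter


def files_per_language(archive_path):
    """Return a Counter mapping each top-level directory to its file count."""
    counts = Counter()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                # Group by the first path component (assumed language name).
                top = member.name.split("/", 1)[0]
                counts[top] += 1
    return counts


if __name__ == "__main__":
    # e.g. files_per_language("unseen_test_sets.tar.gz") should report
    # roughly 100 files for each of the 12 language directories.
    pass
```

This only reads archive metadata, so it is cheap even for large tarballs.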
Indicators for this record (based on the underlying citation network):

| indicator | description | value |
| --- | --- | --- |
| selected citations | Citations derived from selected sources; an alternative to "influence", which reflects total impact diachronically. | 1 |
| popularity | Current attention (the "hype") of the article in the research community. | Average |
| influence | Overall/total impact of the article in the research community (diachronically). | Average |
| impulse | Initial momentum of the article directly after publication. | Average |
| views | | 578 |
| downloads | | 2K |
