Research data . Dataset . 2021

A Code Token Type Taxonomy-enhanced dataset with pre-computed token types for Python150k

Le, Kim Tuyen; Rashidi, Gabriel; Andrzejak, Artur
Open Access

Code Token Type Taxonomy (CT3) is a methodology for refined evaluation of ML-based code completion approaches. We publish the CT3-enhanced dataset, which provides a pre-computed token type for each token in the Python150k dataset. The dataset was produced in the empirical study for the following paper: Kim Tuyen Le, Gabriel Rashidi, and Artur Andrzejak. A Methodology for Refined Evaluation of ML-based Code Completion Approaches. In Special Issue on Programming Language Processing, Data Mining and Knowledge Discovery. See the README.txt file for details on how the enhanced dataset is structured.


Keywords: code completion, accuracy evaluation, code token types