Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization)

1. There are six data folders and six result folders, one pair for each of the six languages from CodeXGLUE (Java, Python, Ruby, JavaScript (Js), Go, PHP). For example, the Java dataset is in the Java_data folder and its results are in the Java_result folder. Each result folder contains the results generated by the different models.

2. For same-project code summarization, there are three folders per project: one for data and two for results. For example, for the wildfly project, wildfly_data contains the dataset, wildfly_result contains the results of same-project code summarization, and wildflyv2_result contains the results for the cross-project setup. To run the cross-project experiment, replace the training data in the wildfly_data folder with the complete Java training data provided in item 1.

3. The script folder contains two scripts: davinci.py for the code-davinci-002 model and turbo.py for the gpt-3.5-turbo model. All program analysis information is already available in the data folders, so running one of the following commands generates the expected summaries.

   Davinci:
   python davinci.py --open_key --data_folder Java_data --model davinci --mode BM25 --use_repo no --use_id no --use_dfg no --pause_duration 6 --language Java

   Turbo:
   python turbo.py --open_key --data_folder Java_data --model turbo --mode BM25 --use_repo no --use_id no --use_dfg no --pause_duration 2 --language Java

   Possible options (both scripts):
   use_repo : yes / no
   use_dfg : yes / no
   use_id : id3 / no

4. The repo information is already available in the "train.jsonl" files, which can be found in every data folder.

5. The DFG can be extracted by running the DFG.py script (in the script folder):
   python DFG.py --data_folder --language

6. ID extraction scripts are also provided in the script folder (e.g., java_id.py, python_id.py).

7. To set up the parser, run "bash setup.sh", provided in the script folder.
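
The option values listed in item 3 give eight combinations of use_repo, use_id and use_dfg per script. The following is a minimal sketch of how those command lines could be enumerated for an ablation run. The helper names build_command and all_configurations are hypothetical and not part of the provided scripts. The sketch also assumes that --open_key takes the OpenAI API key as its value, which the commands above do not spell out. It only prints the commands and does not run them.

```python
import itertools
import shlex

# Option values listed in item 3 for both davinci.py and turbo.py.
USE_REPO = ("yes", "no")
USE_DFG = ("yes", "no")
USE_ID = ("id3", "no")


def build_command(script, model, data_folder, language, api_key,
                  use_repo="no", use_id="no", use_dfg="no",
                  mode="BM25", pause_duration=6):
    """Return the argument list for one run of davinci.py or turbo.py."""
    if use_repo not in USE_REPO or use_dfg not in USE_DFG or use_id not in USE_ID:
        raise ValueError("unsupported option value")
    return [
        "python", script,
        "--open_key", api_key,  # assumption: the flag takes the key as its value
        "--data_folder", data_folder,
        "--model", model,
        "--mode", mode,
        "--use_repo", use_repo,
        "--use_id", use_id,
        "--use_dfg", use_dfg,
        "--pause_duration", str(pause_duration),
        "--language", language,
    ]


def all_configurations(script, model, data_folder, language, api_key,
                       pause_duration):
    """Yield one command per combination of use_repo / use_id / use_dfg."""
    for repo, ident, dfg in itertools.product(USE_REPO, USE_ID, USE_DFG):
        yield build_command(script, model, data_folder, language, api_key,
                            use_repo=repo, use_id=ident, use_dfg=dfg,
                            pause_duration=pause_duration)


if __name__ == "__main__":
    # "YOUR_KEY" is a placeholder; the commands are printed, not executed.
    for cmd in all_configurations("davinci.py", "davinci", "Java_data",
                                  "Java", "YOUR_KEY", 6):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell from the script folder. For turbo.py, pass "turbo.py", "turbo" and a pause duration of 2, matching the Turbo command in item 3.
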
Acknowledgement: We thank the authors of CodeSearchNet and CodeXGLUE for the dataset. We also acknowledge the authors of the GraphCodeBERT paper for their scripts for extracting the DFG of a function.
