The dataset of the ASE'20 paper titled "Automated Patch Correctness Assessment: How Far are We?"

This is the experiment result of the ASE'20 paper titled "Automated Patch Correctness Assessment: How Far are We?". If you use our data for academic research, please cite our paper as:

@inproceedings{wang2020automated,
  title={Automated Patch Correctness Assessment: How Far are We?},
  author={Wang, Shangwen and Wen, Ming and Lin, Bo and Wu, Hongjun and Qin, Yihao and Zou, Deqing and Mao, Xiaoguang and Jin, Hai},
  booktitle={Proceedings of the 35th International Conference on Automated Software Engineering (ASE)},
  year={2020},
  organization={ACM}
}

- The file Patches.zip includes all the patches we consider in this study. Note that 269 of these patches come from "Automated Patch Assessment for Program Repair at Scale" (Ye et al., Technical report 1909.13694, arXiv, 2019).
- The file Patches_for_Static includes all the class files we used for the static methods.
- The file Tests-oracle includes all the test cases generated by Evosuite and Randoop on the fixed version programs.
- The file Tests-buggy includes all the test cases generated by Evosuite and Randoop on the buggy version programs.
- The file DiffTGen-result includes the ingredients and output information of DiffTGen.
- The file Daikon-output includes the inferred invariants of each patch and of its corresponding ground truth.
- The file PATCH-SIM_result includes the output vector files from PATCH-SIM and E-PATCH-SIM.
- The file Training_result includes the output of six ML algorithms, with or without the oracle.

Chart: 1-26
Closure: 14, 18, 31, 33, 38, 40, 57, 62, 63, 70, 73, 86, 92, 93, 115, 123, 126
Lang: 6, 7, 10, 16, 20, 21, 22, 24, 26, 27, 33, 35, 38, 39, 41, 43, 44, 45, 50, 51, 55, 57, 58, 59, 60, 61, 63
Math: 2, 3, 4, 5, 6, 8, 20, 22, 25, 28, 30, 31, 32, 33, 34, 35, 39, 41, 49, 50, 53, 56, 57, 58, 59, 60, 61, 63, 65, 68, 70, 71, 73, 74, 75, 79, 80, 81, 82, 85, 86, 88, 89, 90, 93, 97, 98, 99, 104
Time: 4, 7, 11, 14, 15, 19
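The bug table above lists bug IDs per project, with "1-26" denoting an inclusive range. To check whether a given bug appears in it (for example, to see whether its fixed-version Evosuite tests were reused), the table can be expanded into a lookup map. The following is a minimal sketch; the class BugTable and its methods parse and inTable are hypothetical helpers, not part of the dataset.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not shipped with the dataset: expands the bug table
// ("Project: ids; ...", where "1-26" is an inclusive range) into a lookup map.
class BugTable {

    // The bug table, copied from the description.
    static final String TABLE = "Chart: 1-26; "
            + "Closure: 14, 18, 31, 33, 38, 40, 57, 62, 63, 70, 73, 86, 92, 93, 115, 123, 126; "
            + "Lang: 6, 7, 10, 16, 20, 21, 22, 24, 26, 27, 33, 35, 38, 39, 41, 43, 44, 45, 50, 51, 55, 57, 58, 59, 60, 61, 63; "
            + "Math: 2, 3, 4, 5, 6, 8, 20, 22, 25, 28, 30, 31, 32, 33, 34, 35, 39, 41, 49, 50, 53, 56, 57, 58, 59, 60, 61, 63, 65, 68, 70, 71, 73, 74, 75, 79, 80, 81, 82, 85, 86, 88, 89, 90, 93, 97, 98, 99, 104; "
            + "Time: 4, 7, 11, 14, 15, 19";

    // Parses "Project: a, b, c-d; Project: ..." into project -> sorted bug IDs.
    static Map<String, Set<Integer>> parse(String table) {
        Map<String, Set<Integer>> bugs = new LinkedHashMap<>();
        for (String entry : table.split(";")) {
            String[] parts = entry.split(":", 2);
            Set<Integer> ids = new TreeSet<>();
            for (String token : parts[1].split(",")) {
                String t = token.trim();
                int dash = t.indexOf('-');
                if (dash >= 0) { // inclusive range such as "1-26"
                    int from = Integer.parseInt(t.substring(0, dash).trim());
                    int to = Integer.parseInt(t.substring(dash + 1).trim());
                    for (int id = from; id <= to; id++) {
                        ids.add(id);
                    }
                } else {
                    ids.add(Integer.parseInt(t));
                }
            }
            bugs.put(parts[0].trim(), ids);
        }
        return bugs;
    }

    // True if the given project/bug pair appears in the table.
    static boolean inTable(String project, int bugId) {
        Set<Integer> ids = parse(TABLE).get(project);
        return ids != null && ids.contains(bugId);
    }

    public static void main(String[] args) {
        int total = 0;
        for (Map.Entry<String, Set<Integer>> e : parse(TABLE).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " bugs");
            total += e.getValue().size();
        }
        System.out.println("Total: " + total);
        System.out.println("Math-50 listed: " + inTable("Math", 50));
    }
}
```

Running main prints the per-project counts (Chart 26, Closure 17, Lang 27, Math 49, Time 6; 125 in total). A bug from a project that is absent from the table, such as any Mockito bug, simply yields false.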
Please note that for the bugs in the above table, the Evosuite tests on the fixed version programs are reused from a previous study. We sincerely thank He Ye, Matias Martinez, and Martin Monperrus for sharing their data.

Notice! For patches under the folder Patches_ICSE, those under the Ddifferent and Dsame folders are all correct patches. "Different" and "Same" only indicate whether the patch is syntactically identical to the ground-truth patch.

Patches generated for the Mockito project (2 in total):
- Kali-A-Mockito-10
- Arja-Mockito-10

Patches that do not pass the plausibility check (6 in total):
- Kali-Closure-133
- kPAR-Chart-12
- FixMiner-Chart-12
- patch1-Lang-6-SketchFix-plausible
- patch2-Lang-6-SketchFix-plausible
- patch1-Math-2-SOFix

Patches that are mistakenly labeled (12 in total):
- patch2-Lang-51-Jaid
- patch1-Lang-43-CapGen
- patch2-Lang-43-CapGen
- patch2-Math-53-CapGen
- patch2-Math-53-Jaid
- jKali-Lang-7
- ACS-Lang-35
- Arja-Math-35
- SimFix-Math-72
- SimFix-Closure-19
- Arja-Math-50
- SimFix-Lang-60

Detailed reasons for the mislabeled patches:
1. The ground-truth patch modifies multiple locations, while the generated patch modifies only one of them (2/12: SimFix-Math-72, SimFix-Lang-60).
2. The edit points in the generated patch differ from those in the ground-truth patch (8/12: patch2-Lang-51-Jaid, patch2-Math-53-Jaid, patch1-Lang-43-CapGen, patch2-Lang-43-CapGen, patch2-Math-53-CapGen, ACS-Lang-35, SimFix-Closure-19, Arja-Math-50).
3. The generated patch does not fulfill the intended function of the ground truth (2/12: jKali-Lang-7, Arja-Math-35).

Take Arja-Math-50 as an example. This patch deletes a conditional statement in the method verifyBracketing that handles an unexpected input (null). However, this conditional statement still exists in the oracle program. Randoop then generated a test case that calls verifyBracketing with a null argument. This test passed on the ground-truth patch but failed on the patch generated by Arja, because the exception handling statements had been removed. As a result, this patch is actually overfitting but was mistakenly labeled as correct. We have confirmed this case with Kui Liu, the first author of the recent ICSE'20 paper "On the Efficiency of Test Suite based Program Repair", whose patches form part of our patch benchmark.

Borderline patches (3 in total): ACS-Lang-7; kPAR-Lang-7; TBar-Lang-7.
- Reasons for considering them overfitting: Evosuite generates some tests that fail on these patches (e.g., test049 in Seed 1), and the Java documentation above the function states that it needs to handle inputs that cannot be converted.
- Reasons for considering them correct: they synthesize the correct modification, and currently createBigDecimal() is not called directly anywhere else in the production code except in createNumber() (apart from the test code).
In our paper, we consider these three patches as correct, which is why Evosuite has 3 false positives.
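The Arja-Math-50 case described above can be illustrated with a simplified, self-contained Java sketch of why a test generated on the oracle program exposes the overfitting patch. Everything in it is hypothetical: the class BracketingDemo, the two variants verifyBracketingOracle and verifyBracketingPatched, the checkSigns helper, and the use of IllegalArgumentException are assumptions for illustration, not the actual Apache Commons Math code or the actual Randoop test.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical, simplified illustration of the Arja-Math-50 case;
// not the actual Apache Commons Math source or the generated Randoop test.
class BracketingDemo {

    // Oracle (ground-truth) behaviour: the guard for a null input is still present.
    static void verifyBracketingOracle(DoubleUnaryOperator f, double lower, double upper) {
        if (f == null) {
            throw new IllegalArgumentException("function is null");
        }
        checkSigns(f, lower, upper);
    }

    // Overfitting patch: the conditional statement handling null was deleted.
    static void verifyBracketingPatched(DoubleUnaryOperator f, double lower, double upper) {
        checkSigns(f, lower, upper); // dereferences f, so null causes a NullPointerException
    }

    // Rejects an interval whose endpoints have function values of the same sign.
    private static void checkSigns(DoubleUnaryOperator f, double lower, double upper) {
        if (f.applyAsDouble(lower) * f.applyAsDouble(upper) > 0) {
            throw new IllegalArgumentException("interval does not bracket a root");
        }
    }

    // Generated-test-style check: it expects the exception observed on the oracle.
    static boolean nullInputTestPasses(boolean usePatched) {
        try {
            if (usePatched) {
                verifyBracketingPatched(null, 0.0, 1.0);
            } else {
                verifyBracketingOracle(null, 0.0, 1.0);
            }
            return false; // no exception at all: the test fails
        } catch (IllegalArgumentException expected) {
            return true; // same behaviour as the oracle: the test passes
        } catch (NullPointerException unexpected) {
            return false; // different behaviour: the test fails and exposes the patch
        }
    }

    public static void main(String[] args) {
        System.out.println("oracle:  " + nullInputTestPasses(false));
        System.out.println("patched: " + nullInputTestPasses(true));
    }
}
```

The check passes for the oracle variant and fails for the patched one, mirroring how the test passed on the ground-truth patch but failed on the Arja patch.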

Related Organizations

National University of Defense Technology
China (People's Republic of)

Keywords

patch correctness assessment, automated program repair
