ZENODO
Dataset . 2020
License: CC BY
Data sources: Datacite

The dataset of the ASE'20 paper titled "Automated Patch Correctness Assessment: How Far are We?"

Authors: Wang, Shangwen; Lin, Bo

Abstract

This dataset contains the experimental results of the ASE'20 paper titled "Automated Patch Correctness Assessment: How Far are We?". If you use our data for academic research, please cite our paper as:

@inproceedings{wang2020automated,
  title={Automated Patch Correctness Assessment: How Far are We?},
  author={Wang, Shangwen and Wen, Ming and Lin, Bo and Wu, Hongjun and Qin, Yihao and Zou, Deqing and Mao, Xiaoguang and Jin, Hai},
  booktitle={Proceedings of the 35th International Conference on Automated Software Engineering (ASE)},
  year={2020},
  organization={ACM}
}

The file Patches.zip includes all the patches we take into consideration in this study. Note that 269 patches come from "Automated Patch Assessment for Program Repair at Scale (Ye et al.), Technical report 1909.13694, arXiv, 2019".
The file Patches_for_Static includes all the class files we used for the static assessment techniques.
The file Tests-oracle includes all the test cases generated by Evosuite and Randoop on the fixed-version programs.
The file Tests-buggy includes all the test cases generated by Evosuite and Randoop on the buggy-version programs.
The file DiffTGen-result includes the ingredients and output information of DiffTGen.
The file Daikon-output includes the inferred invariants of each patch and its corresponding ground truth.
The file PATCH-SIM_result includes the output vector files from PATCH-SIM and E-PATCH-SIM.
The file Training_result includes the output of six ML algorithms with or without the oracle.

Chart: 1-26
Closure: 14, 18, 31, 33, 38, 40, 57, 62, 63, 70, 73, 86, 92, 93, 115, 123, 126
Lang: 6, 7, 10, 16, 20, 21, 22, 24, 26, 27, 33, 35, 38, 39, 41, 43, 44, 45, 50, 51, 55, 57, 58, 59, 60, 61, 63
Math: 2, 3, 4, 5, 6, 8, 20, 22, 25, 28, 30, 31, 32, 33, 34, 35, 39, 41, 49, 50, 53, 56, 57, 58, 59, 60, 61, 63, 65, 68, 70, 71, 73, 74, 75, 79, 80, 81, 82, 85, 86, 88, 89, 90, 93, 97, 98, 99, 104
Time: 4, 7, 11, 14, 15, 19
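For readers who want to check programmatically whether a given bug appears in the per-project list above, the following is a minimal sketch. The bug IDs are copied from the list; the names BUG_TABLE, parse_bug_table, and is_listed are hypothetical helpers introduced here for illustration and are not part of the dataset.

```python
# Bug IDs copied from the per-project list above.
# BUG_TABLE, parse_bug_table and is_listed are illustrative names only.
BUG_TABLE = """
Chart: 1-26
Closure: 14, 18, 31, 33, 38, 40, 57, 62, 63, 70, 73, 86, 92, 93, 115, 123, 126
Lang: 6, 7, 10, 16, 20, 21, 22, 24, 26, 27, 33, 35, 38, 39, 41, 43, 44, 45, 50, 51, 55, 57, 58, 59, 60, 61, 63
Math: 2, 3, 4, 5, 6, 8, 20, 22, 25, 28, 30, 31, 32, 33, 34, 35, 39, 41, 49, 50, 53, 56, 57, 58, 59, 60, 61, 63, 65, 68, 70, 71, 73, 74, 75, 79, 80, 81, 82, 85, 86, 88, 89, 90, 93, 97, 98, 99, 104
Time: 4, 7, 11, 14, 15, 19
"""


def parse_bug_table(text):
    """Turn 'Project: id, id, first-last' lines into {project: [ids]}."""
    table = {}
    for line in text.strip().splitlines():
        project, ids = line.split(":")
        bugs = []
        for token in ids.split(","):
            token = token.strip()
            if "-" in token:
                # Expand an inclusive range such as '1-26'.
                first, last = map(int, token.split("-"))
                bugs.extend(range(first, last + 1))
            else:
                bugs.append(int(token))
        table[project.strip()] = bugs
    return table


BUGS = parse_bug_table(BUG_TABLE)


def is_listed(project, bug_id):
    """Return True if the bug appears in the list above."""
    return bug_id in BUGS.get(project, [])


if __name__ == "__main__":
    for project, ids in BUGS.items():
        print(project, len(ids))
```

A call such as is_listed("Math", 50) then answers the lookup without scanning the list by hand.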
Please note that for the bugs in the above table, the Evosuite tests on the fixed-version programs are reused from a previous study. We thank He Ye, Matias Martinez, and Martin Monperrus so much for sharing their data.

Notice! For patches under the folder Patches_ICSE, those under the Ddifferent and Dsame folders are all correct patches. Different and Same only indicate whether the patch is syntactically identical to the ground-truth patch.

Patches generated for the Mockito project (2 in total): Kali-A-Mockito-10; Arja-Mockito-10

Patches that do not pass the plausibility check (6 in total): Kali-Closure-133; kPAR-Chart-12; FixMiner-Chart-12; patch1-Lang-6-SketchFix-plausible; patch2-Lang-6-SketchFix-plausible; patch1-Math-2-SOFix

Patches that are mistakenly labeled (12 in total): patch2-Lang-51-Jaid; patch1-Lang-43-CapGen; patch2-Lang-43-CapGen; patch2-Math-53-CapGen; patch2-Math-53-Jaid; jKali-Lang-7; ACS-Lang-35; Arja-Math-35; SimFix-Math-72; SimFix-Closure-19; Arja-Math-50; SimFix-Lang-60

Detailed reasons for the mislabeled patches:
1. the ground-truth patch modifies multiple locations while the generated patch only modifies one of them (2/12, SimFix-Math-72, SimFix-Lang-60);
2. the edit points in the generated patch are different from those in the ground-truth patch (8/12, patch2-Lang-51-Jaid, patch2-Math-53-Jaid, patch1-Lang-43-CapGen, patch2-Lang-43-CapGen, patch2-Math-53-CapGen, ACS-Lang-35, SimFix-Closure-19, Arja-Math-50);
3. the generated patch does not fulfill the intended function of the ground-truth patch (2/12, jKali-Lang-7, Arja-Math-35).

Take Arja-Math-50 as an example: this patch deletes a conditional statement that deals with an unexpected input (null) in the method verifyBracketing. However, in the oracle program, this conditional statement still exists. Randoop then generated a test case that calls verifyBracketing with a null argument. This test passed on the ground-truth patch but failed on the patch generated by Arja due to the removal of the exception handling statements.
As a result, this patch is actually overfitting but was mistakenly labeled as correct. We have confirmed this case with Kui Liu, the first author of the recent ICSE'20 paper "On the Efficiency of Test Suite based Program Repair", which makes up our patch benchmark.

Borderline patches (3 in total): ACS-Lang-7; kPAR-Lang-7; TBar-Lang-7.

Reasons to consider them overfitting: Evosuite generates some tests that fail on those patches, e.g., test049 in Seed 1; the Java documentation above the function states that it needs to handle the situation where the input cannot be converted.

Reasons to consider them correct: each patch synthesizes the correct modification; currently, createBigDecimal() is not called directly anywhere in the program except from createNumber() and the test code.

In our paper, we consider these three patches correct, which is why Evosuite has 3 false positives.
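The Arja-Math-50 case described above can be illustrated with a small sketch of the underlying idea: a test generated on the oracle program encodes the oracle's behavior, so a patch that deletes the null guard fails it. This is a simplified Python stand-in written under stated assumptions, not the actual Java code from the benchmark. Only the method name verifyBracketing comes from the description; the function bodies, the exception types, and the helpers generated_test and is_overfitting are hypothetical.

```python
# Simplified stand-in for the Arja-Math-50 scenario. None plays the role
# of Java's null. All bodies and helper names are hypothetical.

def verify_bracketing_oracle(function, lower, upper):
    """Ground-truth behavior: a null function is rejected explicitly."""
    if function is None:
        raise ValueError("function must not be null")
    return function(lower) * function(upper) <= 0


def verify_bracketing_patched(function, lower, upper):
    """Generated patch: the conditional guarding against null was deleted."""
    return function(lower) * function(upper) <= 0


def generated_test(verify_bracketing):
    """Mimics a test generated on the oracle: it expects the guard's exception."""
    try:
        verify_bracketing(None, 0.0, 1.0)
    except ValueError:
        return True   # same behavior as the oracle
    except Exception:
        return False  # a different failure, e.g. calling None
    return False      # no exception at all


def is_overfitting(oracle, patched, tests):
    """Flag the patch if any test passes on the oracle but fails on the patch."""
    return any(test(oracle) and not test(patched) for test in tests)


if __name__ == "__main__":
    flagged = is_overfitting(
        verify_bracketing_oracle, verify_bracketing_patched, [generated_test]
    )
    print("overfitting" if flagged else "no difference observed")
```

The check only trusts a test that passes on the oracle, which mirrors the role of the Tests-oracle test cases generated on the fixed-version programs.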

Keywords

patch correctness assessment, automated program repair
