ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite

Artifact of LLM-Guided Genetic Improvement: Envisioning Semantic Aware Automated Software Evolution

Authors: Even Mendoza, Karine; Brownlee, Alexander; Geiger, Alina; Hanna, Carol; Petke, Justyna; Sarro, Federica; Sobania, Dominik


Abstract

We propose the investigation of a new research line on AI-powered genetic improvement (GI) aimed at incorporating semantic-aware search. We take a first step by augmenting GI with automated clustering of LLM edits. We provide initial empirical evidence that our proposal, dubbed PatchCat, allows us to automatically and effectively categorize LLM-suggested patches. PatchCat identified 18 different types of software patches and categorized newly suggested patches with high accuracy. It also enabled detecting NoOp edits in advance and, prospectively, skipping test suite execution to save resources in many cases. These results, coupled with the fact that PatchCat works with small, local LLMs, are a promising step toward interpretable, efficient, and green GI.

Note: This is the artifact of the ASE NIER 2025 publication.

Examples for RQ2 (full example text)

Reasons for inconsistency in tagging:

Case of different prioritization by the model and humans: The model prioritizes importance differently than humans. For example, one entry was tagged Category #12 by the model and #9 by a human, who added the note: "12 could also be considered but 9 is more important. Also makes changes to the return, but not mentioned in the description", for the following short 15-word description generated via LLM: "A Java code diff with 4 changes: catches ParseException, adds variable, and updates logic".

Case of incomplete or unclear summaries: Issues with the short 15-word description, such as not describing all changes or giving an unclear summary. For example, when the "if" statement was modified:

125d124 final Locale locale = Locale.ENGLISH;
> final SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
> // assume no header date by default
> boolean hasHeaderDate = false;
129a132
> hasHeaderDate = true;
133a137,140
> if (hasHeaderDate) {
> // add a newline after the date field
> header.append(""\n"");
> }

but this was not clear from the summary: "SimpleDateFormat constructor and locale usage changed, with additional logic for header date detection".

Content of Files in the Artifact

Datasets:
Raw Data: taken from https://zenodo.org/records/13381774.
Initial Manual Clustering: clustering of 309 entries from the JUnit4 and JCodec projects, with LLM patches generated with the Mistral LLM. Size of dataset: 309. File: Patch Analysis-anon.xlsx.
Augmented Dataset: the initial dataset was manually clustered after data augmentation. Size of dataset: 5806 (unique). File: DataAugmentation_Approach_Patch Classification_subsection.xlsx.
Validation (RQ1): validation on unseen datasets (unseen projects and/or patches generated by an unseen LLM). Size of dataset: 218. File: RQ2-dataset-all_patch_summaries.xlsx.
Statistics (RQ2): data used to construct statistics of LLM-generated patches in Gin from ForArtifact.zip. Size of dataset: 3232. File: ForArtifact.zip.

Dockers:
The model: built via the offline approach for use in the online approach, packaged in a Docker file, ready to test and use. File: model-in-a-docker-unseen-retrives-batch.tar.

Code:
Clustering: taken from https://github.com/rashadulrakib/short-text-clustering-enhancement, but applied to a new dataset. The clustering scripts were developed on top of the short-text-clustering work. File: clustering.zip.
Code of RQ1 is in ForArtifact.zip.
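
The abstract's idea of detecting NoOp edits before testing can be illustrated with a minimal Python sketch. This is not the artifact's code. The names `evaluate_patch`, `Evaluation`, and `NOOP_CATEGORY`, the "NoOp" label string, and the toy categorizer and test runner are all hypothetical stand-ins. A real setup would presumably call the trained PatchCat model and the project's test suite instead.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical label; the artifact does not say how the NoOp category is named.
NOOP_CATEGORY = "NoOp"


@dataclass
class Evaluation:
    category: str
    tests_run: bool
    passed: Optional[bool]  # None when the test suite was skipped


def evaluate_patch(
    patch: str,
    categorize: Callable[[str], str],
    run_tests: Callable[[str], bool],
) -> Evaluation:
    """Categorize a patch first and run the test suite only if it is not a NoOp."""
    category = categorize(patch)
    if category == NOOP_CATEGORY:
        # Predicted to leave behavior unchanged: skip the expensive test run.
        return Evaluation(category, tests_run=False, passed=None)
    return Evaluation(category, tests_run=True, passed=run_tests(patch))


# Toy stand-ins (assumptions): a real setup would call the trained classifier
# and the project's test suite instead.
def toy_categorize(patch: str) -> str:
    return NOOP_CATEGORY if patch.strip() == "" else "Other"


def toy_run_tests(patch: str) -> bool:
    return True


skipped = evaluate_patch("   ", toy_categorize, toy_run_tests)
executed = evaluate_patch("x = x + 1;", toy_categorize, toy_run_tests)
print(skipped.tests_run, executed.tests_run)
```

Passing the categorizer and the test runner in as callables keeps the gate independent of any particular model or build tool, so the same check could sit in front of whichever evaluation step a GI framework uses.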
