Prompt Alchemy: Automatic Prompt Refinement for Enhancing Code Generation

Name: Prompt Alchemy: Automatic Prompt Refinement for Enhancing Code Generation
Keywords: Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering

Sixiang Ye; Zeyu Sun; Guoqing Wang; Liwei Guo; Qingyuan Liang; Zheng Li; Yong Liu

arXiv.org e-Print Archive

Preprint . 2025

Data sources: arXiv.org e-Print Archive

IEEE Transactions on Software Engineering

Article . 2025 . Peer-reviewed

License: IEEE Copyright

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2025

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

DBLP

Article

Data sources: DBLP

Publication: Article, Preprint . 01 Sep 2025
Embargo end date: 01 Jan 2025
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Transactions on Software Engineering, volume 51, pages 2472-2493 (ISSN: 0098-5589, eISSN: 2326-3881)

doi: 10.1109/tse.2025.3589634 , 10.48550/arxiv.2503.11085

arXiv: 2503.11085

Abstract

Code generation has emerged as a key task for automating software development by converting high-level descriptions into executable code. Large language models (LLMs) excel at this task but depend heavily on the quality of the input prompt. Manual prompt engineering can be time-consuming and inconsistent, which limits LLM effectiveness. This paper introduces Prochemy, a method for automatically refining prompts to improve code generation. Prochemy addresses the limitations of manual prompting by automating optimization, ensuring consistency during inference, and supporting multi-agent systems. It iteratively refines prompts based on model performance and then uses the optimized final prompt to improve consistency across tasks. We tested Prochemy on natural language-based code generation and code translation tasks using three LLM series. Results indicate that Prochemy enhances existing methods, improving performance by 5.0% for GPT-3.5-Turbo and 1.9% for GPT-4o over zero-shot baselines on HumanEval. Combined with the state-of-the-art LDB, Prochemy + LDB surpasses standalone methods by 1.2-1.8%. For code translation, Prochemy boosts GPT-4o's Java-to-Python (AVATAR) performance from 74.5 to 84.1 (a relative gain of 12.9%) and Python-to-Java from 66.8 to 78.2 (a relative gain of 17.1%). Moreover, Prochemy maintains strong performance when integrated with the o1-mini model, validating its efficacy in code tasks. Designed as plug-and-play, Prochemy optimizes prompts with minimal human input, bridging the gap between simple prompts and complex frameworks.
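
The abstract describes refinement only at a high level: prompts are iteratively refined based on model performance, and a single optimized prompt is then reused at inference. The following is a minimal sketch of that general idea, not the paper's algorithm. Every name in it (`pass_rate`, `refine_prompt`, `toy_model`, `toy_mutations`, `TASKS`) is hypothetical, the "model" is a deterministic stand-in rather than a real LLM call, and scoring by test pass rate with greedy selection is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

# A task is a description plus (input, expected_output) test cases.
Task = Tuple[str, List[Tuple[int, int]]]


def pass_rate(prompt: str, tasks: List[Task],
              generate_code: Callable[[str, str], str]) -> float:
    """Fraction of tasks whose generated code passes all of its tests."""
    passed = 0
    for description, tests in tasks:
        source = generate_code(prompt, description)
        namespace: dict = {}
        try:
            # In practice, run generated code only inside a sandbox.
            exec(source, namespace)
            solve = namespace["solve"]
            if all(solve(x) == expected for x, expected in tests):
                passed += 1
        except Exception:
            pass  # a crash or a missing function counts as a failure
    return passed / len(tasks)


def refine_prompt(seed: str, tasks: List[Task],
                  generate_code: Callable[[str, str], str],
                  mutate_prompt: Callable[[str], List[str]],
                  rounds: int = 3) -> Tuple[str, float]:
    """Greedy loop (an assumed strategy): keep the best-scoring prompt each round."""
    best_prompt = seed
    best_score = pass_rate(seed, tasks, generate_code)
    for _ in range(rounds):
        for candidate in mutate_prompt(best_prompt):
            score = pass_rate(candidate, tasks, generate_code)
            if score > best_score:
                best_prompt, best_score = candidate, score
        if best_score == 1.0:
            break  # nothing left to improve on this task set
    return best_prompt, best_score


def toy_model(prompt: str, description: str) -> str:
    """Hypothetical stand-in for an LLM: output improves as the prompt gains hints."""
    if description == "double":
        body = "x * 2" if "carefully" in prompt else "x + 2"
    elif description == "absolute value":
        body = "abs(x)" if "edge cases" in prompt else "x"
    else:
        body = "x"
    return f"def solve(x):\n    return {body}"


def toy_mutations(prompt: str) -> List[str]:
    """Hypothetical candidate generator: append one hint per candidate."""
    return [prompt + " Think carefully.", prompt + " Handle edge cases."]


TASKS: List[Task] = [
    ("double", [(1, 2), (3, 6)]),
    ("absolute value", [(-2, 2), (5, 5)]),
]

final_prompt, final_score = refine_prompt(
    "Write a Python function.", TASKS, toy_model, toy_mutations
)
```

In a real setting, `generate_code` would wrap an LLM call and `mutate_prompt` would typically ask the model itself to propose rewrites; the returned `final_prompt` would then be fixed and reused for every inference request, which is where the consistency benefit mentioned in the abstract would come from.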

Related Organizations

Chinese Academy of Sciences
China (People's Republic of)
Peking University
China (People's Republic of)
Institute of Software
China (People's Republic of)
Beijing University of Chemical Technology
China (People's Republic of)

Impact by BIP!

- Selected citations: 1. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green