SECON: Maintaining Semantic Consistency in Data Augmentation for Code Search

Publication: Article, 16 Jan 2025, English
Publisher: Association for Computing Machinery (ACM)
Journal: ACM Transactions on Information Systems, volume 43, pages 1-26 (issn: 1046-8188, eissn: 1558-2868)

Authors: Xu Zhang 0053; Zexu Lin; Xiaoyu Hu; Jianlei Wang; Wenpeng Lu; Deyu Zhou

doi: 10.1145/3686151

Abstract

Efficient code search techniques are crucial in accelerating software development by aiding developers in locating specific code snippets and understanding code functionalities. This study investigates code search methodologies, focusing on the emerging significance of semantic consistency in data augmentation techniques. While existing approaches predominantly enhance raw data, often requiring additional preprocessing and incurring higher training costs, this research introduces a pioneering method operating at the code and query representation levels. By bypassing the need for extensive data processing, this novel approach fosters an interactive alignment between code and query, augmenting the semantic coherence crucial for effective code search. An extensive empirical evaluation on a diverse dataset across multiple programming languages substantiates the efficacy of this approach in significantly enhancing code search model performance compared to traditional methodologies. The implementation is publicly available on GitHub, offering an accessible resource for further exploration and application.

Related Organizations

- Qilu University of Technology, China (People's Republic of)
- Southeast University, China (People's Republic of)

Impact by BIP!

- selected citations: 4. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.