Method of Code Features Automated Extraction

SHI Zhicheng, ZHOU Yu

Jisuanji kexue yu tansuo

Article . 2021

Data sources: DOAJ

Publication: Article, 01 Mar 2021, Chinese

Publisher: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press

Journal: Jisuanji kexue yu tansuo (ISSN: 1673-9418)

Abstract

The application of neural networks in software engineering has greatly reduced the burden of extracting code features manually. Previous code feature extraction models usually either treat code as natural language or depend heavily on expert domain knowledge. Treating code as natural language is too simplistic and easily loses information, while models built on expert-designed heuristic rules are usually overly complicated and lack extensibility and generalization. To address these problems, this paper proposes a model based on a convolutional neural network and a recurrent neural network that extracts code features from the abstract syntax tree (AST). To mitigate the vanishing gradient problem caused by the large size of the AST, the paper splits the AST into a sequence of small ASTs and feeds these trees into the model. The convolutional neural network extracts structural information and the recurrent neural network extracts sequence information. The whole procedure requires no expert domain knowledge to guide training; the model learns automatically how to extract features from code that has been labeled with classification categories. The trained encoder is evaluated on a similar code search task, where it achieves Top1, NDCG, and MRR scores of 0.560, 0.679, and 0.638, respectively. Compared with recent state-of-the-art deep learning models for feature extraction and with common similar code detection tools, the proposed model shows significant advantages.
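
The splitting step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the granularity at which the AST is cut, so the sketch assumes statement-level splitting and uses Python's standard `ast` module on Python source purely for illustration; the function names `iter_statements`, `statement_tokens`, and `split_ast` are hypothetical and are not taken from the paper. Each small tree is flattened to a list of node type names, which stands in for the input a tree encoder would receive.

```python
import ast


def iter_statements(node):
    """Yield every statement node under `node` in pre-order (source order)."""
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.stmt):
            yield child
        # Keep descending so statements nested in blocks are also found.
        yield from iter_statements(child)


def statement_tokens(node):
    """Pre-order node type names of one small tree.

    Nested statements are skipped: they become small trees of their own,
    so a compound statement contributes only its header.
    """
    tokens = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        if not isinstance(child, ast.stmt):
            tokens.extend(statement_tokens(child))
    return tokens


def split_ast(source):
    """Split the AST of `source` into a sequence of small statement trees."""
    tree = ast.parse(source)
    return [statement_tokens(stmt) for stmt in iter_statements(tree)]


example = "def f(x):\n    y = x + 1\n    return y\n"
small_trees = split_ast(example)
# The root type of each small tree, in source order.
print([tokens[0] for tokens in small_trees])
```

Under this assumption, each small tree would be encoded separately (the convolutional part in the paper's description) and the resulting sequence of encodings passed to a sequence model (the recurrent part), so that no single tree is large enough to cause the gradient problems the abstract mentions.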

Keywords

code feature extraction, program comprehension, code classification, Electronic computers. Computer science, similar code search, QA75.5-76.95

Impact by BIP!

- selected citations: 0
- popularity: Average
- influence: Average
- impulse: Average