
Duplicate code is one of the bad smells, and it is usually refactored after detection to improve program quality. Locating duplicate code during the programming phase may reduce maintenance costs. The challenge is that duplicates must then be detected between an incomplete code fragment and complete files, a scenario to which existing tools are hard to apply. In this paper, we propose an AST-sequence-based duplicate code detection approach for onsite programming code. The abstract syntax tree (AST) is extracted from the source code and then transformed into an encoded sequence. A local sequence alignment algorithm is used to find highly similar subsequences. After post-processing, similar regions between two code fragments are identified from these subsequences. We have developed a prototype tool as a plugin for Visual Studio Code. Experimental results indicate that our approach is effective in finding highly similar regions between cross-granularity code fragments, which can facilitate duplicate code detection for incomplete onsite programming code.
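To make the pipeline concrete, here is a minimal sketch of the AST-to-sequence encoding followed by local alignment. It is not the authors' implementation, and several choices are assumptions: Python's standard `ast` module stands in for whatever parser the tool uses, node type names are encoded as integers through a shared vocabulary, Smith-Waterman is used as a representative local alignment algorithm with illustrative scores (`match=2`, `mismatch=-1`, `gap=-1`), and the "post-processing" step is reduced to a hypothetical mapping of the aligned subsequence back to a source line range. Unlike the approach described above, `ast.parse` requires the fragment to be syntactically complete.

```python
import ast
from typing import List, Optional, Tuple


def ast_sequence(source: str) -> List[Tuple[str, int]]:
    """Pre-order walk of the AST, returning (node type name, line number) pairs.

    Assumption: nodes without a line number (Module, Load, Add, ...) are not
    emitted, but their children are still visited."""
    tokens: List[Tuple[str, int]] = []

    def visit(node: ast.AST) -> None:
        line = getattr(node, "lineno", None)
        if line is not None:
            tokens.append((type(node).__name__, line))
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment of sequences a and b (Smith-Waterman, linear gap cost).

    Returns (best score, (a_start, a_end), (b_start, b_end)) as half-open ranges."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell marks the alignment start.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


def similar_region(fragment: str, complete: str) -> Optional[Tuple[int, Tuple[int, int], Tuple[int, int]]]:
    """Hypothetical post-processing: map the best local alignment back to line ranges."""
    seq_a, seq_b = ast_sequence(fragment), ast_sequence(complete)
    vocab: dict = {}  # shared encoding: node type name -> integer code
    enc_a = [vocab.setdefault(name, len(vocab)) for name, _ in seq_a]
    enc_b = [vocab.setdefault(name, len(vocab)) for name, _ in seq_b]
    score, (a0, a1), (b0, b1) = smith_waterman(enc_a, enc_b)
    if score == 0:
        return None  # no similar subsequence at all
    lines_a = [line for _, line in seq_a[a0:a1]]
    lines_b = [line for _, line in seq_b[b0:b1]]
    return score, (min(lines_a), max(lines_a)), (min(lines_b), max(lines_b))


# Usage: a small fragment compared against a larger, complete function.
fragment = """
total = 0
for x in items:
    total += x
"""
complete = """
def mean(values):
    s = 0
    for v in values:
        s += v
    return s / len(values)
"""
print(similar_region(fragment, complete))
```

Because the encoding keeps only node types, the fragment's accumulation loop aligns with lines 3 to 5 of `mean` even though every identifier differs; this is what lets an AST-based sequence tolerate renaming. A production tool would also need a real threshold on the score and a parser that tolerates incomplete code.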
