
Program similarity analysis has a wide range of applications in areas such as code plagiarism detection and intellectual property protection, but existing approaches generally suffer from problems such as excessive computational overhead. To address this, a code similarity analysis method based on fuzzy matching and statistical inference was proposed. For binary programs, disassembly was first performed, followed by function boundary recognition to extract the execution boundary information of each function. On this basis, a dynamic programming method was used to compute similarity results at basic-block granularity, and a neighborhood search over the control flow graph was then used to extend the similarity analysis from the basic-block level to the function level. Finally, the semantic similarity of the binary files was obtained through statistical analysis of the function-level similarity results. During this process, the pre-trained model was optimized and its parameters were tuned to enable cross-platform code similarity analysis. Experimental results show that the proposed method significantly improves analysis accuracy over traditional analysis tools, with an average accuracy gain of 7.1% over current mainstream analysis tools.
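The abstract does not give the dynamic programming recurrence, the neighborhood search rule, or the final statistic, so the following is only a minimal sketch of how the basic-block, function, and binary levels could fit together. It assumes that disassembly and function boundary recognition have already produced, for each function, a dict of basic blocks (lists of normalized instruction mnemonics) and a CFG successor map. The choices of LCS as the DP step, a Dice-style score, greedy successor pairing, the 0.6 threshold, and mean-of-best aggregation are all illustrative assumptions, and every function name here is hypothetical. The pre-trained model mentioned above is not modeled.

```python
from typing import Dict, List, Tuple

Block = List[str]  # assumption: a basic block as a list of normalized mnemonics


def lcs_length(a: Block, b: Block) -> int:
    """Dynamic-programming longest common subsequence of two instruction sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def block_similarity(a: Block, b: Block) -> float:
    """Dice-style score in [0, 1]: 2 * LCS / (|a| + |b|)."""
    if not a and not b:
        return 1.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


def function_similarity(
    blocks1: Dict[int, Block], cfg1: Dict[int, List[int]],
    blocks2: Dict[int, Block], cfg2: Dict[int, List[int]],
    entry1: int, entry2: int, threshold: float = 0.6,  # hypothetical threshold
) -> float:
    """Grow block matches outward from the entry pair along CFG successor edges."""
    if not blocks1 or not blocks2:
        return 0.0
    matched: Dict[int, int] = {}
    used2 = set()
    scores: List[float] = []
    frontier: List[Tuple[int, int]] = [(entry1, entry2)]
    while frontier:
        u, v = frontier.pop()
        if u in matched or v in used2:
            continue
        s = block_similarity(blocks1[u], blocks2[v])
        if s < threshold:
            continue
        matched[u] = v
        used2.add(v)
        scores.append(s)
        # Neighborhood search: pair each successor of u with the
        # best-scoring still-unmatched successor of v.
        for nu in cfg1.get(u, []):
            if nu in matched:
                continue
            cands = [nv for nv in cfg2.get(v, []) if nv not in used2]
            if cands:
                best = max(cands, key=lambda nv: block_similarity(blocks1[nu], blocks2[nv]))
                frontier.append((nu, best))
    # Divide by the larger block count so unmatched blocks lower the score.
    return sum(scores) / max(len(blocks1), len(blocks2))


def binary_similarity(func_scores: List[List[float]]) -> float:
    """Hypothetical statistic: mean over functions of A of the best match in B."""
    if not func_scores:
        return 0.0
    return sum(max(row, default=0.0) for row in func_scores) / len(func_scores)


if __name__ == "__main__":
    f1 = {0: ["push", "mov", "cmp", "jne"], 1: ["mov", "add", "ret"], 2: ["xor", "ret"]}
    f2 = {0: ["push", "mov", "cmp", "jne"], 1: ["mov", "add", "nop", "ret"], 2: ["xor", "ret"]}
    cfg = {0: [1, 2]}
    print(round(function_similarity(f1, cfg, f2, cfg, 0, 0), 3))
```

In this toy pair, the only difference is an inserted `nop`, so the entry and `xor`/`ret` blocks match exactly and the modified block scores 6/7. Seeding the search from a single matched pair keeps the work proportional to the CFG edges actually explored, rather than scoring every block against every other block.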
machine learning, fuzzy matching, program analysis, Telecommunication, TK5101-6720, statistical inference
