
With the rise of AI-generated code, programming courses face new challenges in detecting code plagiarism. Traditional methods struggle against obfuscation techniques that modify code structure through statement insertion and deletion. To address this, we propose a novel approach based on tolerant token matching, designed to increase resilience against such attacks. We evaluate our method in three experiments on a real-life dataset with AI-obfuscated plagiarisms. The results show that our approach increases the median similarity gap between originals and plagiarisms by 1 to 6 percentage points.
ddc:004, Tokenization, Plagiarism Obfuscation, Computer Science Education, DATA processing & computer science, Software Plagiarism Detection, Source Code Plagiarism Detection, Obfuscation Attacks, info:eu-repo/classification/ddc/004, Code Normalization
