Transfer Attacks and Defenses for Large Language Models on Coding Tasks

Name: Transfer Attacks and Defenses for Large Language Models on Coding Tasks
Keywords: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Cryptography and Security, Cryptography and Security (cs.CR), Machine Learning (cs.LG)

Zhang, Chi; Wang, Zifan; Mangal, Ravi; Fredrikson, Matt; Jia, Limin; Pasareanu, Corina

Found an issue? Give us feedback

arXiv.org e-Print Ar...arrow_drop_down

arXiv.org e-Print Archive

Preprint . 2023

Data sources: arXiv.org e-Print Archive

https://dx.doi.org/10.48550/ar...

Article . 2023

License: CC BY

Data sources: Datacite

Transfer Attacks and Defenses for Large Language Models on Coding Tasks

descriptionPublicationkeyboard_double_arrow_right Article , Preprint 01 Jan 2023Embargo end date: 01 Jan 2023Publisher:arXiv

Authors: Zhang, Chi; Wang, Zifan; Mangal, Ravi; Fredrikson, Matt; Jia, Limin; Pasareanu, Corina;

doi: 10.48550/arxiv.2311.13445

arXiv: 2311.13445

Transfer Attacks and Defenses for Large Language Models on Coding Tasks

- Summary
- Subjects
- Metrics

Abstract

Modern large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities for coding tasks including writing and reasoning about code. They improve upon previous neural network models of code, such as code2seq or seq2seq, that already demonstrated competitive results when performing tasks such as code summarization and identifying code vulnerabilities. However, these previous code models were shown vulnerable to adversarial examples, i.e. small syntactic perturbations that do not change the program's semantics, such as the inclusion of "dead code" through false conditions or the addition of inconsequential print statements, designed to "fool" the models. LLMs can also be vulnerable to the same adversarial perturbations but a detailed study on this concern has been lacking so far. In this paper we aim to investigate the effect of adversarial perturbations on coding tasks with LLMs. In particular, we study the transferability of adversarial examples, generated through white-box attacks on smaller code models, to LLMs. Furthermore, to make the LLMs more robust against such adversaries without incurring the cost of retraining, we propose prompt-based defenses that involve modifying the prompt to include additional information such as examples of adversarially perturbed code and explicit instructions for reversing adversarial perturbations. Our experiments show that adversarial examples obtained with a smaller code model are indeed transferable, weakening the LLMs' performance. The proposed defenses show promise in improving the model's resilience, paving the way to more robust defensive solutions for LLMs in code-related applications.

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Cryptography and Security, Cryptography and Security (cs.CR), Machine Learning (cs.LG)

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

0

Average

Green