
# Artifact for the paper: "Prompt Obfuscation for Large Language Models"

This artifact contains the code to reproduce the results presented in the paper. The code allows users to perform and evaluate prompt obfuscation as an alternative to traditional system prompting for large language models. Several deobfuscation methods are also included.

The main components used to reproduce the paper's results are the following groups of scripts:

- `obfuscate.py` and `evaluate_obfuscation.py`: obfuscate system prompts and evaluate them (results of Sections 5.1, 5.2, and 5.3).
- `finetune.py` and `evaluate_finetuning.py`: fine-tune LoRA adapters and evaluate them (Section 5.4 results).
- `prompt_extraction.py` and `evaluate_prompt_extraction.py`: extract the system prompt and evaluate the attack's success rate (Section 6.1 results).
- `projection.py`: project embedded (soft) prompts back to token space (Section 6.2 results).
- `fluency_deobfuscation.py` and `evaluate_fluency_deobfuscation.py`: deobfuscate obfuscated system prompts using fluency optimization and evaluate the results (Section 6.3 results).
- Helper scripts (`generate_output.py`, `compare_output.py`, `compare_sys_prompts.py`): quickly generate and compare model outputs, and compare system prompts for evaluation and baseline comparisons.

## Project Structure

```
prompt_obfuscation
├── README.md
├── compare_output.py
├── compare_sys_prompts.py
├── data
│   ├── __init__.py
│   ├── config.py
│   ├── loader.py
│   └── utils.py
├── evaluate_finetuning.py
├── evaluate_fluency_deobfuscation.py
├── evaluate_obfuscation.py
├── evaluate_prompt_extraction.py
├── extraction_prompts
│   └── gpt4_generated.json
├── finetune.py
├── fluency_deobfuscation.py
├── generate_output.py
├── obfuscate.py
├── projection.py
├── prompt_extraction.py
├── requirements.txt
└── src
    ├── __init__.py
    ├── finetuning_utils.py
    ├── logging_config.py
    ├── model.py
    ├── output_generation.py
    ├── output_similarity.py
    ├── prompt_utils.py
    ├── style_prompts.py
    ├── sys_prompt_similarity.py
    └── utils.py
```

The `data/` directory handles dataset loading and processing. The `src/` directory contains the core logic for models, output generation, and evaluation. The `extraction_prompts/` directory contains the prompts used for the prompt extraction attack. The Python scripts in the root directory are used to run the experiments.

## Setup

A GPU is highly recommended for reasonable computation times.

1. Create a Python 3.12.7 environment (e.g., using conda):

   ```
   conda create -n prompt_obfuscation python=3.12.7
   conda activate prompt_obfuscation
   ```

2. Install the required packages:

   ```
   pip install -r requirements.txt
   ```

3. Hugging Face access: the main model used (Llama-3.1-8B) is gated and requires a Hugging Face account with access granted to the model. After requesting access on the model's page, log in via the command line:

   ```
   huggingface-cli login
   ```

Please see the `README.md` file for example usage and a full list of all command-line arguments.
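For orientation, below is a minimal sketch of the core idea behind soft prompt obfuscation: a freely optimized embedding matrix is trained to stand in for the real system prompt by matching the model's output behavior. This is an illustration only, not the artifact's implementation; the prompts, hyperparameters, and single next-token KL objective are placeholders (the actual pipeline in `obfuscate.py` optimizes output similarity over a dataset, presumably via the metrics in `src/output_similarity.py`).

```python
# Illustrative sketch of soft prompt obfuscation -- NOT the artifact's code.
# A trainable embedding matrix replaces the real system prompt and is
# optimized so the model's next-token distribution matches the one produced
# with the real system prompt. Prompts and hyperparameters are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"  # gated: requires approved HF access
device = "cuda" if torch.cuda.is_available() else "cpu"

tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model.eval()
for p in model.parameters():  # freeze the model; only the soft prompt trains
    p.requires_grad_(False)

emb = model.get_input_embeddings()
sys_ids = tok("You are a helpful assistant.", return_tensors="pt").input_ids.to(device)
usr_ids = tok("Summarize the plot of Hamlet.", return_tensors="pt").input_ids.to(device)
sys_emb, usr_emb = emb(sys_ids), emb(usr_ids)

# Trainable soft prompt with the same shape as the real system prompt's embeddings.
soft = torch.nn.Parameter(0.02 * torch.randn_like(sys_emb))
opt = torch.optim.Adam([soft], lr=1e-3)

# Target behavior: next-token distribution under the real system prompt.
with torch.no_grad():
    target = model(inputs_embeds=torch.cat([sys_emb, usr_emb], dim=1)).logits[:, -1]

for step in range(200):
    logits = model(inputs_embeds=torch.cat([soft, usr_emb], dim=1)).logits[:, -1]
    loss = F.kl_div(logits.log_softmax(-1), target.softmax(-1), reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The resulting `soft` tensor plays the role of an obfuscated system prompt: it steers the model like the original prompt does, but has no readable token representation, which is what `projection.py` and the deobfuscation scripts then attack.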
