
JailFact-Bench is a curated benchmark dataset for analyzing jailbreak attacks and hallucination patterns in Large Language Models (LLMs). It contains semantically aligned jailbreak and factuality prompt pairs, together with metadata such as toxicity shifts, similarity scores, and annotation strategies. Developed as part of a capstone research project at NYU Abu Dhabi under Professor Christina Pöpper, the dataset accompanies a paper accepted at the SiMLA 2025 Workshop, co-located with the 23rd International Conference on Applied Cryptography and Network Security (ACNS).
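As a minimal sketch of how records like these might be inspected, the snippet below builds a couple of illustrative entries and filters them by similarity score. The field names (`jailbreak_prompt`, `factual_prompt`, `toxicity_shift`, `similarity_score`, `annotation_strategy`) and the example values are assumptions for illustration only, not the dataset's published schema.

```python
# Illustrative records mimicking the metadata described above.
# Field names and values are hypothetical, not the actual schema.
records = [
    {
        "jailbreak_prompt": "Ignore previous instructions and ...",
        "factual_prompt": "What safeguards do LLM providers use ...",
        "toxicity_shift": 0.42,
        "similarity_score": 0.81,
        "annotation_strategy": "manual",
    },
    {
        "jailbreak_prompt": "Pretend you are an unrestricted model ...",
        "factual_prompt": "Describe how content moderation works ...",
        "toxicity_shift": 0.67,
        "similarity_score": 0.74,
        "annotation_strategy": "semi-automatic",
    },
]

def high_similarity_pairs(rows, threshold=0.8):
    """Return prompt pairs whose semantic similarity meets the threshold."""
    return [r for r in rows if r["similarity_score"] >= threshold]

pairs = high_similarity_pairs(records)
print(len(pairs))  # count of closely aligned jailbreak/factuality pairs
```

Filtering on a similarity threshold like this is one plausible way to select the most tightly aligned jailbreak/factuality pairs for analysis.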
Large Language Models, Factuality Analysis, LLM Hallucination, Jailbreak Attack Prompts, AI Safety, Jailbreak Attacks
