
This Zenodo repository contains the datasets and models for our paper "R2Vul: Learning to Reason about Software Vulnerabilities with Reinforcement Learning and Structured Reasoning Distillation".

Datasets
raw_dataset.json - the raw dataset mined from NVD.
r2vul_dataset.zip - the dataset used for fine-tuning and testing.
external_java_test.zip - the external, manually annotated Java dataset (RQ2).

Models and Checkpoints
cls.zip - includes all model checkpoints fine-tuned using CLS.
sft.zip - includes all model checkpoints fine-tuned using SFT.
orpo.zip - includes all model checkpoints fine-tuned using ORPO (R2Vul).
MSIVD.zip - downloaded from https://zenodo.org/records/11403208 (codellama-13b - bigvul_expl).
VulLLM.zip - downloaded from https://zenodo.org/records/10677069 (codellama-13b-multi-r16-2048).
