Foundation-Sec-8B-Reasoning Accuracy Under RLVR Across Programming Languages in Big-Vul

SOVEREIGN Research Kernel

Data sources: ZENODO

Publication: Report (under curation)
Language: English
Publisher: Zenodo

Authors: SOVEREIGN Research Kernel

doi: 10.5281/zenodo.20637511

Abstract

Visual question answering increasingly requires multi-step reasoning. Recent post-training with reinforcement learning with verifiable rewards (RLVR) and Group Relative Policy Optimization (GRPO) can improve multimodal reasoning, but most approaches rely on sparse, outcome-only rewards. As a result, they cannot distinguish an incorrect answer caused by a small mistake late in the reasoning from one produced by a trajectory that was unhelpful from the start. A common remedy is to train a process reward model (PRM) for step-level supervision, but this typically requires large-scale, high-quality chain-of-thought …

Research goal: What is the impact of reinforcement learning with verifiable rewards (RLVR) on the accuracy of Foundation-Sec-8B-Reasoning on reasoning-based security tasks across different programming languages in the Big-Vul benchmark?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.5/10.
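The abstract's point about sparse, outcome-only rewards can be made concrete with a small sketch of GRPO's group-relative advantage. This is a minimal illustration, assuming a binary exact-match reward and normalization by the group's mean and standard deviation. The sampled group, the answers, and the helper names (`outcome_reward`, `grpo_advantages`) are hypothetical and are not taken from this report.

```python
from statistics import mean, pstdev


def outcome_reward(predicted: str, reference: str) -> float:
    """Hypothetical verifiable, outcome-only reward: 1.0 on an exact match, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantage: (reward - group mean) / (group std + eps).

    Uses the population standard deviation; implementations vary on this choice.
    eps keeps the division defined when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled trajectories for a single question.
reference = "B"
group = [
    ("correct reasoning", "B"),
    ("correct reasoning", "B"),
    ("sound reasoning, slip in the last step", "C"),
    ("unhelpful trajectory from the start", "D"),
]

rewards = [outcome_reward(answer, reference) for _, answer in group]
advantages = grpo_advantages(rewards)

for (description, _), r, a in zip(group, rewards, advantages):
    print(f"{description:40s} reward={r:.0f} advantage={a:+.3f}")
```

Both incorrect trajectories receive exactly the same negative advantage, so an outcome-only update cannot separate the late slip from the unhelpful start; a PRM would instead score intermediate steps. If every trajectory in a group earns the same reward, all advantages are zero and the group contributes no learning signal.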
