
Abstract: We investigate Group Relative Policy Optimization (GRPO) for radiology report summarization using Qwen 3.0 models. GRPO allows direct optimization of composite reward functions that combine syntactic and semantic measures, addressing limitations of traditional supervised fine-tuning. In our evaluation on MIMIC-III, GRPO consistently outperforms both the baseline and supervised fine-tuning across multiple metrics, including ROUGE-L and F1-RadGraph.
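
To make the idea of a composite reward concrete, the following is a minimal sketch and not the training code used in this work. It assumes the reward is a weighted sum of a syntactic score (ROUGE-L) and a semantic score (F1-RadGraph), and that advantages are computed the way GRPO is commonly described: each sampled summary's reward is normalized against the mean and standard deviation of its own sampling group. The function names, the equal weighting, the use of the population standard deviation, and the example scores are all illustrative assumptions.

```python
from statistics import mean, pstdev


def composite_reward(rouge_l: float, radgraph_f1: float, w_syntactic: float = 0.5) -> float:
    """Weighted mix of a syntactic and a semantic score (weight is an assumption)."""
    return w_syntactic * rouge_l + (1.0 - w_syntactic) * radgraph_f1


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical (ROUGE-L, F1-RadGraph) scores for four summaries sampled
# from the same report; these numbers are made up for illustration.
scores = [(0.40, 0.30), (0.55, 0.50), (0.35, 0.45), (0.60, 0.65)]
rewards = [composite_reward(r, f) for r, f in scores]
advantages = group_relative_advantages(rewards)
```

Summaries that score above their group's mean receive a positive advantage and those below receive a negative one, so the policy update depends only on relative quality within the group and needs no separate value model.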
