DPO and SFT Comparison in LLM Counter-Speech Argumentation Across Languages

The automatic generation of counter-speech (CS) is a critical strategy for addressing hate speech by providing constructive and informed responses. However, existing methods often fail to generate high-quality, impactful, and scalable CS, particularly across diverse linguistic contexts. In this paper, we propose a novel methodology to enhance CS generation by aligning Large Language Models (LLMs) using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Our approach leverages DPO to align LLM outputs with human preferences, ensuring contextually appropriate and linguistically ...

Research goal: What is the impact of DPO versus SFT on the argumentative strength metrics of LLM-generated counter-speech across diverse linguistic contexts in alignment evaluations?

Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 8.3/10.
