
Hateful memes have become a significant concern on the Internet, necessitating robust automated detection systems. While Large Multimodal Models (LMMs) have shown promise in hateful meme detection, they face notable challenges such as sub-optimal performance and limited out-of-domain generalization. Recent studies further reveal the limitations of both supervised fine-tuning (SFT) and in-context learning when applied to LMMs in this setting. To address these issues, we propose a robust adaptation framework for hateful meme detection that enhances in-domain accuracy and cross-domain generalization.

Research goal: To what extent does retrieval-augmented generation improve out-of-domain generalization accuracy for large multimodal models compared to supervised fine-tuning on hateful meme detection tasks?

Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 7.8/10.
