Comparative Analysis of Reasoning Accuracy in Multimodal Large Language Models and Diffusion-Based Trajectory Policies on

SOVEREIGN Research Kernel

Data sources: ZENODO

Publication type: Report | Status: Under curation | Language: English | Publisher: Zenodo

Authors: SOVEREIGN Research Kernel

doi: 10.5281/zenodo.20660667

Abstract

Large language models (LLMs) with reasoning capabilities offer a promising path for improving candidate evaluation in planning frameworks, but their performance relative to traditional non-reasoning models remains largely underexplored. In this study, we benchmark a distilled 1.5B-parameter reasoning model (DeepSeek-R1) against several state-of-the-art non-reasoning LLMs within a generator-discriminator LLM planning framework for the text-to-SQL task. To this end, we introduce a novel method for extracting soft scores from the chain-of-thought (CoT) outputs of the reasoning model, which enables fine-grained […]

Research goal: How does the reasoning accuracy of multimodal large language models compare to that of diffusion-based trajectory policies in dynamic task-planning environments when evaluated on the RoboBench benchmark under varying levels of environmental noise?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.6/10.
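To illustrate why soft scores matter in a generator-discriminator setup like the one the abstract describes, here is a minimal sketch in which a generator's candidate SQL queries are reranked by a discriminator. This is not the authors' soft-score extraction method, which the abstract does not describe; the `rerank` helper, the `Candidate` type, and the score values are hypothetical placeholders. The point is only that binary accept/reject judgments leave candidates tied, while continuous scores can separate them.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Candidate:
    sql: str
    score: float


def rerank(candidates: Sequence[str],
           discriminator: Callable[[str], float]) -> List[Candidate]:
    """Score every generator candidate and sort best-first.

    Python's sort is stable (also with reverse=True), so tied candidates
    keep their original generation order.
    """
    scored = [Candidate(sql=c, score=discriminator(c)) for c in candidates]
    return sorted(scored, key=lambda c: c.score, reverse=True)


# Hypothetical discriminator confidences; a real system would derive these
# from model outputs, which this sketch does not attempt.
SOFT_SCORES = {
    "SELECT * FROM t": 0.55,
    "SELECT name FROM t": 0.72,
    "SELECT name FROM t WHERE id = 1": 0.91,
}


def soft_discriminator(sql: str) -> float:
    return SOFT_SCORES[sql]


def binary_discriminator(sql: str) -> float:
    # Collapse the same judgment to accept (1.0) / reject (0.0).
    return 1.0 if SOFT_SCORES[sql] >= 0.5 else 0.0


candidates = ["SELECT * FROM t", "SELECT name FROM t",
              "SELECT name FROM t WHERE id = 1"]

# Binary scores tie at 1.0, so the first generated candidate wins by default.
print(rerank(candidates, binary_discriminator)[0].sql)
# Soft scores break the tie in favour of the most confident candidate.
print(rerank(candidates, soft_discriminator)[0].sql)
```

With binary judgments all three candidates pass and the ranking degenerates to generation order; the soft scores let the discriminator express which passing candidate it prefers.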