GRAFT: GRaPH and Table Reasoning for Textual Alignment -- A Benchmark for Structured Instruction Following and Visual Reasoning

Name: GRAFT: GRaPH and Table Reasoning for Textual Alignment -- A Benchmark for Structured Instruction Following and Visual Reasoning
Keywords: Machine Learning, FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Multimedia, Artificial Intelligence, Machine Learning (cs.LG), Multimedia (cs.MM)

Abhigya Verma; Sriram Puttagunta; Seganrasan Subramanian; Sravan Ramachandran

Found an issue? Give us feedback

arXiv.org e-Print Ar...arrow_drop_down

arXiv.org e-Print Archive

Preprint . 2025

Data sources: arXiv.org e-Print Archive

https://dx.doi.org/10.48550/ar...

Article . 2025

License: CC BY

Data sources: Datacite

DBLP

Article

Data sources: DBLP

GRAFT: GRaPH and Table Reasoning for Textual Alignment -- A Benchmark for Structured Instruction Following and Visual Reasoning

descriptionPublicationkeyboard_double_arrow_right Article , Preprint 01 Jan 2025Embargo end date: 01 Jan 2025Publisher:arXivJournal:CoRR, volume abs/2508.15690

Authors: Abhigya Verma; Sriram Puttagunta; Seganrasan Subramanian; Sravan Ramachandran;

doi: 10.48550/arxiv.2508.15690

arXiv: 2508.15690

GRAFT: GRaPH and Table Reasoning for Textual Alignment -- A Benchmark for Structured Instruction Following and Visual Reasoning

- Summary
- Subjects
- Metrics

Abstract

GRAFT is a structured multimodal benchmark designed to probe how well LLMs handle instruction following, visual reasoning, and tasks requiring tight visual textual alignment. The dataset is built around programmatically generated charts and synthetically rendered tables, each paired with a carefully constructed, multi step analytical question that depends solely on what can be inferred from the image itself. Responses are formatted in structured outputs such as JSON or YAML, enabling consistent and fine grained evaluation of both reasoning processes and adherence to output specifications. The benchmark further introduces a taxonomy of reasoning operations ranging from comparison and trend identification to ranking, aggregation, proportional estimation, and anomaly detection to support a comprehensive assessment of model capabilities. Taken together, GRAFT provides a unified and scalable framework for evaluating multimodal LLMs on visually grounded, structured reasoning tasks, offering a more rigorous standard for future benchmarking efforts.

25 pages, 10 tables, 3 figures

Keywords

Machine Learning, FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Multimedia, Artificial Intelligence, Machine Learning (cs.LG), Multimedia (cs.MM)

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

0

Average

Green

Related to Research communities

Knowmad Institut