CompBioBench v1: A benchmark of 100 diverse, verifiable questions for agents for computational biology

Nair, Surag; Gunsalus, Laura; Orcutt-Jahns, Brian; Rossen, Jordan; Lal, Avantika; De Donno, Carlo; Celik, Muhammed Hasan; Fletez-Brant, Kipper; Xie, Xiaoman; Corrada Bravo, Hector; Eraslan, Gokcen

Data sources: ZENODO

Research data > Dataset. Under curation. Publisher: Zenodo

doi: 10.5281/zenodo.19443186

Abstract

We introduce CompBioBench v1, a benchmark of 100 diverse tasks for evaluating agentic systems in computational biology. Unlike mathematics and programming, which more readily admit systematic verification, biological data are inherently noisy and open to interpretation. To enable objective evaluation without reducing tasks to prescriptive checklists, we propose a new benchmark-construction strategy based on synthetic/augmented data and metadata scrambling/scrubbing of real datasets to create challenging problems with a single ground-truth answer that require multi-step reasoning, tool use, bespoke code, and interaction with real-world external resources. The benchmark spans genomics, transcriptomics, epigenomics, single-cell analysis, human genetics, and machine learning workflows. Questions are curated by domain experts to cover a broad range of skills with varying difficulty. This record contains all questions, metadata, and input data files associated with CompBioBench v1.
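The metadata scrubbing idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline; the record fields, the function names `scrub_metadata` and `verify`, and the toy samples are all hypothetical, chosen only to show how hiding labels behind opaque identifiers can yield a task with a single checkable answer.

```python
import random


def scrub_metadata(samples, seed=0):
    """Hide informative names and labels behind opaque IDs (hypothetical helper).

    Returns (public_records, answer_key). The public records keep only the
    measurements; the answer key maps each opaque ID to its true label.
    """
    rng = random.Random(seed)
    shuffled = samples[:]   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)   # break any ordering that could leak the label
    public, answer_key = [], {}
    for i, rec in enumerate(shuffled):
        opaque_id = f"sample_{i:03d}"
        public.append({"id": opaque_id, "values": rec["values"]})
        answer_key[opaque_id] = rec["label"]
    return public, answer_key


def verify(submission, answer_key):
    """Exact-match check against the single ground-truth answer."""
    return submission == answer_key


# Toy input: names and labels are illustrative, not taken from the benchmark.
samples = [
    {"name": "liver_rep1", "label": "liver", "values": [5.1, 0.2]},
    {"name": "liver_rep2", "label": "liver", "values": [4.8, 0.3]},
    {"name": "brain_rep1", "label": "brain", "values": [0.4, 6.0]},
    {"name": "brain_rep2", "label": "brain", "values": [0.5, 5.7]},
]
public, answer_key = scrub_metadata(samples, seed=42)
```

An agent would receive only `public` and would have to recover the labels from the measurements; `verify` then compares its answer to `answer_key` without any subjective grading.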
