SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models

Keywords: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Digital Libraries, Digital Libraries (cs.DL), Computation and Language (cs.CL), Information Retrieval (cs.IR), Computer Science - Information Retrieval, Machine Learning (cs.LG)

Chuan Qin; Xin Chen; Chengrui Wang; Pengmin Wu; Xi Chen; Yihang Cheng; Jingyi Zhao; Meng Xiao; Xiangchao Dong; Qingqing Long; Boya Pan; Han Wu; Chengzan Li; Yuanchun Zhou; Hui Xiong; Hengshu Zhu


Data sources

- arXiv.org e-Print Archive . Preprint . 2025
- Crossref . Article . 2025 . Peer-reviewed . https://doi.org/10.1145/371189...
- Datacite . Article . 2025 . License: CC BY SA . https://dx.doi.org/10.48550/ar...
- DBLP . Article


Publication: Article, Preprint . 03 Aug 2025 . Embargo end date: 01 Jan 2025 . Publisher: ACM . Journal: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2


doi: 10.1145/3711896.3737403, 10.48550/arxiv.2503.13503

arXiv: 2503.13503


Abstract

In recent years, the rapid advancement of Artificial Intelligence (AI) technologies, particularly Large Language Models (LLMs), has revolutionized the paradigm of scientific discovery, establishing AI-for-Science (AI4Science) as a dynamic and evolving field. However, an effective framework for the overall assessment of AI4Science is still lacking, particularly one that takes a holistic view of data quality and model capability. In this study, we therefore propose SciHorizon, a comprehensive assessment framework designed to benchmark the readiness of AI4Science from both the scientific data and the LLM perspectives. First, we introduce a generalizable framework for assessing AI-ready scientific data, encompassing four key dimensions (Quality, FAIRness, Explainability, and Compliance), which are subdivided into 15 sub-dimensions. Drawing on data resource papers published between 2018 and 2023 in peer-reviewed journals, we present recommendation lists of AI-ready datasets for Earth, Life, and Materials Sciences, making a novel and original contribution to the field. Concurrently, to assess the capabilities of LLMs across multiple scientific disciplines, we establish 16 assessment dimensions based on five core indicators (Knowledge, Understanding, Reasoning, Multimodality, and Values), spanning Mathematics, Physics, Chemistry, Life Sciences, and Earth and Space Sciences. Using the developed benchmark datasets, we have conducted a comprehensive evaluation of over 50 representative open-source and closed-source LLMs. All results are publicly available online at www.scihorizon.cn/en.
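
As an illustration only, the dimension-based assessment described in the abstract could be represented as in the sketch below. The four data dimensions and five LLM indicators are taken from the abstract; the `Assessment` class, the 0 to 1 score scale, the unweighted-mean aggregation, and the example dataset names and scores are all hypothetical assumptions, since the abstract does not state how SciHorizon scores or aggregates its dimensions.

```python
from dataclasses import dataclass
from statistics import mean

# Dimension and indicator names as listed in the abstract.
DATA_DIMENSIONS = ("Quality", "FAIRness", "Explainability", "Compliance")
LLM_INDICATORS = ("Knowledge", "Understanding", "Reasoning", "Multimodality", "Values")


@dataclass
class Assessment:
    """Hypothetical record: one score in [0, 1] per dimension name."""
    name: str
    scores: dict[str, float]

    def overall(self, dimensions: tuple[str, ...]) -> float:
        # Assumption: unweighted mean. The abstract does not specify
        # how per-dimension scores are combined.
        missing = [d for d in dimensions if d not in self.scores]
        if missing:
            raise ValueError(f"missing scores for: {missing}")
        return mean(self.scores[d] for d in dimensions)


def rank(items, dimensions):
    """Sort assessments from highest to lowest overall score."""
    return sorted(items, key=lambda a: a.overall(dimensions), reverse=True)


# Made-up example values, not results from the paper.
datasets = [
    Assessment("dataset_a", {"Quality": 0.9, "FAIRness": 0.7,
                             "Explainability": 0.5, "Compliance": 0.9}),
    Assessment("dataset_b", {"Quality": 0.6, "FAIRness": 0.8,
                             "Explainability": 0.8, "Compliance": 1.0}),
]
ranked = rank(datasets, DATA_DIMENSIONS)
print([a.name for a in ranked])  # ['dataset_b', 'dataset_a']
```

The same structure would apply to an LLM evaluation by passing `LLM_INDICATORS` instead of `DATA_DIMENSIONS`; a weighted aggregation would be a natural substitute if the framework defines per-dimension weights.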

Related Organizations

- Hefei University of Technology, China (People's Republic of)
- Chinese Academy of Sciences, China (People's Republic of)
- Computer Network Information Center, China (People's Republic of)
- University of Science and Technology of China, China (People's Republic of)
- University of Chinese Academy of Sciences, China (People's Republic of)



Impact by BIP!

- Selected citations: 3. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Access route: Green