BOOKCOREF: Coreference Resolution at Book Scale

Keywords: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), coreference resolution, information extraction, document-level extraction, corpus creation, benchmarking, long-document, book-scale, Artificial Intelligence, Computation and Language, Computation and Language (cs.CL)

Giuliano Martinelli; Tommaso Bonomo; Pere-Lluís Huguet Cabot; Roberto Navigli

Data sources:
- arXiv.org e-Print Archive: Preprint, 2025
- Archivio della ricerca- Università di Roma La Sapienza: Conference object, 2025
- Crossref (https://doi.org/10.18653/v1/20...): Article, 2025, Peer-reviewed
- Datacite (https://dx.doi.org/10.48550/ar...): Article, 2025, License: CC BY NC SA
- DBLP: Article
- DBLP: Conference object

Publication type: Article, Preprint, Conference object
Date: 01 Jan 2025
Embargo end date: 01 Jan 2025
Country: Italy
Publisher: Association for Computational Linguistics (ACL)
Journal: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

doi: 10.18653/v1/2025.acl-long.1197 , 10.48550/arxiv.2507.12075

arXiv: 2507.12075

handle: 11573/1744986

Abstract

Coreference Resolution systems are typically evaluated on benchmarks containing small- to medium-scale documents. When it comes to evaluating long texts, however, existing benchmarks, such as LitBank, remain limited in length and do not adequately assess system capabilities at the book scale, i.e., when co-referring mentions span hundreds of thousands of tokens. To fill this gap, we first put forward a novel automatic pipeline that produces high-quality Coreference Resolution annotations on full narrative texts. Then, we adopt this pipeline to create the first book-scale coreference benchmark, BOOKCOREF, with an average document length of more than 200,000 tokens. We carry out a series of experiments showing the robustness of our automatic procedure and demonstrating the value of our resource, which enables current long-document coreference systems to gain up to +20 CoNLL-F1 points when evaluated on full books. Moreover, we report on the new challenges introduced by this unprecedented book-scale setting, highlighting that current models fail to deliver the same performance they achieve on smaller documents. We release our data and code to encourage research and development of new book-scale Coreference Resolution systems at https://github.com/sapienzanlp/bookcoref.
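For readers unfamiliar with the metric named in the abstract: CoNLL-F1 conventionally denotes the unweighted average of the MUC, B-cubed, and CEAF-e F1 scores. The following is a minimal sketch of that averaging together with a toy B-cubed computation. It is not the authors' evaluation code; the mention representation, the function names, and the example clusters are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical mention representation: (start_token, end_token) offsets.
Mention = Tuple[int, int]
Clusters = List[Set[Mention]]


def b_cubed(key: Clusters, response: Clusters) -> Tuple[float, float, float]:
    """Return B-cubed (precision, recall, F1) for gold `key` and predicted `response`."""

    def directed(source: Clusters, target: Clusters) -> float:
        # Each source cluster earns |overlap|^2 / |source cluster| for every
        # target cluster it overlaps; the sum is normalised by the number of
        # mentions in the source clusters.
        total = sum(len(c) for c in source)
        if total == 0:
            return 0.0
        credit = sum(len(s & t) ** 2 / len(s) for s in source if s for t in target)
        return credit / total

    recall = directed(key, response)
    precision = directed(response, key)
    denom = precision + recall
    f1 = 0.0 if denom == 0 else 2 * precision * recall / denom
    return precision, recall, f1


def conll_f1(muc_f1: float, b3_f1: float, ceafe_f1: float) -> float:
    """Unweighted mean of the three F1 scores."""
    return (muc_f1 + b3_f1 + ceafe_f1) / 3


# Toy example (invented data): four mentions, two gold entities.
a, b, c, d = (0, 0), (5, 5), (9, 9), (12, 12)
gold = [{a, b, c}, {d}]
pred = [{a, b}, {c, d}]
precision, recall, f1 = b_cubed(gold, pred)
```

In this toy case one mention is attached to the wrong entity, so both precision and recall fall below 1. At book scale the same computation runs over clusters whose mentions may be hundreds of thousands of tokens apart.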

Accepted to ACL 2025 Main Conference. 19 pages

Country

Italy

Related Organizations

Sapienza University of Rome
Italy

