TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction

Keywords: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)

Kuan-Hao Huang; I-Hung Hsu; Tanmay Parekh; Zhiyu Xie; Zixuan Zhang; Prem Natarajan; Kai-Wei Chang; Nanyun Peng; Heng Ji


arXiv.org e-Print Archive
Preprint . 2023
Data sources: arXiv.org e-Print Archive

https://doi.org/10.18653/v1/20...
Article . 2024 . Peer-reviewed
Data sources: Crossref

https://dx.doi.org/10.48550/ar...
Article . 2023
License: arXiv Non-Exclusive Distribution
Data sources: Datacite

DBLP
Conference object
Data sources: DBLP


Publication: Article, Preprint, Conference object
Date: 01 Jan 2024
Embargo end date: 01 Jan 2023
Publisher: Association for Computational Linguistics (ACL)
Journal: Findings of the Association for Computational Linguistics ACL 2024


doi: 10.18653/v1/2024.findings-acl.760, 10.48550/arxiv.2311.09562

arXiv: 2311.09562


Abstract

Event extraction has gained considerable interest due to its wide-ranging applications. However, recent studies draw attention to evaluation issues, suggesting that reported scores may not accurately reflect the true performance. In this work, we identify and address evaluation challenges, including inconsistency due to varying data assumptions or preprocessing steps, the insufficiency of current evaluation frameworks that may introduce dataset or data split bias, and the low reproducibility of some previous approaches. To address these challenges, we present TextEE, a standardized, fair, and reproducible benchmark for event extraction. TextEE comprises standardized data preprocessing scripts and splits for 16 datasets spanning eight diverse domains and includes 14 recent methodologies, conducting a comprehensive benchmark reevaluation. We also evaluate five varied large language models on our TextEE benchmark and demonstrate how they struggle to achieve satisfactory performance. Inspired by our reevaluation results and findings, we discuss the role of event extraction in the current NLP era, as well as future challenges and insights derived from TextEE. We believe TextEE, the first standardized comprehensive benchmarking tool, will significantly facilitate future event extraction research.

Paper accepted by ACL 2024 Findings

Related Organizations

University of California System, United States
University of Illinois at Urbana–Champaign, United States
University of California, San Francisco, United States


Impact by BIP!

selected citations: 7. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.