Assessing the Role of Context in Chat Translation Evaluation: Is Context Helpful and Under What Conditions?

Keywords: Computational linguistics. Natural language processing, P98-98.5

Agrawal, Sweta; Farajian, Amin; Fernandes, Patrick; Rei, Ricardo; Martins, André F. T.

Transactions of the Association for Computational Linguistics

Article . 2024 . Peer-reviewed

License: CC BY

Full-Text: https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00700/2473652/tacl_a_00700.pdf

Data sources: Sygma, Crossref, DOAJ, European Union Open Data Portal, DBLP

Article . 01 Jan 2024 . English

Publisher: MIT Press

Journal: Transactions of the Association for Computational Linguistics, volume 12, pages 1,250-1,267 (eISSN: 2307-387X)

Funded by: EC | UTTER, EC | DECOLLAGE

DOI: 10.1162/tacl_a_00700

Abstract

Despite the recent success of automatic metrics for assessing translation quality, their application in evaluating the quality of machine-translated chats has been limited. Unlike more structured texts such as news, chat conversations are often unstructured, short, and heavily reliant on contextual information. This raises questions about the reliability of existing sentence-level metrics in this domain, as well as about the role of context in assessing translation quality. Motivated by this, we conduct a meta-evaluation of existing automatic metrics, primarily designed for structured domains such as news, to assess the quality of machine-translated chats. We find that reference-free metrics lag behind reference-based ones, especially when evaluating translation quality in out-of-English settings. We then investigate how incorporating conversational contextual information in these metrics for sentence-level evaluation affects their performance. Our findings show that augmenting neural learned metrics with contextual information helps improve correlation with human judgments in the reference-free scenario and when evaluating translations in out-of-English settings. Finally, we propose a new evaluation metric, Context-MQM, that utilizes bilingual context with a large language model (LLM), and further validate that adding context helps even for LLM-based evaluation metrics.

Related Organizations

Instituto de Telecomunicações
Portugal
INESC-ID
Portugal
UNBABEL UNIPESSOAL, LDA
Portugal
INESC ID - INSTITUTO DE ENGENHARIA DE SISTEMAS E COMPUTADORES, INVESTIGACAO E DESENVOLVIMENTO EM LISBOA
Portugal
University of Lisbon
Portugal

Related research products (1)

sacrebleu software on GitHub (relation: IsRelatedTo)

Impact by BIP!

selected citations: 0
popularity: Average
influence: Average
impulse: Average
