The potential of ChatGPT in translation evaluation: A case study of the Chinese-Portuguese machine translation

Keywords: automatic scoring, ChatGPT, Translating and interpreting, P306-310, machine translation (MT), evaluation metric, human assessment

Lili Jiang; Yunxiao Jiang; Lili Han



Cadernos de Tradução

Article . 2024 . Peer-reviewed

License: CC BY

Data sources: Crossref

Data sources: DOAJ


Publication: Article, 09 Oct 2024
Publisher: Universidade Federal de Santa Catarina (UFSC)
Journal: Cadernos de Tradução, volume 44, pages 1-22 (ISSN: 1414-526X, eISSN: 2175-7968)


doi: 10.5007/2175-7968.2024.e98613


Abstract

The integration of artificial intelligence (AI) into translation assessment represents a significant evolution in the field, moving beyond traditional human-only scoring approaches. This study examines the role of ChatGPT, a multilingual, transformer-based large language model developed by OpenAI, in the automated evaluation of machine translations between Portuguese and Mandarin. Despite ChatGPT's growing reputation for advanced natural language processing (NLP) capabilities, its application to translation evaluation, particularly for this language pair, remains unexplored. To fill this gap, we used three prevalent machine translation tools to translate a set of twenty sentences from Chinese into Portuguese. Versions produced by professional Chinese-Portuguese translators were also included to estimate whether the machine-translated target texts had achieved a degree of human parity. We then assessed these translations using two GPT models (ChatGPT 3.5 and ChatGPT 4.0) and five human raters to offer a comprehensive scoring analysis. The findings reveal that ChatGPT, particularly ChatGPT 4.0, shows substantial promise in evaluating translations across varied text types. However, this potential is tempered by notable inconsistencies and limitations in its performance. Through both quantitative analysis and qualitative insights, this research highlights the synergy between ChatGPT's automated scoring and traditional human assessment. It uncovers some key benefits of this automated approach: (1) exploring the viability of automated translation evaluation, particularly in the Chinese-Portuguese language pair; (2) providing a critical supplement to human evaluation; and (3) revealing the astonishing capability of cutting-edge machine translation tools in the Chinese-Portuguese language pair.
Our findings contribute to a more detailed understanding of ChatGPT's role in translation assessment and underscore the need for a balanced approach that leverages both human expertise and AI capabilities.
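
As a rough illustration of the kind of quantitative comparison described in the abstract, the sketch below averages per-sentence scores across several human raters and correlates them with a model's scores. The abstract does not say which statistic the authors used, so Pearson correlation is an assumption here, and the function names (`pearson`, `mean_human_scores`) and all score values are hypothetical placeholders, not data from the study.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def mean_human_scores(ratings_by_rater):
    """Average the scores per sentence; each inner list holds one rater's scores."""
    return [mean(per_sentence) for per_sentence in zip(*ratings_by_rater)]


# Hypothetical placeholder scores (NOT taken from the study):
# five raters, four sentences, on an assumed 1-5 scale.
human_ratings = [
    [4, 3, 5, 2],
    [5, 3, 4, 2],
    [4, 4, 5, 1],
    [4, 3, 5, 2],
    [5, 3, 4, 3],
]
model_scores = [4, 3, 5, 2]  # hypothetical scores from an automated rater

r = pearson(mean_human_scores(human_ratings), model_scores)
print(f"Pearson r between mean human score and model score: {r:.3f}")
```

A value of r close to 1 would suggest that the automated rater ranks the translations much as the human raters do. It would not show that the absolute scores agree, so an agreement measure would be needed to check that separately.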


Impact by BIP!

- Selected citations: 2
- Popularity: Top 10%
- Influence: Average
- Impulse: Average
