ZENODO
Dataset · 2023
License: CC BY
Data sources: ZENODO

This Research product is the result of merged Research products in OpenAIRE.


AGREE: a New Benchmark for the Evaluation of Semantic Models of Ancient Greek

Authors: Stopponi, Silvia; Peels-Matthey, Saskia; Nissim, Malvina


Abstract

AGREE (Ancient Greek Relatedness Embeddings Evaluation) is a benchmark for the evaluation of semantic models of Ancient Greek, created at the University of Groningen (The Netherlands). More information about it can be found in the following publication: Silvia Stopponi, Saskia Peels-Matthey, Malvina Nissim, AGREE: a new benchmark for the evaluation of distributional semantic models of ancient Greek, Digital Scholarship in the Humanities, Volume 39, Issue 1, April 2024, Pages 373–392, https://doi.org/10.1093/llc/fqad087

1. Overview of the repository

This benchmark was created from a mix of expert judgements about the relatedness of Ancient Greek words and model outputs validated by human experts. The evaluation items are pairs of Ancient Greek lemmas with high semantic relatedness. The human judgements were collected via two questionnaires, which presented two different tasks to the experts. The evaluation items included in the AGREE benchmark are a selection of the most strictly related pairs of lemmas obtained from the two tasks. Here is an overview of the contents of the repository:

1_agree_task1.json includes all the data collected with the first task. The following labels are used:
- 'pair': two Ancient Greek lemmas;
- 'frequency': the number of times the pair was suggested as related by an expert;
- 'POS1': part of speech of the first lemma;
- 'POS2': part of speech of the second lemma;
- 'benchmark': inclusion of the pair in the AGREE benchmark ('yes'/'no').

2_agree_task2.json includes all the data collected with the second task. The following labels are used:
- 'pair': two Ancient Greek lemmas;
- 'origin': provenance of the pair:
  - 'common_pair' = one of the two pairs proposed to all participants in the second task;
  - 'task1' = pair proposed by experts in the first task;
  - 'models_easy_rel' = output of word2vec models, pair considered strictly related;
  - 'models_task1' = pair proposed by experts in the first task and also output by word2vec models;
  - 'models' = output of word2vec language models;
  - 'unrelated' = made-up pair of unrelated lemmas (control pair);
- 'respondents': number of experts who evaluated the pair;
- 'score': average relatedness score given by the experts on a 0-100 scale;
- 'agreement': inter-annotator agreement between all experts who evaluated the block of pairs to which the current pair belongs (available only when the block was presented to more than one participant);
- 'benchmark': inclusion of the pair in the AGREE benchmark ('yes'/'no').

3_agree_final_benchmark.json includes the final selection of items that constitutes AGREE. The following labels are used:
- 'pair': two Ancient Greek lemmas;
- 'origin':
  - 'task1' = pair either proposed more than once in the first task, or proposed only once but scored >= 70 in the second task;
  - 'task2' = pair scored by more than one respondent in the second task, with an average score >= 70.

This updated version of the repository also includes the individual answers to the two questionnaires (see the files 'answers_Task1_postprocessed.xlsx' and 'raw_answers_Task2.xlsx').

2. Acknowledgements

This work was partially supported by the Young Academy Groningen through the PhD scholarship of Silvia Stopponi. We acknowledge the financial support of Anchoring Innovation. Anchoring Innovation is the Gravitation Grant research agenda of the Dutch National Research School in Classical Studies, OIKOS. It is financially supported by the Dutch Ministry of Education, Culture and Science (NWO project number 024.003.012). For more information about the research programme and its results, see the website www.anchoringinnovation.nl.

We want to thank the experts in Ancient Greek around the world who shared their knowledge of Ancient Greek semantics and donated some of their precious time; without them the creation of this benchmark would not have been possible. We also thank the many colleagues from the University of Groningen, the National Research School OIKOS, and other universities abroad who contributed to this work with discussion and advice.

3. Citation

Silvia Stopponi, Saskia Peels-Matthey, Malvina Nissim, AGREE: a new benchmark for the evaluation of distributional semantic models of ancient Greek, Digital Scholarship in the Humanities, Volume 39, Issue 1, April 2024, Pages 373–392, https://doi.org/10.1093/llc/fqad087
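All three JSON files described above share the 'pair' and 'origin' labels, so the benchmark can be grouped by provenance with a few lines of Python. Below is a minimal sketch; it assumes each file deserializes to a list of records carrying the fields listed above (the exact JSON layout is not specified in this description, and the placeholder lemmas are invented for illustration):

```python
import json

# Sample records mimicking the structure described for
# 3_agree_final_benchmark.json. The field names 'pair' and 'origin'
# come from the dataset description; the layout (a JSON array of
# objects) and the lemma placeholders are assumptions.
sample = json.loads("""
[
  {"pair": ["LEMMA_A", "LEMMA_B"], "origin": "task1"},
  {"pair": ["LEMMA_C", "LEMMA_D"], "origin": "task2"}
]
""")

# Group evaluation items by provenance: 'task1' pairs come from the
# expert-elicitation task, 'task2' pairs from the 0-100 rating task.
by_origin = {}
for item in sample:
    by_origin.setdefault(item["origin"], []).append(tuple(item["pair"]))

for origin, pairs in sorted(by_origin.items()):
    print(origin, pairs)
```

To work with the real data, replace the inline sample with `json.load(open("3_agree_final_benchmark.json"))`; the same grouping applies to the task-1 and task-2 files, which add fields such as 'score' and 'benchmark' for filtering.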

Keywords

word embeddings, evaluation, benchmark, human judgements, Ancient Greek, semantics

  • Impact by BIP!: citations 0, popularity Average, influence Average, impulse Average
  • OpenAIRE UsageCounts: 19 views, 12 downloads