pmid: 33505103
pmc: PMC7817575
Abstract: In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Dialogue systems are often evaluated by means of human evaluations and questionnaires, which tends to be very cost- and time-intensive. Thus, much work has gone into finding methods that reduce the involvement of human labour. In this survey, we present the main concepts and methods. To this end, we differentiate between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for the dialogue systems and then presenting the evaluation methods for that class.
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial intelligence, Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Conversational, Computer Science - Human-Computer Interaction, Deep learning, [INFO] Computer Science [cs], 006: Spezielle Computerverfahren, Article, Human-Computer Interaction (cs.HC), Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Evaluation metrics, Chatbots, Dialogue systems, [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], Discourse model, Evaluation, Computation and Language (cs.CL)
citations | 163
popularity | Top 1%
influence | Top 1%
impulse | Top 1%