
Conversational systems should be able to generate diverse language forms to interact fluently and accurately with a variety of users. In this context, Natural Language Generation (NLG) plays a crucial role, directly influencing user perception. NLG engines convert Meaning Representations (MRs) into sentences. In dialogue systems, these MRs usually consist of dialogue acts (DAs), representing intentions or purposes, along with specific attributes and entities. In this work, our objective is to analyze whether providing additional information in the form of a task demonstration (an MR-example pair) enhances the generation quality of a fine-tuned Large Language Model (LLM). The analysis involves five metrics evaluating different aspects and four datasets with distinct characteristics. To the best of our knowledge, this is the first in-depth study on NLG quality in the context of dialogue that implements a comparative analysis of the cross-impact of meaning representations, domains, tasks, and corpus characteristics, as well as of the metrics used to evaluate experimental results. This study shows that enriched inputs are effective in small datasets for complex tasks with high input and output variability. They are also beneficial in zero-shot settings for any domain and task. Furthermore, a thorough analysis of the metrics reveals that metrics evaluating semantic aspects are better suited to assessing generation quality than metrics evaluating lexical aspects based on n-gram overlap. In addition, semantic metrics trained on human ratings can detect omissions and semantic nuances not captured by other semantic metrics based on sentence embeddings. Finally, the generative models adapt quickly to different tasks and are robust at the semantic and communicative-intention levels.
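The enriched-input setup described above, in which a demonstration MR-example pair is prepended to the target MR before generation, can be sketched as follows. The dialogue-act notation, slot names, and prompt template here are illustrative assumptions, not the exact configuration used in the study.

```python
# Illustrative sketch (hypothetical MR format and prompt template):
# an "enriched" input pairs one task demonstration (an MR with its
# reference sentence) with the target MR the model must verbalize.

def linearize_mr(dialogue_act, slots):
    """Flatten a dialogue act and its attribute-value pairs into a string."""
    attrs = ", ".join(f"{k}={v}" for k, v in slots.items())
    return f"{dialogue_act}({attrs})"

def build_enriched_prompt(demo_mr, demo_text, target_mr):
    """Prepend one MR-example demonstration to the target MR."""
    return f"MR: {demo_mr}\nText: {demo_text}\nMR: {target_mr}\nText:"

demo_mr = linearize_mr("inform", {"name": "Aromi", "food": "Italian"})
target_mr = linearize_mr("inform", {"name": "Zizzi", "food": "pizza"})
prompt = build_enriched_prompt(demo_mr, "Aromi serves Italian food.", target_mr)
print(prompt)
```

The resulting prompt would then be fed to the fine-tuned LLM, which is expected to complete the final `Text:` field with a sentence realizing the target MR.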
Natural Language Generation, Prompt-based Learning, Dialogue Acts, NLG Evaluation, Dialogue Systems
