
Voice Assistants (VAs) are becoming an increasingly important part of our lives. However, most widespread VAs fail to take the user's spatiotemporal context into account [11], forcing users into more descriptive, less natural dialogue. This paper introduces VOICE, an open-source VA that leverages multimodal interaction and vision-language models to enable more flexible and natural communication. Additionally, we present a preliminary user study evaluating VOICE's ability to understand queries with contextual references.
Voice Assistant, Vision-Language Model, Mixed Reality, Multimodal Interaction
