This paper presents GAVT, a highly accurate audiovisual 3D tracking system based on particle filters and a probabilistic framework, employing a single camera and a microphone array. Our first contribution is a complex visual appearance model that accurately locates the speaker’s mouth. It transforms a Viola & Jones face detector classifier kernel into a likelihood estimator, leveraging knowledge from multiple classifiers trained for different face poses. Additionally, we propose a mechanism to handle occlusions based on the new likelihood’s dispersion. The audio localization proposal utilizes a probabilistic steered response power, representing cross-correlation functions as Gaussian mixture models. Moreover, to prevent tracker interference, we introduce a novel mechanism for associating Gaussians with speakers. The evaluation is carried out using the AV16.3 and CAV3D databases for Single- and Multiple-Object Tracking tasks (SOT and MOT, respectively). GAVT significantly improves the localization performance over audio-only and video-only modalities, with up to 50.3% average relative improvement in 3D when compared with the video-only modality. When compared to the state of the art, our audiovisual system achieves up to 69.7% average relative improvement for the SOT and MOT tasks in the AV16.3 dataset (2D comparison), and up to 18.1% average relative improvement in the MOT task for the CAV3D dataset (3D comparison).
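As a rough illustration of the kind of particle-filter fusion the abstract describes, the sketch below runs a generic bootstrap particle filter that weights 3D position hypotheses by the product of an audio likelihood and a video likelihood. It is not the GAVT method: the isotropic Gaussian likelihoods, the random-walk motion model, and every name and parameter (`particle_filter_step`, `track_static_speaker`, `motion_std`, `audio_std`, `video_std`) are assumptions made for illustration. The multi-pose face likelihood and the probabilistic SRP-PHAT model from the paper are not reproduced here.

```python
import numpy as np


def particle_filter_step(particles, weights, audio_obs, video_obs, rng,
                         motion_std=0.05, audio_std=0.30, video_std=0.10):
    """One predict/update/resample step of a bootstrap particle filter.

    particles: (N, 3) array of hypothesised 3D positions in metres.
    audio_obs, video_obs: (3,) position estimates, one per modality.
    All noise levels are illustrative assumptions, not values from the paper.
    """
    n = len(particles)
    # Predict: random-walk motion model (assumption).
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: multiply the two modality likelihoods, assumed independent
    # and isotropic Gaussian, working in the log domain for stability.
    d_audio = np.sum((particles - audio_obs) ** 2, axis=1)
    d_video = np.sum((particles - video_obs) ** 2, axis=1)
    log_w = (np.log(weights)
             - d_audio / (2 * audio_std ** 2)
             - d_video / (2 * video_std ** 2))
    log_w -= log_w.max()
    weights = np.exp(log_w)
    weights /= weights.sum()
    # Point estimate: weighted mean of the particles before resampling.
    estimate = weights @ particles
    # Resample (multinomial) so that all particles carry equal weight again.
    idx = rng.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n), estimate


def track_static_speaker(steps=30, n_particles=2000, seed=0):
    """Track a stationary synthetic speaker from noisy audio/video fixes."""
    rng = np.random.default_rng(seed)
    true_pos = np.array([1.0, 2.0, 1.5])  # hypothetical speaker position
    particles = rng.uniform(0.0, 3.0, size=(n_particles, 3))
    weights = np.full(n_particles, 1.0 / n_particles)
    estimate = particles.mean(axis=0)
    for _ in range(steps):
        audio_obs = true_pos + rng.normal(0.0, 0.30, 3)  # coarse audio fix
        video_obs = true_pos + rng.normal(0.0, 0.10, 3)  # finer video fix
        particles, weights, estimate = particle_filter_step(
            particles, weights, audio_obs, video_obs, rng)
    return true_pos, estimate
```

Calling `track_static_speaker()` returns the synthetic ground-truth position and the filter's final estimate. Because the video likelihood is assumed to be narrower than the audio one, it dominates the fused weights in this toy setup.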
Audiovisual tracking, speaker localization, particle filter, probabilistic SRP-PHAT, multi-pose face observation model, smart spaces, Electronics, Chemical technology, TP1-1185, Article
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 3 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 63 |
| downloads | | 10 |

Views provided by UsageCounts
Downloads provided by UsageCounts