Text-driven online action detection

Article, Preprint. 19 Jan 2025. Embargo end date: 01 Jan 2025. Language: English
Publisher: SAGE Publications
Journal: Integrated Computer-Aided Engineering, volume 32, pages 415-423 (ISSN: 1069-2509, eISSN: 1875-8835)

Authors: Manuel Benavent-Lledó; David Mulero-Pérez; David Ortiz-Perez; José García Rodríguez 0001

DOI: 10.1177/10692509241308069, 10.48550/arxiv.2501.13518

arXiv: 2501.13518

handle: 10045/151445


Abstract

Detecting actions as they occur is essential for applications such as video surveillance, autonomous driving, and human-robot interaction. This task, known as online action detection, requires classifying actions in streaming video, handling background noise, and coping with incomplete actions. Transformer architectures are the current state of the art, yet the potential of recent advances in computer vision, particularly vision-language models (VLMs), remains largely untapped for this problem, partly because of their high computational cost. In this paper, we introduce TOAD: a Text-driven Online Action Detection architecture that supports zero-shot and few-shot learning. TOAD leverages CLIP (Contrastive Language-Image Pretraining) textual embeddings, enabling efficient use of VLMs without significant computational overhead. Our model achieves 82.46% mAP on the THUMOS14 dataset, outperforming existing methods, and sets new baselines for zero-shot and few-shot performance on the THUMOS14 and TVSeries datasets.

Related Organizations

University of Alicante
Spain

Keywords

Online action detection, FOS: Computer and information sciences, Vision transformer, Few-shot learning, Computer Vision and Pattern Recognition (cs.CV), Vision-language model

