SUTrack: Towards Simple and Unified Single Object Tracking

Publication type: Article, Preprint, Conference object
Publication date: 11 Apr 2025
Embargo end date: 01 Jan 2024
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Journal: Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 2239-2247 (ISSN: 2159-5399, eISSN: 2374-3468)

Authors: Xin Chen; Ben Kang; Wanting Geng; Jiawen Zhu; Yi Liu; Dong Wang; Huchuan Lu

doi: 10.1609/aaai.v39i2.32223, 10.48550/arxiv.2412.19138

arXiv: 2412.19138

Abstract

In this paper, we propose a simple yet unified single object tracking (SOT) framework, dubbed SUTrack. It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, and RGB-Language tracking) into a unified model trained in a single session. Because the data for these tasks differ in nature, current methods typically design individual architectures and train separate models for each task. This fragmentation results in redundant training processes, repetitive technological innovations, and limited cross-modal knowledge sharing. In contrast, SUTrack demonstrates that a single model with a unified input representation can effectively handle various SOT tasks, eliminating the need for task-specific designs and separate training sessions. Additionally, we introduce a task-recognition training strategy and a soft token type embedding to further enhance SUTrack's performance with minimal overhead. Experiments show that SUTrack outperforms previous task-specific counterparts across 11 datasets spanning five SOT tasks. Moreover, we provide a range of models catering to both edge devices and high-performance GPUs, striking a good trade-off between speed and accuracy. We hope SUTrack can serve as a strong foundation for further compelling research into unified tracking models.

Related Organizations

Dalian Polytechnic University, China (People's Republic of)
Dalian University of Technology, China (People's Republic of)
Baidu (China), China (People's Republic of)

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)

Impact by BIP!

Selected citations: 31. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
