Transformer-Based Motion Predictor for Multi-Dancer Tracking in Non-Linear Movements of Dancesport Performance

Name: Transformer-Based Motion Predictor for Multi-Dancer Tracking in Non-Linear Movements of Dancesport Performance
Creator: Zhiling Wang
Keywords: DanceSports, tracking-by-detection, Deep learning, occlusion, Electrical engineering. Electronics. Nuclear engineering, vision transformer, multiple object tracking, TK1-9971

IEEE Access

Article . 2025 . Peer-reviewed

License: CC BY

Data sources: Crossref

IEEE Access

Article . 2025

Data sources: DOAJ

DBLP

Article

Data sources: DBLP

Publication type: Article
Publication date: 01 Jan 2025
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Access, volume 13, pages 100,647-100,666 (eISSN: 2169-3536)

Authors: Zhiling Wang

doi: 10.1109/access.2025.3577797

Abstract

Automated multi-dancer tracking is a critical yet challenging task in Dance Quality Assessment (DanceQA), requiring precise motion estimation to evaluate synchronization, formation transitions, and rhythmic accuracy. Traditional Multi-Object Tracking (MOT) frameworks predominantly rely on appearance-based features and Kalman Filter-based motion models, which struggle with complex, non-linear motion patterns exhibited in dance performances. These conventional approaches often suffer from identity fragmentation, occlusion-related failures, and inaccurate motion predictions due to their inherent assumption of constant velocity. Although recent deep learning-based trackers incorporating recurrent architectures and transformers have improved motion modeling, they still lack adaptability to highly dynamic motion variations and remain heavily reliant on large-scale training datasets. To bridge this gap, we propose the Multi-Dancer Spatio-Temporal Tracker (MDSTT), a novel transformer-based framework that exclusively leverages historical motion cues for robust and identity-consistent tracking. Unlike conventional tracking methods that integrate appearance features, MDSTT processes historical bounding box trajectories through a transformer encoder, capturing both long-range and short-term spatio-temporal dependencies while mitigating occlusion-induced identity switches. The proposed framework introduces a Historical Trajectory Embedding module to enhance motion-based representation learning, an Adaptable Motion Predictor with a learnable prediction token for improved trajectory continuity, and a refined Hungarian Matching strategy incorporating Intersection-over-Union (IoU), motion direction difference, and L1 distance to optimize object association. Additionally, probabilistic masked token augmentation is incorporated to simulate real-world occlusion scenarios, improving resilience against missing detections. 
Extensive evaluations on the DanceTrack dataset demonstrate that MDSTT achieves state-of-the-art (SoTA) tracking performance. Compared with SoTA transformer-based MOT models, it delivers a 22.3% relative improvement in HOTA (77.4 vs. 63.3), 7.6% higher detection accuracy (86.4 vs. 80.3), and 26.6% better identity association accuracy (63.4 vs. 50.1).
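
The association step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract names the three cues (IoU, motion direction difference, L1 distance) but not how they are combined, so the weighted sum, the weights `w_iou`, `w_dir`, `w_l1`, and all function names below are assumptions. To stay dependency-free, a brute-force search over permutations stands in for the Hungarian algorithm; it finds the same minimum-cost assignment for a handful of dancers but does not scale.

```python
from itertools import permutations
import math

def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def center(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def direction_diff(prev, pred, det):
    """Angle in radians (0..pi) between predicted motion (prev -> pred)
    and observed motion (prev -> det), measured between box centers."""
    px, py = center(prev)
    v1 = (center(pred)[0] - px, center(pred)[1] - py)
    v2 = (center(det)[0] - px, center(det)[1] - py)
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # no motion, so no direction to compare
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.acos(max(-1.0, min(1.0, cos)))

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cost(prev, pred, det, w_iou=1.0, w_dir=0.5, w_l1=0.01):
    # Hypothetical weighted sum; the weights are illustrative only.
    return (w_iou * (1.0 - iou(pred, det))
            + w_dir * direction_diff(prev, pred, det) / math.pi
            + w_l1 * l1_distance(pred, det))

def associate(prev_boxes, pred_boxes, det_boxes):
    """Return a list m where m[i] is the detection index matched to track i.
    Assumes equal numbers of tracks and detections.
    Brute force over permutations replaces the Hungarian algorithm here."""
    n = len(pred_boxes)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(
            cost(prev_boxes[i], pred_boxes[i], det_boxes[p[i]]) for i in range(n)
        ),
    )
    return list(best)

# Two dancers moving toward each other; detections arrive in swapped order.
prev_boxes = [(0, 0, 10, 20), (30, 0, 40, 20)]   # last known boxes
pred_boxes = [(5, 0, 15, 20), (25, 0, 35, 20)]   # motion predictor output
det_boxes = [(26, 0, 36, 20), (4, 0, 14, 20)]    # current detections
matches = associate(prev_boxes, pred_boxes, det_boxes)
print(matches)
```

In this toy scene the matcher pairs track 0 with detection 1 and track 1 with detection 0, because each of those pairs overlaps strongly and lies close in L1 distance. In practice an optimal assignment solver would replace the permutation search.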

Related Organizations

Shaanxi University of Technology
China (People's Republic of)

Impact by BIP!

selected citations: 0 (derived from selected sources; an alternative to the "Influence" indicator)
popularity: Average (the "current" impact or attention of the article in the research community, based on the underlying citation network)
influence: Average (the overall impact of the article in the research community, based on the underlying citation network, diachronically)
impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)

gold