IEEE Access
Article . 2025 . Peer-reviewed
License: CC BY NC ND
Data sources: Crossref
Data sources: DOAJ

CFTformer: End-to-End Cross-Frame Multi-Object Tracking With Transformer

Authors: Abdollah Amirkhani; Seyed Alireza Khoshnevis;

Abstract

Multi-object tracking (MOT) is central to numerous applications, from autonomous vehicles (AVs) to surveillance and retail analytics. Traditional MOT methods typically rely on motion-based and appearance-based similarity information to associate detections across frames. Transformer attention-based approaches to MOT, however, remove the need for complex post-processing steps such as graph optimization, allowing end-to-end query tracking across frames. While these transformer-based approaches offer many advantages, in the majority of such models the temporal dimension of the sequence is considered only in the iterative processing of the frames or in the memory of the queries. The proposed cross-frame multi-object tracking transformer (CFTformer) aims to improve one of the challenging areas of tracking: association across different frames in the temporal dimension. In the proposed approach, the temporal identities of the frames are included in the positional encoding of the patches, which allows the encoder-decoder to track the queries more efficiently across frames. Scalable deformable-attention layers are used in the encoder and decoder to decrease the computational cost. CFTformer also employs the proposed attention-based trajectory refinement (ATR) scheme to improve tracking performance in blurred frames. The three-dimensional positional encoding of the patches helps the ATR module better capture the trajectories of the queries and generate smoother predictions. Overall, the model achieved 1.7% and 0.6% improvements in the identification F1 score (IDF1) on the MOT17 and MOT20 datasets, respectively, with roughly 0-15% fewer identity switches than other transformer-based approaches. More accurate tracking and fewer identity switches make this algorithm better suited to autonomous driving.
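The abstract does not specify how the frame identity enters the positional encoding, so the following is only a minimal sketch of one common way to do it: a fixed sinusoidal encoding whose channels are split evenly across the frame index, patch row, and patch column. The function names, the equal three-way channel split, and the temperature value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sinusoid_1d(positions, dim, temperature=10000.0):
    """Fixed sinusoidal encoding of integer positions into `dim` channels.

    Hypothetical helper for illustration; `dim` must be even.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    # Geometrically spaced frequencies, one per sin/cos pair.
    freqs = temperature ** (-np.arange(0, dim, 2) / dim)      # (dim/2,)
    angles = positions[:, None] * freqs[None, :]               # (N, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def positional_encoding_3d(num_frames, height, width, d_model):
    """Return an encoding of shape (num_frames, height, width, d_model).

    Assumption: channels are split into three equal groups for the
    frame index t, the patch row y, and the patch column x.
    """
    if d_model % 6 != 0:
        raise ValueError("d_model must be divisible by 6")
    d = d_model // 3
    t_enc = sinusoid_1d(np.arange(num_frames), d)  # (T, d)
    y_enc = sinusoid_1d(np.arange(height), d)      # (H, d)
    x_enc = sinusoid_1d(np.arange(width), d)       # (W, d)

    pe = np.zeros((num_frames, height, width, d_model))
    pe[..., 0:d] = t_enc[:, None, None, :]       # temporal channels
    pe[..., d:2 * d] = y_enc[None, :, None, :]   # row channels
    pe[..., 2 * d:] = x_enc[None, None, :, :]    # column channels
    return pe


pe = positional_encoding_3d(num_frames=3, height=4, width=5, d_model=12)
print(pe.shape)
```

With this layout, the same patch location in two different frames shares its spatial channels and differs only in the temporal channels, which is the property that lets attention distinguish "same place, different time" from "same place, same time".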
To assess the model's performance in AV applications, the BDD100K dataset was used for training and evaluation, where the proposed approach achieved a 1.9% improvement in IDF1 compared to other transformer-based models.
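For readers unfamiliar with the metric quoted above, IDF1 is conventionally defined from identity-level true positives (IDTP), false positives (IDFP), and false negatives (IDFN) as 2·IDTP / (2·IDTP + IDFP + IDFN). A minimal sketch follows; the function name and the counts in the example are made up for illustration and are not results from the paper.

```python
def idf1(idtp, idfp, idfn):
    """Identification F1 score from identity-level match counts.

    IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)
    """
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        return 0.0  # no detections and no ground truth: define as 0
    return 2 * idtp / denom


# Hypothetical counts, for illustration only.
score = idf1(idtp=80, idfp=10, idfn=30)
print(f"IDF1 = {score:.3f}")
```

Because IDF1 is computed after a global one-to-one matching between predicted and ground-truth trajectories, every identity switch moves detections from IDTP into IDFP and IDFN, which is why fewer switches tend to raise the score.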

Keywords

positional encoding, trajectory refinement, CFTformer, transformer network, Electrical engineering. Electronics. Nuclear engineering, multi-object tracking (MOT), TK1-9971
