
Cross-modal person re-identification (Re-ID) is a crucial component of a modern video surveillance
system and security infrastructure. The task of matching people across multiple non-overlapping
[...] in illumination, occlusions, pose variations, and even the absence of a visual query. In this thesis,
[...] key challenge is to align cross-modality feature representations according to the fine-grained
[...]
The first contribution proposes to jointly model the multi-modal latent space, where corresponding
[...] late-fusion models depends on the quality of the feature extraction backbones for each modality.
Unified feature learning effectively utilizes textual data as a super-annotation signal for the visual
backbone to implicitly align the shared semantic concepts from the start of the learning network.
[...] has further solidified the idea of a unified backbone model. In the final contribution, we propose
a vision transformer architecture design with the aim of an effective intra-modal and cross-modal
[...]
