
This report presents an overview of the ITI-CERTH team's runs for the Ad-hoc Video Search (AVS) and Activities in Extended Video (ActEV) tasks. Our participation in the AVS task involves a collection of five cross-modal deep network architectures and numerous pretrained models, which are used to calculate the similarities between video shots and queries. These similarities serve as input to a trainable neural network that learns to combine them. During the retrieval stage, we also introduce a normalization step that uses both the current and previous AVS queries to revise the combined video shot-query similarities. For the ActEV task, we adapt our framework to support rule-based classification, which addresses the challenges of detecting and recognizing activities in a multi-label manner, and we experiment with two separate activity classifiers.
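
To make the AVS retrieval pipeline described above more concrete, the following is a minimal illustrative sketch, not the team's implementation. It makes two loudly stated assumptions: the trainable combination network is replaced by a plain weighted sum, and the normalization step is assumed to be a per-shot z-score computed over the current and previous queries. The function names `fuse_similarities` and `normalize_across_queries`, the weights, and the scores are all hypothetical.

```python
from statistics import mean, pstdev


def fuse_similarities(per_model_scores, weights):
    """Combine per-model shot-query similarities with a weighted sum.

    ASSUMPTION: a weighted sum stands in for the trainable combination
    network mentioned in the abstract.

    per_model_scores: one list per model, each with one score per shot.
    weights: one weight per model.
    """
    n_shots = len(per_model_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, per_model_scores))
        for i in range(n_shots)
    ]


def normalize_across_queries(current, previous):
    """Z-score each shot's current score against its scores for all queries.

    ASSUMPTION: the abstract only says that current and previous queries
    are used; the z-score form chosen here is hypothetical.

    current: fused scores for the current query, one per shot.
    previous: list of fused score lists for earlier queries.
    """
    normalized = []
    for i, score in enumerate(current):
        # Scores this shot received for earlier queries, plus the current one.
        history = [q[i] for q in previous] + [score]
        mu = mean(history)
        sigma = pstdev(history)
        # A shot with identical scores for every query carries no signal.
        normalized.append((score - mu) / sigma if sigma > 0 else 0.0)
    return normalized


# Two hypothetical models scoring two shots for the current query.
fused = fuse_similarities([[0.2, 0.8], [0.4, 0.6]], [0.5, 0.5])
# One hypothetical earlier query, already fused.
revised = normalize_across_queries(fused, previous=[[0.3, 0.1]])
```

Under this assumed normalization, a shot that scores equally for every query is mapped to zero, while a shot that scores higher for the current query than for earlier ones is promoted.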
