We provide concept detection scores for the IACC.3 dataset (600 hours of Internet Archive videos), which is used in the TRECVID Ad-hoc Video Search (AVS) task [1].

Concept detection scores for 1345 concepts (the 1000 ImageNet concepts provided for the ILSVRC challenge [2] and 345 TRECVID SIN concepts [3]) were generated as follows:
1) To generate scores for the ImageNet concepts, 5 pre-trained ImageNet networks were applied to the IACC.3 dataset and their outputs were fused by taking the arithmetic mean.
2) To generate scores for the TRECVID SIN concepts, two pre-trained ImageNet networks were fine-tuned on these concepts using a combination of our methods presented in [4] and [5].

We provide two different sets of concept scores for the TRECVID SIN concepts:
a) The outputs of the two fine-tuned networks were fused by taking the arithmetic mean, giving a single score for each concept.
b) The last fully-connected layer was used as a feature to train SVM classifiers, separately for each fine-tuned network and each concept. The SVM classifiers were then applied to the IACC.3 dataset, and the prediction scores of the SVMs for the same concept were fused by taking the arithmetic mean, giving a single score for each concept.

We evaluated the two sets of concept scores in terms of MXInfAP (mean extended inferred average precision) on a subset of 38 TRECVID SIN concepts for which ground-truth annotation exists. The MXInfAP of each set is:
a) 30.04% for the networks' direct output
b) 35.81% for the SVM classifiers

Three files of concept detection scores can be downloaded (after unpacking the compressed file):
1) scores_ImageNet.txt
2a) scores_SIN_direct.txt
2b) scores_SIN_svm.txt

Each file has 335944 rows. The first file has 1002 columns and each of the other two has 347 columns, i.e. two shot-ID columns followed by one column per concept. Each row corresponds to a different video shot; the video shot IDs appear in the first two columns. (Note: the shot IDs are the ones from the mp7 files in the TRECVID AVS master shot reference, with the format shotFILENUMBER_SHOTNUMBER.) Each remaining column corresponds to a different concept, with all concept scores in the range [0,1]. The higher the score, the more likely it is that the corresponding concept appears in the video shot. The files “concept_names_ImageNet.txt” and “concept_names_SIN.txt” give the order of the concepts used in the concept score files.

[1] G. Awad, J. Fiscus, M. Michel et al. 2016. TRECVID 2016: Evaluating Video Search, Video Event Detection, Localization, and Hyperlinking. In TRECVID 2016 Workshop. NIST, USA.
[2] O. Russakovsky, J. Deng, H. Su et al. 2015. ImageNet Large Scale Visual Recognition Challenge. Int. Journal of Computer Vision (IJCV) 115, pp. 211–252.
[3] G. Awad, C. Snoek, A. Smeaton, and G. Quénot. 2016. TRECVid semantic indexing of video: a 6-year retrospective. ITE Transactions on Media Technology and Applications 4 (3), pp. 187–208.
[4] N. Pittaras, F. Markatopoulou, V. Mezaris, and I. Patras. 2017. Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks. Proc. 23rd Int. Conf. on MultiMedia Modeling (MMM'17), Reykjavik, Iceland, Springer LNCS vol. 10132, pp. 102–114, Jan. 2017.
[5] F. Markatopoulou, V. Mezaris, and I. Patras. 2016. Deep Multi-task Learning with Label Correlation Constraint for Video Concept Detection. Proc. ACM Multimedia 2016, Amsterdam, Oct. 2016.
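
As an illustration of the file layout described above (two shot-ID columns followed by one score column per concept), the following Python sketch ranks shots by their score for a single concept. It is a minimal example and not part of the released data: it assumes that the score files are whitespace-delimited and that the concept name files list one concept name per line, neither of which is stated above, and the function names read_concept_names, parse_score_lines and top_shots are hypothetical helpers introduced here.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[Tuple[str, ...], List[float]]


def read_concept_names(path: str) -> List[str]:
    """Read concept names. ASSUMPTION: one concept name per line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def parse_score_lines(lines: Iterable[str], n_id_cols: int = 2) -> Iterator[Row]:
    """Yield (shot ID fields, concept scores) for each non-empty row.

    ASSUMPTION: columns are separated by whitespace. The first
    n_id_cols fields are kept as strings; the rest are parsed as floats.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        yield tuple(fields[:n_id_cols]), [float(x) for x in fields[n_id_cols:]]


def top_shots(lines: Iterable[str], concept_names: List[str],
              concept: str, k: int = 10) -> List[Tuple[Tuple[str, ...], float]]:
    """Return the k shots with the highest score for one concept."""
    idx = concept_names.index(concept)  # raises ValueError if unknown

    def scored() -> Iterator[Tuple[float, Tuple[str, ...]]]:
        for ids, scores in parse_score_lines(lines):
            if len(scores) != len(concept_names):
                raise ValueError(f"row {ids}: unexpected number of score columns")
            yield scores[idx], ids

    # nlargest keeps only k rows in memory while streaming the file.
    best = heapq.nlargest(k, scored(), key=lambda pair: pair[0])
    return [(ids, score) for score, ids in best]
```

A possible use, under the same assumptions, is to pass an open file object as the lines argument, for example top_shots(open("scores_SIN_svm.txt"), read_concept_names("concept_names_SIN.txt"), name, k=20), where name is one of the entries in the concept name file. Streaming the rows avoids loading all 335944 rows at once.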

Linked publications:
(1) N. Pittaras, F. Markatopoulou, V. Mezaris, I. Patras, "Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks", Proc. 23rd Int. Conf. on MultiMedia Modeling (MMM'17), Reykjavik, Iceland, Jan. 2017
(2) F. Markatopoulou, V. Mezaris, I. Patras, "Deep Multi-task Learning with Label Correlation Constraint for Video Concept Detection", Proc. ACM Multimedia 2016, Amsterdam, The Netherlands, Oct. 2016