Teaching Compositionality to CNNs

Preprint English OPEN
Stone, Austin ; Wang, Huayan ; Stark, Michael ; Liu, Yi ; Phoenix, D. Scott ; George, Dileep (2017)
  • Subject: Computer Science - Computer Vision and Pattern Recognition | Computer Science - Learning

Convolutional neural networks (CNNs) have shown great success in computer vision, approaching human-level performance when trained for specific tasks via application-specific loss functions. In this paper, we propose a method for augmenting and training CNNs so that their learned features are compositional. The method encourages networks to form representations that disentangle objects from their surroundings and from each other, thereby promoting better generalization. It is agnostic to the details of the underlying CNN and can in principle be applied to any CNN. As we show in our experiments, the learned representations yield more localized feature activations and improve performance over non-compositional baselines in object recognition tasks.