
Compositionality Decomposed: How do Neural Networks Generalise?

Dieuwke Hupkes, Verna Dankers, Mathijs Mul, Elia Bruni
Open Access
  • Published: 12 Apr 2020, Journal of Artificial Intelligence Research, volume 67, pages 757-795 (eISSN: 1076-9757)
  • Publisher: AI Access Foundation
Abstract
Despite a multitude of empirical studies, little consensus exists on whether neural networks are able to generalise compositionally, a controversy that, in part, stems from a lack of agreement about what it means for a neural model to be compositional. As a response to this controversy, we present a set of tests that provide a bridge between, on the one hand, the vast amount of linguistic and philosophical theory about compositionality of language and, on the other, the successful neural models of language. We collect different interpretations of compositionality and translate them into five theoretically grounded tests for models that are formulated on a task-independent level. In particular, we provide tests to investigate (i) if models systematically recombine known parts and rules, (ii) if models can extend their predictions beyond the length they have seen in the training data, (iii) if models’ composition operations are local or global, (iv) if models’ predictions are robust to synonym substitutions, and (v) if models favour rules or exceptions during training. To demonstrate the usefulness of this evaluation paradigm, we instantiate these five tests on a highly compositional data set which we dub PCFG SET and apply the resulting tests to three popular sequence-to-sequence models: a recurrent, a convolution-based and a transformer model. We provide an in-depth analysis of the results, which uncover the strengths and weaknesses of these three architectures and point to potential areas of improvement.
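As a rough illustration of the kind of evaluation the abstract describes, the sketch below shows how a substitutivity-style consistency score (test iv) could be computed by comparing a model's predictions on original inputs with its predictions on the same inputs after a synonym substitution. The function name and toy data are hypothetical and are not taken from the paper or its released code.

```python
# Illustrative sketch only: scoring robustness to synonym substitutions.
# Names and data are hypothetical, not the paper's actual implementation.

def consistency(predictions_original, predictions_substituted):
    """Fraction of examples for which the model's output is unchanged
    when a word in the input is replaced by its synonym."""
    assert len(predictions_original) == len(predictions_substituted)
    matches = sum(
        p == q for p, q in zip(predictions_original, predictions_substituted)
    )
    return matches / len(predictions_original)


if __name__ == "__main__":
    # Toy example: model outputs for three inputs, before and after
    # substituting a synonym into each input sequence.
    before = ["A B C", "C B A", "A A B"]
    after = ["A B C", "C A B", "A A B"]  # second prediction changed
    print(f"consistency: {consistency(before, after):.2f}")  # 0.67
```

Under this reading, a fully substitutivity-respecting model would score 1.0, since replacing a word with an equivalent one should leave the predicted output untouched.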
Subjects
free text keywords: Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Statistics - Machine Learning, Artificial neural network, Empirical research, Philosophical theory, Transformer (machine learning model), Computer science, Set (abstract data type), Theoretical computer science, Synonym (database), Principle of compositionality, Convolution (computer science)
Funded by
EC | MAGIC: Multimodal Agents Grounded via Interactive Communication
  • Funder: European Commission (EC)
  • Project Code: 790369
  • Funding stream: H2020 | MSCA-IF-EF-ST
  • Validated by funder
NWO | Language in Interaction
  • Funder: Netherlands Organisation for Scientific Research (NWO)
  • Project Code: 2300176475