On the importance of sluggish state memory for learning long term dependency

Article (English, Open Access)
Tepper, J. A.; Shertil, M. S.; Powell, H. M. (2016)

The vanishing gradients problem inherent in Simple Recurrent Networks (SRNs) trained with back-propagation has led to a significant shift towards Long Short-Term Memory (LSTM) networks and Echo State Networks (ESNs), which overcome the problem through an error-carousel scheme and a different learning algorithm, respectively. This paper re-opens the case for SRN-based approaches by considering a variant, the Multi-recurrent Network (MRN). We show that memory units embedded within its architecture can ameliorate the vanishing gradient problem by providing variable sensitivity to recent and more distant information through layer- and self-recurrent links with varied weights, forming a so-called sluggish state-based memory. We demonstrate that an MRN, optimised with noise injection, is able to learn the long-term dependency within a complex grammar-induction task, significantly outperforming the SRN, NARX and ESN. Analysis of the networks' internal representations reveals that the MRN's sluggish state-based representations are best able to latch onto critical temporal dependencies spanning variable time delays, and to maintain distinct, stable representations of all underlying grammar states. Surprisingly, the ESN was unable to fully learn the dependency problem, suggesting that the major shift towards this class of models may be premature.
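The core mechanism described above, memory banks whose self-recurrent links have varied fixed weights so that each bank decays at a different rate, can be illustrated with a minimal sketch. The function name, the specific alpha values and the toy inputs below are illustrative assumptions, not the paper's exact MRN configuration:

```python
import numpy as np

def update_memory_banks(h, banks, alphas):
    """One 'sluggish state' update: bank_i = alpha_i * bank_i + (1 - alpha_i) * h.

    Banks with a larger self-recurrent weight alpha_i change more slowly,
    retaining older context; alpha_i = 0 tracks the current hidden state exactly.
    """
    return [a * b + (1.0 - a) * h for a, b in zip(alphas, banks)]

hidden_size = 4
alphas = [0.0, 0.5, 0.75, 0.9]                 # varied sluggishness per bank (assumed values)
banks = [np.zeros(hidden_size) for _ in alphas]

for t in range(3):                             # feed a few time steps through
    h = np.ones(hidden_size) * (t + 1)         # stand-in for the SRN hidden activation
    banks = update_memory_banks(h, banks, alphas)

# After a rising input, sluggish banks lag behind the fast (alpha = 0) bank,
# giving downstream layers simultaneous access to recent and older context.
```

Concatenating such banks with the current hidden state as extra context input is what gives the MRN its variable sensitivity to different time scales.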
