
We study adaptive (or online) nonlinear regression with Long Short-Term Memory (LSTM) based networks, i.e., LSTM-based adaptive learning. In this context, we introduce an efficient extended Kalman filter (EKF) based second-order training algorithm. Our algorithm is truly online: it assumes neither an underlying data-generating process nor access to future information, only that the target sequence is bounded. Through an extensive set of experiments, we demonstrate that our algorithm achieves significant performance gains over state-of-the-art methods. In particular, it consistently improves accuracy by 10 to 45% compared with the widely used adaptive methods Adam, RMSprop, and decoupled EKF (DEKF), and it performs comparably to EKF while reducing run time by a factor of 10 to 15.
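To make "EKF-based second-order training" concrete, here is a minimal sketch of a generic EKF parameter update. It assumes the textbook formulation in which the model parameters are the filter state (a random walk with process-noise variance `q`) and each target is the model output plus measurement noise of variance `r`. It is not the authors' efficient algorithm: it trains a single tanh unit, y ≈ tanh(w*x + b), instead of an LSTM, and the names `ekf_step` and `run_online`, the noise levels, and the synthetic data stream are hypothetical choices for illustration. Each step uses only the current input-target pair.

```python
import math
import random


def ekf_step(theta, P, x, y, q=1e-6, r=1e-2):
    """One EKF update of theta = [w, b] for the model y ~ tanh(w*x + b).

    Hypothetical sketch: parameters follow a random walk (process noise q),
    targets are model output plus measurement noise of variance r.
    Returns the updated (theta, P) and the one-step-ahead prediction error.
    """
    w, b = theta
    # Predict step: the random-walk model keeps theta, inflates P by q*I.
    P = [[P[0][0] + q, P[0][1]], [P[1][0], P[1][1] + q]]
    # Linearize the measurement model around the current estimate.
    h = math.tanh(w * x + b)
    dh = 1.0 - h * h
    H = [dh * x, dh]  # Jacobian dh/dtheta (row vector)
    # P H^T (column vector) and innovation variance S = H P H^T + r.
    PH = [P[0][0] * H[0] + P[0][1] * H[1], P[1][0] * H[0] + P[1][1] * H[1]]
    S = H[0] * PH[0] + H[1] * PH[1] + r
    K = [PH[0] / S, PH[1] / S]  # Kalman gain
    err = y - h  # innovation (prediction error before the update)
    theta = [w + K[0] * err, b + K[1] * err]
    # Covariance update P <- P - K (H P); H P equals (P H^T)^T since P is symmetric.
    P = [[P[i][j] - K[i] * PH[j] for j in range(2)] for i in range(2)]
    return theta, P, err


def run_online(n_steps=2000, true_w=1.5, true_b=-0.5, seed=0):
    """Feed a hypothetical synthetic stream to ekf_step, one sample at a time."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    P = [[1.0, 0.0], [0.0, 1.0]]
    errors = []
    for _ in range(n_steps):
        x = rng.uniform(-2.0, 2.0)
        y = math.tanh(true_w * x + true_b)  # noise-free synthetic target
        theta, P, err = ekf_step(theta, P, x, y)
        errors.append(err)
    return theta, errors


theta, errors = run_online()
print("estimated w, b:", theta)
```

With n parameters, the full EKF carries an n-by-n covariance matrix P and updates it at every step, so its per-step cost grows quadratically with the number of parameters; for LSTM-sized networks this cost is what more efficient EKF-based schemes aim to reduce.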
Signal Processing (eess.SP), FOS: Computer and information sciences, Computer Science - Machine Learning, MLR-DEEP, Adaptive learning, Machine Learning (stat.ML), Regression, Machine Learning (cs.LG), Long short term memory (LSTM), Online learning, Statistics - Machine Learning, FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Signal Processing, Truly online, MLR-SLER [Stochastic gradient descent (SGD) EDICS Category], Kalman filtering
