F.-Y. Wang, H. Zhang, and D. Liu, “Adaptive dynamic programming: An introduction,” IEEE Computational Intelligence Magazine, vol. 4, no. 2, pp. 39-47, 2009.
 R. E. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton University Press, 1957.
 P. J. Werbos, “Approximating dynamic programming for real-time control and neural modeling,” in Handbook of Intelligent Control, White and Sofge, Eds. New York: Van Nostrand Reinhold, 1992, ch. 13, pp. 493-525.
 D. Prokhorov and D. Wunsch, “Adaptive critic designs,” IEEE Transactions on Neural Networks, vol. 8, no. 5, pp. 997-1007, 1997.
 S. Ferrari and R. F. Stengel, “Model-based adaptive critic designs,” in Handbook of Learning and Approximate Dynamic Programming, J. Si, A. Barto, W. Powell, and D. Wunsch, Eds. New York: Wiley-IEEE Press, 2004, pp. 65-96.
 M. Fairbank, E. Alonso, and D. Prokhorov, “Simple and fast calculation of the second-order gradients for globalized dual heuristic dynamic programming in neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 10, pp. 1671-1678, October 2012.
 M. Fairbank, “Reinforcement learning by value gradients,” CoRR, vol. abs/0803.3539, 2008. [Online]. Available: http://arxiv.org/abs/0803.3539
 M. Fairbank and E. Alonso, “Value-gradient learning,” in Proceedings of the IEEE International Joint Conference on Neural Networks 2012 (IJCNN'12). IEEE Press, June 2012, pp. 3062-3069.
 R. S. Sutton, “Learning to predict by the methods of temporal differences,” Machine Learning, vol. 3, pp. 9-44, 1988.
 G. K. Venayagamoorthy and D. C. Wunsch, “Dual heuristic programming excitation neurocontrol for generators in a multimachine power system,” IEEE Transactions on Industry Applications, vol. 39, pp. 382-394, 2003.
 G. G. Lendaris and C. Paintz, “Training strategies for critic and action neural networks in dual heuristic programming method,” in Proceedings of International Conference on Neural Networks, Houston, 1997.
 L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The Mathematical Theory of Optimal Processes (Translated from Russian). Wiley, 1962, vol. 4.
 M. Fairbank and E. Alonso, “The local optimality of reinforcement learning by value gradients, and its relationship to policy gradient learning,” CoRR, vol. abs/1101.0428, 2011. [Online]. Available: http://arxiv.org/abs/1101.0428
 M. Fairbank and E. Alonso, “A comparison of learning speed and ability to cope without exploration between DHP and TD(0),” in Proceedings of the IEEE International Joint Conference on Neural Networks 2012 (IJCNN'12). IEEE Press, June 2012, pp. 1478-1485.
 P. J. Werbos, T. McAvoy, and T. Su, “Neural networks, system identification, and control in the chemical process industries,” in Handbook of Intelligent Control, White and Sofge, Eds. New York: Van Nostrand Reinhold, 1992, ch. 10, pp. 283-356.