Attias, H. (1999). Independent factor analysis. Neural Computation, 11(4), 803-851.
Bishop, C. M. (1995). Neural networks for pattern recognition. New York: Oxford University Press.
Bishop, C. M., & Qazaz, C. S. (1995). Bayesian inference of noise levels in regression. In Proceedings ICANN 95 (pp. 59-64).
Bishop, C. M., Svensen, M., & Williams, C. K. I. (1998). GTM: The generative topographic mapping. Neural Computation, 10(1), 215-234.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B39(1), 1-38.
Hoel, P. G. (1984). Introduction to mathematical statistics. New York: Wiley.
Husmeier, D. (1998). Modelling conditional probability densities with neural networks. Unpublished doctoral dissertation, King's College London. Available online at: http://www.bioss.sari.ac.uk/~dirk/My_publications.html.
Husmeier, D., & Taylor, J. G. (1997). Predicting conditional probability densities of stationary stochastic time series. Neural Networks, 10(3), 479-497.
Husmeier, D., & Taylor, J. G. (1998). Neural networks for predicting conditional probability densities: Improved training scheme combining EM and RVFL. Neural Networks, 11(1), 89-116.
Igelnik, B., & Pao, Y. H. (1995). Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Transactions on Neural Networks, 6, 1320-1329.