The Bayesian evidence scheme for regularizing probability-density estimating neural networks

Article · English · Open Access
Husmeier, D. (2000)

Training probability-density estimating neural networks with the expectation-maximization (EM) algorithm aims to maximize the likelihood of the training set and therefore leads to overfitting for sparse data. In this article, a regularization method for mixture models with generalized linear kernel centers is proposed, which adopts the Bayesian evidence approach and optimizes the hyperparameters of the prior by type II maximum likelihood. This includes a marginalization over the parameters, which is done by Laplace approximation and requires the derivation of the Hessian of the log-likelihood function. The incorporation of this approach into the standard training scheme leads to a modified form of the EM algorithm, which includes a regularization term and adapts the hyperparameters on-line after each EM cycle. The article presents applications of this scheme to classification problems, the prediction of stochastic time series, and latent space models.
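The core idea of the abstract — an EM cycle for a mixture density model in which a quadratic prior penalizes the kernel centres and the prior's hyperparameter is re-estimated after each cycle — can be illustrated with a deliberately simplified sketch. This is not the article's full scheme (no generalized linear kernels, no Hessian-based Laplace approximation); the function name, the one-dimensional Gaussian kernels, and the crude type II maximum-likelihood update for the hyperparameter `alpha` are illustrative assumptions.

```python
import numpy as np

def em_gmm_regularized(x, M=3, n_iter=50, alpha=1.0, seed=0):
    """EM for a 1-D Gaussian mixture with a Gaussian prior (precision
    alpha) on the centres; alpha is re-adapted after each EM cycle.
    A simplified illustration, not the article's full evidence scheme."""
    rng = np.random.default_rng(seed)
    N = len(x)
    mu = rng.choice(x, M)          # kernel centres
    var = np.full(M, np.var(x))    # kernel widths
    pi = np.full(M, 1.0 / M)       # mixing coefficients
    for _ in range(n_iter):
        # E-step: responsibilities r[n, m] of kernel m for point x_n
        d = x[:, None] - mu[None, :]
        p = pi * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        Nm = r.sum(axis=0)
        # M-step with the regularization term from the prior: maximizing
        # sum_n r[n,m] * log N(x_n; mu_m, var_m) - (alpha/2) * mu_m^2
        # shrinks each centre towards zero by alpha * var_m.
        mu = (r * x[:, None]).sum(axis=0) / (Nm + alpha * var)
        var = (r * (x[:, None] - mu)**2).sum(axis=0) / Nm
        pi = Nm / N
        # crude stand-in for the type II ML hyperparameter update
        alpha = M / (mu @ mu + 1e-12)
    return pi, mu, var, alpha
```

Compared with unregularized EM, the only changes are the `alpha * var` term in the centre update (the effect of the prior) and the final line, which adapts the hyperparameter online after each cycle rather than fixing it in advance.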
