
arXiv: 2002.07094
ABSTRACT
Datasets for statistical analysis have become so large that storing them on a single machine is difficult. Even when the data can be stored on one machine, the computational cost can still be prohibitive. We propose a divide and conquer solution to density estimation using Bayesian mixture modelling, including the infinite mixture case. The methodology can be generalised to other applications where a Bayesian mixture model is adopted. The proposed prior on each machine or subgroup modifies the original prior on both the mixing probabilities and the remaining parameters of the distributions being mixed. The final estimator is obtained by averaging the posterior samples corresponding to the proposed prior on each subset. Despite the substantial reduction in computing time from data splitting, the posterior contraction rate of the proposed estimator stays the same (up to a factor) as that under the original prior when the data are analysed as a whole. Simulation studies also support the competitiveness of the proposed method relative to the established WASP estimator in the finite-dimensional case. In addition, one of our simulations is performed in a shape-constrained deconvolution context and shows promising results. An application to a GWAS dataset shows the advantage over a naive divide and conquer method that uses the original prior.
Bayesian density estimation, Methodology (stat.ME), FOS: Computer and information sciences, Bayesian problems; characterization of Bayes procedures, Bayesian inference, divide and conquer, Bayesian mixture model, posterior contraction rate, Statistics - Methodology
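The split-then-average pattern described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: it uses a conjugate normal-mean model rather than a mixture model, and it applies the unmodified prior on every subset, which corresponds to the naive divide and conquer baseline mentioned in the abstract, not to the proposed prior (whose form is not given here). The function names `subset_posterior_draws` and `divide_and_average` are hypothetical.

```python
import numpy as np


def subset_posterior_draws(x, n_draws, rng, prior_mean=0.0, prior_var=100.0, noise_var=1.0):
    """Draw from the conjugate posterior of a normal mean with known noise variance.

    Toy stand-in for a per-subset sampler; the paper's mixture model and
    modified prior are NOT implemented here.
    """
    n = x.size
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + x.sum() / noise_var)
    return rng.normal(post_mean, np.sqrt(post_var), size=n_draws)


def divide_and_average(data, n_subsets, n_draws, seed=0):
    """Split the data, sample each subset posterior, and average the draws."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(data)                # random assignment to subsets
    subsets = np.array_split(shuffled, n_subsets)   # near-equal subset sizes
    draws = np.stack([subset_posterior_draws(s, n_draws, rng) for s in subsets])
    return draws.mean(axis=0)                       # average draw-by-draw across subsets


# Simulated data with true mean 2.0 (illustrative only).
data = np.random.default_rng(1).normal(2.0, 1.0, size=10_000)
combined = divide_and_average(data, n_subsets=10, n_draws=2_000)
print(combined.mean(), combined.std())
```

Each subset sampler touches only its own share of the data, so the per-subset calls could run on separate machines; only the draws need to be gathered for the final average.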
