
Cleaning covariance matrices is a highly non-trivial problem, yet it is of central importance to the statistical inference of dependence between objects. We propose a probabilistic hierarchical clustering method, named Bootstrapped Average Hierarchical Clustering (BAHC), that is particularly effective in the high-dimensional case, i.e., when there are more objects than features. When applied to DNA microarray data, our method yields distinct hierarchical structures that cannot be accounted for by usual hierarchical clustering. We then use global minimum-variance risk management to test our method and find that BAHC leads to significantly smaller realized risk than state-of-the-art linear and nonlinear filtering methods in the high-dimensional case. Spectral decomposition shows that BAHC better captures the persistence of the dependence structure between asset price returns from the calibration period to the test period.
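The global minimum-variance test mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard unconstrained, fully invested minimum-variance portfolio, whose weights are w = C^{-1} 1 / (1' C^{-1} 1) for a covariance matrix C, and it assumes that realized risk means the standard deviation of portfolio returns over the test period; the abstract does not state either definition. The function names `gmv_weights` and `realized_risk` are hypothetical, and the covariance estimator itself (BAHC or any other filter) is not shown: any cleaned covariance matrix can be passed in.

```python
import numpy as np


def gmv_weights(cov):
    """Global minimum-variance weights w = C^{-1} 1 / (1' C^{-1} 1).

    Assumes an unconstrained, fully invested portfolio (weights sum to 1,
    short positions allowed). `cov` must be a non-singular square matrix.
    """
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    # Solve C x = 1 instead of forming the explicit inverse.
    x = np.linalg.solve(cov, ones)
    return x / x.sum()


def realized_risk(weights, test_returns):
    """Sample standard deviation of portfolio returns in the test period.

    `test_returns` has shape (T, n): T observations of n asset returns.
    This definition of realized risk is an assumption of this sketch.
    """
    test_returns = np.asarray(test_returns, dtype=float)
    portfolio_returns = test_returns @ weights
    return float(np.std(portfolio_returns, ddof=1))


# Toy example: two uncorrelated assets with variances 0.04 and 0.01.
toy_cov = np.array([[0.04, 0.00],
                    [0.00, 0.01]])
toy_weights = gmv_weights(toy_cov)

# Simulated out-of-sample returns drawn from the same toy covariance.
rng = np.random.default_rng(0)
toy_test = rng.multivariate_normal(np.zeros(2), toy_cov, size=500)
toy_risk = realized_risk(toy_weights, toy_test)
```

In a comparison of the kind the abstract describes, one would estimate the covariance matrix on a calibration window with each filtering method, compute the weights from each estimate, and compare the resulting realized risk on the subsequent test window.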
FOS: Computer and information sciences, Statistical Finance (q-fin.ST), Science, Gene Expression Profiling, Q, R, Quantitative Finance - Statistical Finance, Methodology (stat.ME), FOS: Economics and business, Risk Management (q-fin.RM), Medicine, Cluster Analysis, Computer Simulation, Statistics - Methodology, Algorithms, Quantitative Finance - Risk Management, Research Article, Oligonucleotide Array Sequence Analysis
