
An important, recurring problem in statistics is the determination of strata boundaries for use in stratified sampling. This paper describes a practical method for stratifying a population of observations based on optimal cluster analysis. The goal of stratification is to construct a partition in which observations within a stratum are homogeneous, as measured by the within-cluster variances of the attributes deemed important, while observations in different strata are heterogeneous. The problem is formulated as a deterministic optimization model with integer variables and is solved by means of a subgradient method. Computational tests with several examples show that the within-strata variances, and thus the accompanying standard errors, can be substantially reduced. Since the proposed model strives to minimize standard error, it is applicable to situations where a precise sample is essential, for example, microeconomic simulation studies.
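To make the objective in the abstract concrete, the following is a minimal sketch of the quantity a stratification tries to reduce: the population-weighted within-stratum variance of a single attribute. It is not the paper's integer-programming model or its subgradient procedure; the function name `within_strata_variance`, the toy data, and the fixed stratum labels are all assumptions made for illustration.

```python
from statistics import pvariance


def within_strata_variance(values, labels):
    """Population-weighted within-stratum variance: sum over strata of W_h * S_h^2.

    `values` holds one attribute per observation and `labels` holds the
    (hypothetical) stratum assigned to each observation.
    """
    n = len(values)
    strata = {}
    for value, label in zip(values, labels):
        strata.setdefault(label, []).append(value)
    # W_h = N_h / N is the stratum weight, S_h^2 the population variance in stratum h.
    return sum(len(group) / n * pvariance(group) for group in strata.values())


# Toy population (assumed data): two well-separated groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]

total_var = pvariance(values)                        # variance ignoring strata
within_var = within_strata_variance(values, labels)  # variance left inside strata
```

In this toy case the within-strata variance is far smaller than the unstratified variance, which is the effect the abstract reports. The sketch only evaluates a given partition; choosing the partition is the optimization problem the paper addresses.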
Numerical optimization and variational techniques, multivariate stratified sampling, subgradient optimization, Classification and discrimination; cluster analysis (statistical aspects), Sampling theory, sample surveys, Integer programming, Probabilistic methods, stochastic differential equations, sampling, statistics: cluster analysis, programming: integer algorithms, subgradient optimization [statistics], cluster analysis
