When the bootstrap doesn’t work
Posted by Thomas Lumley, March 29, 2017
The bootstrap always works, except sometimes.
By ‘works’ here, I mean it in the weakest senses: that the large-sample bootstrap variance correctly estimates the variance of the statistic, or that large-sample percentile bootstrap intervals have their nominal coverage. I don’t mean the stronger sense that someone like Peter Hall might use, that the bootstrap gives higher-order accurate confidence intervals. So the bootstrap ‘works’ for the median, even though it doesn’t work as well as it does for smooth functions of the mean.
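To make the weak sense of ‘works’ concrete, here is a minimal sketch, assuming NumPy. The function name `bootstrap_median`, its defaults and the simulated N(0,1) data are hypothetical choices for illustration. It computes a bootstrap standard error and a 95% percentile interval for the sample median, and prints the standard error next to the large-sample value $\sqrt{\pi/2}/\sqrt{n}$ for standard normal data.

```python
import numpy as np

def bootstrap_median(x, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap for the sample median.

    Returns the bootstrap standard error and a percentile interval.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    # Each row of idx indexes one with-replacement resample of the data.
    idx = rng.integers(0, n, size=(n_boot, n))
    medians = np.median(x[idx], axis=1)
    se = float(medians.std(ddof=1))
    lo, hi = np.quantile(medians, [alpha / 2, 1 - alpha / 2])
    return se, (float(lo), float(hi))

rng = np.random.default_rng(1)
x = rng.standard_normal(400)
se, (lo, hi) = bootstrap_median(x)
# Large-sample SE of the median for N(0, 1) data is sqrt(pi/2) / sqrt(n).
print(se, np.sqrt(np.pi / 2) / np.sqrt(x.size), (lo, hi))
```

The bootstrap variance of the median is known to be fairly noisy, so agreement here is rough rather than tight, which is consistent with "works, but not as well as for smooth statistics".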
Here are the reasons I know of why the bootstrap might fail:
0. Correlation. This is the one that everyone knows about nowadays. If your data have structure (a time series, a spatial map, a carefully structured experimental design, a multistage survey, a network), you can’t hope to get the right distribution by resampling in a way that doesn’t respect that structure.
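One common way to respect serial structure is the moving-block bootstrap, which resamples contiguous blocks of observations rather than single observations. The sketch below assumes NumPy and uses hypothetical helper names, an AR(1) series with $\phi = 0.7$, and a block length of 50, all chosen only for illustration. It compares the naive (independent) bootstrap with the block bootstrap as estimates of the variance of the sample mean.

```python
import numpy as np

def ar1(n, phi, rng):
    """Simulate a stationary AR(1) series x[t] = phi * x[t-1] + e[t], e ~ N(0, 1)."""
    x = np.empty(n)
    x[0] = rng.standard_normal() / np.sqrt(1 - phi ** 2)  # draw from the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def iid_boot_var_mean(x, n_boot, rng):
    """Naive bootstrap: resample single observations, ignoring time order."""
    n = x.size
    means = x[rng.integers(0, n, size=(n_boot, n))].mean(axis=1)
    return float(means.var(ddof=1))

def block_boot_var_mean(x, block_len, n_boot, rng):
    """Moving-block bootstrap: glue together randomly chosen overlapping blocks."""
    n = x.size
    n_blocks = int(np.ceil(n / block_len))
    # Block start positions range over 0 .. n - block_len, so every index stays in range.
    starts = rng.integers(0, n - block_len + 1, size=(n_boot, n_blocks))
    idx = (starts[:, :, None] + np.arange(block_len)).reshape(n_boot, -1)[:, :n]
    return float(x[idx].mean(axis=1).var(ddof=1))

rng = np.random.default_rng(2017)
n, phi = 2000, 0.7
x = ar1(n, phi, rng)
true_var = 1 / (n * (1 - phi) ** 2)  # large-n variance of the mean for this AR(1)
v_iid = iid_boot_var_mean(x, 1000, rng)
v_block = block_boot_var_mean(x, 50, 1000, rng)
print(f"approx true {true_var:.5f}  iid {v_iid:.5f}  block {v_block:.5f}")
```

With positive autocorrelation the naive bootstrap should understate the variance by a large factor, while the block version should land much closer. Block length is a tuning choice, and this sketch does not attempt to pick it well.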
1. Constraints. Suppose $X_1, \dots, X_n \sim N(\theta, 1)$ independently, and we know $\theta \ge 0$. The maximum likelihood estimator of $\theta$ is $\hat\theta = \max(\bar X, 0)$. If $\theta > 0$ there isn’t a problem asymptotically (or, in a more sophisticated analysis, if $\theta \gg 1/\sqrt{n}$ there isn’t). But if $\theta = 0$, the sampling distribution of $\hat\theta$ is a 50:50 mixture of a spike at zero and the positive half of a $N(0, n^{-1})$ distribution. The bootstrap distribution is also a mixture of a spike at zero and a half-normal, but the mass on the spike does not converge to 0.5 (or to anything else) as the sample size increases. The problem is that the height of the spike is approximately $\Phi(-\sqrt{n}\,\bar X)$, and when $\theta = 0$ we have $\sqrt{n}\,\bar X \sim N(0, 1)$, so the height converges in distribution to $U(0, 1)$ rather than to a constant.
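A quick simulation of the constrained-mean example, assuming NumPy and taking the true $\theta = 0$. The sample size, seed and number of resamples are arbitrary choices. Because the bootstrap mean $\bar X^*$ is roughly $N(\bar X, 1/n)$, the bootstrap mass at zero is about $P^*(\bar X^* \le 0) \approx \Phi(-\sqrt{n}\,\bar X)$. For each simulated dataset, the code records the fraction of bootstrap replicates of $\max(\bar X^*, 0)$ that sit exactly at zero and prints it next to that approximation.

```python
import math
import numpy as np

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bootstrap_spike_height(x, n_boot, rng):
    """Fraction of bootstrap replicates of max(mean, 0) that equal exactly zero."""
    n = x.size
    boot_means = x[rng.integers(0, n, size=(n_boot, n))].mean(axis=1)
    theta_star = np.maximum(boot_means, 0.0)
    return float(np.mean(theta_star == 0.0))

rng = np.random.default_rng(3)
n, theta, n_boot = 400, 0.0, 4000  # true theta sits on the boundary theta = 0
spikes, predicted = [], []
for rep in range(5):
    x = rng.normal(theta, 1.0, size=n)
    spikes.append(bootstrap_spike_height(x, n_boot, rng))
    predicted.append(Phi(-math.sqrt(n) * x.mean()))
    print(f"xbar = {x.mean():+.4f}   spike = {spikes[-1]:.3f}   "
          f"Phi(-sqrt(n) xbar) = {predicted[-1]:.3f}")
```

The spike height should track $\Phi(-\sqrt{n}\,\bar X)$ and vary from one dataset to the next instead of settling at 0.5. Increasing `n` does not change that.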
2. Extrema. Consider $X_1, \dots, X_n \sim U(\theta, 1)$, with $\hat\theta = \min_i X_i$, the maximum likelihood estimator. For large $n$, the bootstrap replicates $\theta^*$ have a distribution that puts mass $1 - e^{-1} \approx 0.632$ on the smallest observation, $e^{-1}(1 - e^{-1}) \approx 0.233$ on the second smallest, and so on geometrically. We always have $\theta^* \ge \hat\theta$, and the bootstrap distribution stays very discrete as the sample size increases.
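The masses on the smallest order statistics can be checked directly. For finite $n$ the exact values are $1 - (1 - 1/n)^n$ and $(1 - 1/n)^n - (1 - 2/n)^n$, which approach the limits quoted above. This is a minimal sketch assuming NumPy, with $\theta = 0$, $n = 500$ and 4000 resamples chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(4)
n, theta, n_boot = 500, 0.0, 4000
x = np.sort(rng.uniform(theta, 1.0, size=n))
theta_hat = x[0]  # MLE of theta: the sample minimum

# Each bootstrap replicate is the minimum of a with-replacement resample.
theta_star = x[rng.integers(0, n, size=(n_boot, n))].min(axis=1)

p_first = float(np.mean(theta_star == x[0]))   # mass on the smallest observation
p_second = float(np.mean(theta_star == x[1]))  # mass on the second smallest
print(p_first, 1 - np.exp(-1))                  # limit: about 0.632
print(p_second, np.exp(-1) * (1 - np.exp(-1)))  # limit: about 0.233
print("distinct values among replicates:", np.unique(theta_star).size)
```

Only a handful of distinct values appear among thousands of replicates, and none fall below $\hat\theta$. The sampling distribution of $\hat\theta$ is continuous, so the bootstrap cannot capture it.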
3. Lack of smoothness (cube-root asymptotics). Tukey’s shorth, the mean of the shortest half of the data, converges to the mean at an $n^{-1/3}$ rate instead of the usual $n^{-1/2}$. The same is true for the least-median-of-squares regression line, the isotonic