# When the bootstrap doesn’t work

Modeling, Statistics. Posted by Thomas Lumley, March 29, 2017

The bootstrap always works, except sometimes.

By ‘works’ here, I mean in the weak sense that the large-sample bootstrap variance correctly estimates the variance of the statistic, or that the large-sample percentile bootstrap intervals attain their nominal coverage. I don’t mean the stronger sense that someone like Peter Hall might use, that the bootstrap gives higher-order accurate confidence intervals. So the bootstrap ‘works’ for the median, even though not as well as for smooth functions of the mean.

Here are the reasons I know of why the bootstrap might fail:

0. Correlation. The one that everyone knows about nowadays.  If your data have structure, such as a time series, a spatial map, a carefully-structured experimental design, a multistage survey, a network, then you can’t hope to get the right distribution by resampling in a way that doesn’t respect that structure.
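To see how badly ignoring structure goes wrong, here is a minimal sketch in Python with numpy (my illustration, not from the post): an iid bootstrap of the mean of a strongly autocorrelated AR(1) series, compared to the known large-sample standard error. The sample size, AR coefficient, and replication counts are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) series x_t = phi * x_{t-1} + eps_t with strong
# positive autocorrelation, so successive observations are far from iid.
n, phi = 2000, 0.9
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Naive iid bootstrap of the sample mean: resampling single observations
# scrambles the series and throws away the serial correlation.
naive = np.array([rng.choice(x, size=n, replace=True).mean()
                  for _ in range(1000)])
naive_se = naive.std()

# Large-sample standard error of the mean of a stationary AR(1) series:
# sqrt( var(x) * (1 + phi) / (1 - phi) / n ), with var(x) = 1/(1 - phi^2).
true_se = np.sqrt((1 / (1 - phi**2)) * (1 + phi) / (1 - phi) / n)
```

With this much correlation the naive bootstrap standard error comes out several times too small; a block bootstrap that resamples contiguous stretches of the series is the usual repair.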

1. Constraints: Suppose $X_i\sim N(\theta,1)$ and we know $\theta\ge 0$. The maximum likelihood estimator of $\theta$ is $\hat\theta=\max(\bar X,0)$. If $\theta>0$ there isn’t a problem asymptotically (or, at a more sophisticated level of analysis, if $\theta\gg n^{-1/2}$ there isn’t). But if $\theta=0$ the sampling distribution of $\hat\theta$ is a 50:50 mixture of a spike at zero and the positive half of a $N(0,n^{-1})$ distribution. The bootstrap distribution is also a mixture of a spike at zero and a half-normal, but the mass on the spike does not converge to 0.5 (or to anything else) as the sample size increases. The problem is that the height of the spike is approximately $\Phi(-\sqrt{n}\,\bar X_n)$, which converges in distribution to $U(0,1)$.
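A small simulation makes the boundary problem concrete (a sketch in Python with numpy, assuming a single sample of size 400 at the boundary $\theta=0$; the sample and bootstrap sizes are arbitrary):

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# True theta = 0: the parameter sits exactly on the boundary.
n = 400
x = rng.standard_normal(n)               # X_i ~ N(theta = 0, 1)
theta_hat = max(x.mean(), 0.0)           # constrained MLE

# Bootstrap distribution of theta-hat: fraction of replicates on the spike.
B = 5000
boot = np.array([max(rng.choice(x, size=n, replace=True).mean(), 0.0)
                 for _ in range(B)])
spike_mass = float(np.mean(boot == 0.0))

# CLT approximation to the spike mass: Phi(-sqrt(n) * xbar / s).  Over
# repeated samples sqrt(n) * xbar is approximately N(0,1), so this mass
# behaves like a Uniform(0,1) draw rather than settling at the true 0.5.
s = x.std(ddof=1)
approx = 0.5 * (1 + math.erf(-math.sqrt(n) * x.mean() / (s * math.sqrt(2))))
```

Rerunning this with different seeds gives a spike mass that wanders over (0, 1) from sample to sample, which is exactly the failure described above.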

2. Extrema.  Consider $X\sim U(\theta,1)$, where the maximum likelihood estimator of $\theta$ is the smallest observation. The bootstrap replicates $\hat\theta^*$ have a distribution that puts mass $1-e^{-1}\approx 0.632$ on the smallest observation, $e^{-1}(1-e^{-1})\approx 0.233$ on the second smallest, and so on geometrically. We always have $\hat\theta^*\ge\hat\theta$, and the bootstrap distribution stays very discrete as the sample size increases.
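The geometric mass pattern is easy to check by simulation (a sketch in Python with numpy; the choice of $\theta=0.3$ and the sample sizes are mine):

```python
import numpy as np

rng = np.random.default_rng(2)

# X_i ~ U(theta, 1); the MLE of theta is the sample minimum.
n, theta = 1000, 0.3
x = rng.uniform(theta, 1.0, size=n)
theta_hat = x.min()

# Bootstrap the minimum.  A replicate equals the smallest observation
# whenever that point is drawn at least once, which has probability
# 1 - (1 - 1/n)^n -> 1 - 1/e, about 0.632.
B = 10000
boot_min = np.array([rng.choice(x, size=n, replace=True).min()
                     for _ in range(B)])
frac_on_min = float(np.mean(boot_min == theta_hat))
```

The bootstrap replicates never get below $\hat\theta$, so they cannot mimic a sampling distribution that lives on both sides of $\theta$, and the discreteness does not wash out as $n$ grows.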

3. Lack of smoothness (cube-root asymptotics). Tukey’s shorth, the mean of the shortest half of the data, converges to the mean at an $n^{-1/3}$ rate instead of the usual $n^{-1/2}$. The same is true for the least-median-of-squares regression line and the isotonic regression estimator.
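For readers who haven’t met the shorth, here is a minimal implementation sketch in Python with numpy (`shorth_mean` is my name for it, not from the post):

```python
import numpy as np

def shorth_mean(x):
    """Mean of the shortest half of the data (Tukey's shorth)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    h = n // 2 + 1                       # size of a 'half' of the data
    widths = x[h - 1:] - x[:n - h + 1]   # width of each candidate half
    i = int(np.argmin(widths))           # leftmost shortest half
    return x[i:i + h].mean()
```

For example, `shorth_mean([0, 1, 2, 3, 100])` is 1.0, the mean of the tightly clustered half. Because the estimator jumps as the shortest interval shifts, it is not a smooth function of the data, and its spread shrinks at the slow $n^{-1/3}$ rate noted above.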

## Thomas Lumley

Thomas Lumley attended Monash University (B.Sc.(Hons) in Pure Mathematics), the University of Oxford (M.Sc. in Applied Statistics) and the University of Washington, Seattle (PhD in Biostatistics). He spent twelve years on the faculty of the Department of Biostatistics at the University of Washington, and then moved to Auckland in 2010. He is still an Affiliate Professor at the University of Washington.
