Relative Error in the Central Limit Theorem

If you average a large number of independent copies of the same random variable, the central limit theorem says the average will be approximately normal. That is, the absolute error in approximating the density of the average by the density of a normal random variable will be small. (Terms and conditions apply. See notes here.)

But the central limit theorem says nothing about relative error. Relative error can diverge to infinity while absolute error converges to zero. We’ll illustrate this with an example.

The average of N independent exponential(1) random variables has a gamma distribution with shape N and scale 1/N.
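This gamma claim is easy to check empirically. Here is a quick simulation sketch (not from the original post) comparing the empirical CDF of averages of N exponential(1) samples against the gamma(N, scale 1/N) CDF:

```python
import numpy as np
from scipy.stats import gamma

# Check: averages of N exponential(1) variables should follow
# a gamma distribution with shape N and scale 1/N.
rng = np.random.default_rng(0)
N, reps = 16, 100_000
averages = rng.exponential(1.0, size=(reps, N)).mean(axis=1)

# Max discrepancy between empirical and theoretical CDF on a grid.
t = np.linspace(0.2, 2.5, 50)
empirical = (averages[:, None] <= t).mean(axis=0)
theoretical = gamma.cdf(t, N, scale=1/N)
print(np.max(np.abs(empirical - theoretical)))  # small, on the order of 1e-3
```

With 100,000 replications the discrepancy is on the order of 1/√reps, consistent with the averages having exactly the stated gamma distribution.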

As N increases, the average becomes more like a normal in distribution. That is, the absolute error in approximating the distribution function of a gamma random variable with that of a normal random variable decreases. (Note that we're talking about distribution functions (CDFs) and not densities (PDFs). The previous post discussed a surprise with density functions in this example.)

The following plot shows that the difference between the distribution functions gets smaller as N increases.

But when we look at the ratio of the tail probabilities, that is, Pr(X > t) / Pr(Y > t) where X is the average of N exponential random variables and Y is the corresponding normal approximation from the central limit theorem, we see that the ratios diverge, and they diverge faster as N increases.

To make it clear what’s being plotted, here is the Python code used to draw the graphs above.

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gamma, norm

def tail_ratio(ns):
    x = np.linspace(0, 4, 400)
    for n in ns:
        # Survival functions Pr(X > x) for the gamma average
        # and for its normal approximation.
        gtail = gamma.sf(x, n, scale=1/n)
        ntail = norm.sf(x, loc=1, scale=np.sqrt(1/n))
        plt.plot(x, gtail/ntail)
    plt.yscale("log")
    plt.legend(["n = {}".format(n) for n in ns])
    plt.savefig("gamma_normal_tail_ratios.svg")

def cdf_error(ns):
    x = np.linspace(0, 6, 400)
    for n in ns:
        # CDFs of the gamma average and its normal approximation.
        gcdf = gamma.cdf(x, n, scale=1/n)
        ncdf = norm.cdf(x, loc=1, scale=np.sqrt(1/n))
        plt.plot(x, gcdf - ncdf)
    plt.legend(["n = {}".format(n) for n in ns])
    plt.savefig("gamma_normal_cdf_diff.svg")

ns = [1, 4, 16]
tail_ratio(ns)
plt.close()
cdf_error(ns)
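The divergence is also visible without a plot. Here is a small sketch (not part of the original post) that evaluates the same tail-probability ratio at a single fixed point, t = 3, for increasing n:

```python
from math import sqrt
from scipy.stats import gamma, norm

# Tail ratio Pr(X > t) / Pr(Y > t) at a fixed t, where X is the
# gamma-distributed average and Y its normal approximation.
t = 3.0
for n in [4, 16]:
    gamma_tail = gamma.sf(t, n, scale=1/n)
    normal_tail = norm.sf(t, loc=1, scale=sqrt(1/n))
    print(n, gamma_tail / normal_tail)
```

At this fixed point the ratio is already large for n = 4 and grows by several orders of magnitude by n = 16, even though both tail probabilities, and hence their absolute difference, are tiny.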

Original Source

John Cook

Companies come to me for help with probability, machine learning, mathematical modeling, … anything that falls under applied math, statistics, and computing. Clients have included large companies such as Amazon, Google, Microsoft, and Amgen, as well as a number of law firms, start-ups, and smaller businesses. I help companies take advantage of their data by combining it with expert opinion, uncovering latent insights, creating mathematical models, overcoming computational difficulties, and interpreting the results.
