## Statistical Software Matters

Modeling | Statistics | Posted by Thomas Lumley, June 29, 2018

This is a picture of all the genetic associations found in genome-wide association studies, sorted by chromosome. You can find more detail at the NHGRI GWAS catalog. There are two chromosomes with many fewer associations. One is the Y chromosome. There isn’t much there because there isn’t much... Read more

## Partition numbers and Ramanujan’s approximation

Modeling | Statistics | Posted by John Cook, June 25, 2018

The partition function p(n) counts the number of ways n unlabeled things can be partitioned into non-empty sets. (Contrast with Bell numbers that count partitions of labeled things.) There’s no simple expression for p(n), but Ramanujan discovered a fairly simple asymptotic approximation: $p(n) \sim \frac{1}{4n\sqrt{3}} \exp\left(\pi \sqrt{2n/3}\right)$. How accurate is this approximation? Here’s a little Mathematica code to see. p := PartitionsP... Read more
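For readers without Mathematica, the same comparison can be sketched in Python. This is a rough illustration rather than the post's own code: it takes the approximation to be the standard Hardy-Ramanujan asymptotic formula, p(n) ~ exp(π√(2n/3)) / (4n√3), and the function names `partitions` and `ramanujan_approx` are made up for this example.

```python
import math

def partitions(n_max):
    """Exact p(0), ..., p(n_max) by dynamic programming over allowed part sizes."""
    p = [1] + [0] * n_max          # p[0] = 1: the empty partition
    for part in range(1, n_max + 1):
        for total in range(part, n_max + 1):
            p[total] += p[total - part]
    return p

def ramanujan_approx(n):
    """Hardy-Ramanujan asymptotic estimate of p(n), assumed to be the formula meant."""
    return math.exp(math.pi * math.sqrt(2 * n / 3)) / (4 * n * math.sqrt(3))

exact = partitions(1000)
for n in (10, 100, 1000):
    ratio = ramanujan_approx(n) / exact[n]
    print(f"n={n:5d}  approx/exact = {ratio:.4f}")
```

The ratio of approximate to exact values drifts toward 1 as n grows, which is what "asymptotic" promises; it says nothing about how close the estimate is for small n.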

## Talking About Clinical Significance

Modeling | Statistics | Posted by John Mount, June 22, 2018

In statistical work in the age of big data we often get hung up on differences that are statistically significant (reliable enough to show up again and again in repeated measurements), but clinically insignificant (visible in aggregation, but too small to make any real difference to individuals). An example would be: a diet... Read more
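A small numeric sketch makes the distinction concrete. The numbers below are hypothetical and are not taken from the post: a 0.1 kg average difference between two groups with a standard deviation of 5 kg, tested with a simple two-sample z-test that assumes equal, known standard deviations and equal group sizes.

```python
import math

def two_sample_z(mean_diff, sd, n_per_group):
    """z statistic and two-sided p-value for a difference in means
    (equal known SD and equal group sizes assumed)."""
    se = sd * math.sqrt(2 / n_per_group)      # standard error of the difference
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail probability
    return z, p

# Hypothetical effect: 0.1 kg difference, SD of 5 kg (effect size d = 0.02).
for n in (100, 10_000, 1_000_000):
    z, p = two_sample_z(0.1, 5.0, n)
    print(f"n per group = {n:>9,}  z = {z:6.2f}  p = {p:.3g}")
```

The effect size never changes, yet the p-value collapses as the sample grows. With enough data the test reliably detects a difference that no individual would notice.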

## Stirling Numbers, Including Negative Arguments

Modeling | Statistics | Posted by John Cook, June 20, 2018

Stirling numbers are something like binomial coefficients. They come in two varieties, imaginatively called the first kind and second kind. Unfortunately it is the second kind that are simpler to describe and that come up more often in applications, so we’ll start there. Stirling numbers of the second kind... Read more
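As a quick illustration, Stirling numbers of the second kind S(n, k) count the ways to split n labeled items into k non-empty unlabeled groups, and they satisfy the recurrence S(n, k) = k·S(n−1, k) + S(n−1, k−1). The sketch below covers non-negative arguments only (it does not attempt the negative-argument extension in the title), and the function name `stirling2` is chosen for this example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind for non-negative n and k."""
    if n < 0 or k < 0:
        raise ValueError("this sketch handles non-negative arguments only")
    if n == k:
        return 1                      # includes S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Item n either joins one of the k existing groups or starts a new group.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

row = [stirling2(5, k) for k in range(6)]
print(row)        # [0, 1, 15, 25, 10, 1]
print(sum(row))   # 52, the Bell number B(5)
```

Summing a row gives the Bell number, the total count of partitions of a labeled set.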

## Fixed Points of Logistic Function

Modeling | Statistics | Posted by John Cook, June 15, 2018

Here’s an interesting problem that came out of a logistic regression application. The input variable was between 0 and 1, and someone asked when and where the logistic transformation f(x) = 1/(1 + exp(a + bx)) has a fixed point, i.e. f(x) = x. So given logistic regression parameters a and b, when does the logistic curve... Read more
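One way to explore the question numerically is bisection on g(x) = f(x) − x. Because f takes values strictly between 0 and 1, g(0) > 0 and g(1) < 0, so a sign change is guaranteed on [0, 1]. The sketch below is not the post's method; the parameter values a = −1, b = 2 are made up, and the function names are illustrative. It returns one fixed point, and when b < 0 (where f is increasing in this parameterization) there may be others.

```python
import math

def logistic(x, a, b):
    """f(x) = 1 / (1 + exp(a + b*x)), as written in the post."""
    return 1.0 / (1.0 + math.exp(a + b * x))

def fixed_point(a, b, tol=1e-12):
    """Find one x in [0, 1] with f(x) = x by bisection on g(x) = f(x) - x."""
    lo, hi = 0.0, 1.0                 # g(lo) > 0 and g(hi) < 0 throughout
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if logistic(mid, a, b) - mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a, b = -1.0, 2.0                      # hypothetical regression parameters
x = fixed_point(a, b)
print(x, logistic(x, a, b))           # the two values agree to about 1e-12
```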

## Relative Error in the Central Limit Theorem

Modeling | Statistics | Posted by John Cook, June 12, 2018

If you average a large number of independent versions of the same random variable, the central limit theorem says the average will be approximately normal. That is, the absolute error in approximating the density of the average by the density of a normal random variable will be small. (Terms and conditions apply.... Read more

## Quantifying Uncertainty with Bayesian Statistics

Modeling | Statistics | Posted by Mat Leonard, June 5, 2018

Whenever we’re working with data, there is necessarily uncertainty in our results. Firstly, we can’t collect all the possible data, so instead we randomly sample from a population. Accordingly, there is a natural variance and uncertainty in any data we collect. There is also uncertainty from missing data, systematic... Read more

## Robustness and Tests for Equal Variance

Modeling | Statistics | Posted by John Cook, May 30, 2018

The two-sample t-test is a way to test whether two data sets come from distributions with the same mean. I wrote a few days ago about how the test performs under ideal circumstances, as well as less than ideal circumstances. This is an analogous post for testing whether two data sets come... Read more

## “I hate math!” – Education and Artificial Intelligence to find a meaning

Modeling | Statistics | Posted by Pablo Casas, May 21, 2018

Well, what you hate is the way that math was taught to you. That soup of equations, abstractions, and solutions to problems that we don’t know. It’s hard to enjoy the things we don’t feel part of. But how about relating some math techniques from the world that surrounds... Read more

## Least Squares Solutions to Over- or Underdetermined Systems

Modeling | Statistics | Posted by John Cook, May 17, 2018

It often happens in applications that a linear system of equations Ax = b either does not have a solution or has infinitely many solutions. Applications often use least squares to create a problem that has a unique solution. Overdetermined systems: Suppose the matrix A has dimensions m by n and the right hand side vector b has dimension m. Then the solution x, if... Read more
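For the overdetermined case, a minimal sketch: fitting a line to four points gives four equations in two unknowns, and the least squares solution comes from the normal equations AᵀAx = Aᵀb. The data and the function name `least_squares_line` are invented for illustration. Solving the normal equations directly is fine for a tiny, well-conditioned problem like this one, though numerical libraries generally prefer QR or SVD factorizations.

```python
def least_squares_line(ts, bs):
    """Minimize ||Ax - b|| where row i of A is [1, t_i], via the 2x2 normal equations."""
    m = len(ts)
    st = sum(ts)
    stt = sum(t * t for t in ts)
    sb = sum(bs)
    stb = sum(t * b for t, b in zip(ts, bs))
    det = m * stt - st * st           # determinant of A^T A
    if det == 0:
        raise ValueError("columns of A are linearly dependent; no unique solution")
    intercept = (stt * sb - st * stb) / det
    slope = (m * stb - st * sb) / det
    return intercept, slope

# Four equations, two unknowns: no exact solution exists for these made-up points.
ts = [0.0, 1.0, 2.0, 3.0]
bs = [1.0, 3.0, 5.0, 8.0]
intercept, slope = least_squares_line(ts, bs)
residuals = [b - (intercept + slope * t) for t, b in zip(ts, bs)]
print(intercept, slope)
print(residuals)
```

The residual vector is orthogonal to both columns of A (it sums to zero, and so does its product with the t values), which is the defining property of the least squares solution.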