Stirling numbers are something like binomial coefficients. They come in two varieties, imaginatively called the first kind and second kind. Unfortunately it is the second kind that are simpler to describe and that come up more often in applications, so we’ll start there. Stirling numbers of the second kind... Read more
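
As a small illustration of the comparison with binomial coefficients, here is a minimal sketch that computes Stirling numbers of the second kind from their standard recurrence, S(n, k) = k·S(n-1, k) + S(n-1, k-1), where S(n, k) counts the ways to partition a set of n items into k non-empty subsets. The function name `stirling2` is chosen here for illustration and does not come from the excerpt.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of ways to partition n labeled items into k non-empty subsets."""
    if n == k:
        return 1  # includes S(0, 0) = 1: every item in its own subset
    if k == 0 or k > n:
        return 0  # no way to use zero subsets, or more subsets than items
    # Item n either joins one of the k existing subsets or forms a new one.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


if __name__ == "__main__":
    for n in range(1, 6):
        print([stirling2(n, k) for k in range(1, n + 1)])
```

The recurrence has the same shape as Pascal's rule for binomial coefficients, except for the extra factor of k on the first term.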

Here’s an interesting problem that came out of a logistic regression application. The input variable was between 0 and 1, and someone asked when and where the logistic transformation f(x) = 1/(1 + exp(a + bx)) has a fixed point, i.e. f(x) = x. So given logistic regression parameters a and b, when does the logistic curve... Read more
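
One way to explore the question numerically is the sketch below, which uses bisection (a technique chosen here, not necessarily the post's). It relies on the observation that g(x) = f(x) - x is positive at x = 0 and negative at x = 1, because f always lies strictly between 0 and 1, so at least one fixed point exists in (0, 1). The helper names `logistic` and `fixed_point` are hypothetical.

```python
import math


def logistic(x: float, a: float, b: float) -> float:
    """The transformation from the text: f(x) = 1 / (1 + exp(a + b*x))."""
    return 1.0 / (1.0 + math.exp(a + b * x))


def fixed_point(a: float, b: float, tol: float = 1e-12) -> float:
    """Find one x in (0, 1) with f(x) = x by bisection on g(x) = f(x) - x."""
    lo, hi = 0.0, 1.0  # g(lo) > 0 and g(hi) < 0 since 0 < f < 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if logistic(mid, a, b) - mid > 0:
            lo = mid  # the sign change lies to the right of mid
        else:
            hi = mid  # the sign change lies at or to the left of mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    x = fixed_point(1.0, 2.0)
    print(x, logistic(x, 1.0, 2.0))
```

Bisection returns one fixed point. If several exist for a given a and b, this sketch does not enumerate them, and very large values of a + bx can overflow `math.exp`.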

If you average a large number of independent versions of the same random variable, the central limit theorem says the average will be approximately normal. That is, the absolute error in approximating the density of the average by the density of a normal random variable will be small. (Terms and conditions apply.... Read more

Whenever we’re working with data, there is necessarily uncertainty in our results. Firstly, we can’t collect all the possible data, so instead we randomly sample from a population. Accordingly, there is a natural variance and uncertainty in any data we collect. There is also uncertainty from missing data, systematic... Read more

The two-sample t-test is a way to test whether two data sets come from distributions with the same mean. I wrote a few days ago about how the test performs under ideal circumstances, as well as less than ideal circumstances. This is an analogous post for testing whether two data sets come... Read more

Well, what you hate is the way that math was taught to you: that soup of equations, abstractions, and solutions to problems that we don’t know. It’s hard to enjoy the things we don’t feel part of. But how about relating some math techniques from the world that surrounds... Read more

It often happens in applications that a linear system of equations Ax = b either does not have a solution or has infinitely many solutions. Applications often use least squares to create a problem that has a unique solution. Overdetermined systems: Suppose the matrix A has dimensions m by n and the right-hand side vector b has dimension m. Then the solution x, if... Read more
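
To make the overdetermined case concrete, here is a minimal sketch that finds the least squares solution by forming the normal equations AᵀA x = Aᵀb and solving them with Gaussian elimination. It assumes A has more rows than columns and full column rank. The function name `least_squares` is hypothetical, and the normal-equations approach is used here only for brevity; numerical libraries generally prefer QR or SVD because the normal equations can be ill-conditioned.

```python
def least_squares(A, b):
    """Minimize ||Ax - b||_2 for an m-by-n matrix A (m >= n, full column rank)."""
    m, n = len(A), len(A[0])
    # Build the augmented normal equations [A^T A | A^T b].
    M = [
        [sum(A[r][i] * A[r][j] for r in range(m)) for j in range(n)]
        + [sum(A[r][i] * b[r] for r in range(m))]
        for i in range(n)
    ]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


if __name__ == "__main__":
    # Fit y = c0 + c1*t to the points (0, 1), (1, 2), (2, 2).
    A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    b = [1.0, 2.0, 2.0]
    print(least_squares(A, b))
```

Here three equations constrain two unknowns, so no exact solution exists, and the result is the intercept and slope of the best-fit line through the three points.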

Here at Civis Analytics, we recently discovered that an estimated eighty-one percent of Americans support increased federal spending on programs that benefit children — programs including strong support for enhancing children’s access to healthcare, affordable housing, quality K-12 education, and food — according to a survey we ran on behalf of the Children’s... Read more

What’s content addressing? What does it have to do with datasets? Why am I on this site in the first place? Read on, dear reader. Read on. The world of linked data is built on shaky foundations that prevent a true data commons from emerging. The problem isn’t with... Read more

Researchers have discovered that for some problems, deep neural networks (DNNs) can get by with low precision weights. Using fewer bits to represent weights means that more weights can fit in memory at once. This, as well as embedded systems, has renewed interest in low-precision floating point. Microsoft mentioned... Read more
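
To give a feel for what low precision costs, the sketch below rounds Python floats to IEEE 754 half precision (16 bits) using the standard library's `struct` module. Half precision is used purely as an example of a low-precision format and is an assumption here, not necessarily a format the post discusses. The helper name `to_half` is hypothetical.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value, returned as a float."""
    # The 'e' format code packs a 16-bit binary16 float (Python 3.6+).
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    for value in (1.0, 0.1, 3.14159, 2049.0):
        half = to_half(value)
        print(f"{value!r} -> {half!r} (error {abs(half - value):.3g})")
```

A half-precision value occupies two bytes instead of the four used by single precision, which is the memory saving the text refers to, at the price of rounding error like that printed above.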