Relative Error in the Central Limit Theorem
Modeling | Statistics | posted by John Cook, June 12, 2018
If you average a large number of independent versions of the same random variable, the central limit theorem says the average will be approximately normal. That is, the absolute error in approximating the density of the average by the density of a normal random variable will be small. (Terms... Read more
Quantifying Uncertainty with Bayesian Statistics
Modeling | Statistics | posted by Mat Leonard, June 5, 2018
Whenever we’re working with data, there is necessarily uncertainty in our results. Firstly, we can’t collect all the possible data, so instead we randomly sample from a population. Accordingly, there is a natural variance and uncertainty in any data we collect. There is also uncertainty from... Read more
Robustness and Tests for Equal Variance
Modeling | Statistics | posted by John Cook, May 30, 2018
The two-sample t-test is a way to test whether two data sets come from distributions with the same mean. I wrote a few days ago about how the test performs under ideal circumstances, as well as less than ideal circumstances. This is an analogous post for testing whether two... Read more
“I hate math!” – Education and Artificial Intelligence to find a meaning
Modeling | Statistics | posted by Pablo Casas, May 21, 2018
Well, what you hate is the way that math was taught to you. That soup of equations, abstractions, and solutions to problems that we don’t know. It’s hard to enjoy the things we don’t feel part of. But how about relating some math techniques from the... Read more
Least Squares Solutions to Over- or Underdetermined Systems
Modeling | Statistics | posted by John Cook, May 17, 2018
It often happens in applications that a linear system of equations Ax = b either does not have a solution or has infinitely many solutions. Applications often use least squares to create a problem that has a unique solution. Overdetermined systems: Suppose the matrix A has dimensions m by n and the right-hand side vector b has dimension m. Then... Read more
Ask America Anything Results: Broad Support for Programs that Benefit Children
Modeling | Statistics | posted by Civis Analytics Team, May 14, 2018
Here at Civis Analytics, we recently discovered that an estimated eighty-one percent of Americans support increased federal spending on programs that benefit children, including strong support for enhancing children’s access to healthcare, affordable housing, quality K-12 education, and food, according to a survey we ran on behalf... Read more
Datasets Are Books, Not Houses
Deep Learning | Modeling | Statistics | posted by Brendan O'Brien, May 2, 2018
What’s content addressing? What does it have to do with datasets? Why am I on this site in the first place? Read on, dear reader. Read on. The world of linked data is built on shaky foundations that prevent a true data commons from emerging. The... Read more
Eight-bit Floating Point
Modeling | Statistics | posted by John Cook, April 25, 2018
Researchers have discovered that for some problems, deep neural networks (DNNs) can get by with low-precision weights. Using fewer bits to represent weights means that more weights can fit in memory at once. This, as well as embedded systems, has renewed interest in low-precision floating... Read more
Small p hacking
Modeling | Statistics | posted by Thomas Lumley, April 23, 2018
The proposal to change p-value thresholds from 0.05 to 0.005 won’t die. I think it’s targeting the wrong question: many studies are too weak in various ways to provide the sort of reliable evidence they want to claim, and the choices available in analysis and publication... Read more
5 strategies for converting Big Data into actionable insights
Modeling | Statistics | posted by Naveen Joshi, April 12, 2018
The strategy for turning raw data into actionable insights is to integrate and analyze data from all sources in order to reach better, optimized business decisions. The word “big” in big data refers to the huge volume of data involved. Big data technologies aim at... Read more