## Tips for Linear Regression Diagnostics

Modeling, Statistics | posted by Daniel Gutierrez, ODSC, August 29, 2018

I like to call linear regression the data scientist’s “workhorse.” It may not be sexy, but it’s a tried and proven technique that can be very useful. When the problem you’re trying to solve requires the prediction of a numeric response variable using multiple continuous (numeric) and/or categorical predictors,... Read more

## Joint, Conditional, and Marginal Probability Distributions

Modeling, Statistics | posted by Eric Ma, August 15, 2018

Joint probability, conditional probability, and marginal probability… These are three central terms when learning about probability, and they show up in Bayesian statistics as well. However… I never really could remember what they were, especially since we were usually taught them using formulas, rather than pictures. Well, for those... Read more

## The Cold Start Problem

Modeling, Statistics | posted by John Cook, August 10, 2018

How do you operate a data-driven application before you have any data? This is known as the cold start problem. We faced this problem all the time when I designed clinical trials at MD Anderson Cancer Center. We used Bayesian methods to design adaptive clinical trial designs, such as clinical trials... Read more

## Distribution of Eigenvalues for Symmetric Gaussian Matrix

Modeling, Statistics | posted by John Cook, August 7, 2018

The previous post looked at the distribution of eigenvalues for very general random matrices. In this post we will look at the eigenvalues of matrices with more structure. Fill an n by n matrix A with values drawn from a standard normal distribution and let M be the average of A and its transpose, i.e. M = ½(A + Aᵀ). The eigenvalues... Read more
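
The construction described in this excerpt can be sketched in a few lines of Python. This is only a minimal sketch, assuming NumPy is available; the function name `symmetric_gaussian_eigenvalues` and the `seed` parameter are hypothetical and not taken from the original post.

```python
import numpy as np


def symmetric_gaussian_eigenvalues(n, seed=0):
    """Return the eigenvalues of M = (A + A^T) / 2 for a random n x n matrix A.

    The function name and the seed argument are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Fill an n x n matrix with draws from a standard normal distribution.
    A = rng.standard_normal((n, n))
    # Average A with its transpose to obtain a symmetric matrix.
    M = 0.5 * (A + A.T)
    # eigvalsh is meant for symmetric matrices: it returns real
    # eigenvalues in ascending order.
    return np.linalg.eigvalsh(M)


eigs = symmetric_gaussian_eigenvalues(200)
print(eigs.min(), eigs.max())
```

Because M is symmetric, its eigenvalues are all real, so a plain histogram of `eigs` is enough to inspect their distribution.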

## How Well Did Data Scientists Predict the 2018 World Cup? (Hint: Not Very)

Data Wrangling, Modeling, Predictive Analytics, Research, Statistics, popular culture, world cup | posted by Alex Amari, July 26, 2018

This year’s World Cup in Russia was the most watched sporting event in history. GlobalWebIndex reports that up to 3.4 billion people – around half of the world’s population – watched some part of the tournament. As with past World Cups, a global prediction market emerged allowing spectators to... Read more

## Attribution Based on Tail Probabilities

Modeling, Statistics | posted by John Cook, July 25, 2018

If all you know about a person is that he or she is around 5′ 7″, it’s a toss-up whether this person is male or female. If you know someone is over 6′ tall, they’re probably male. If you hear they are over 7′ tall, they’re almost certainly male.... Read more

## ECDFs: “Empirical Cumulative Distribution Function”

Modeling, Statistics | posted by Eric Ma, July 23, 2018

In my two SciPy 2018 co-taught tutorials, I made the case that ECDFs provide richer information compared to histograms. My main points were: We can more easily identify central tendency measures, in particular, the median, compared to a histogram. We can much more easily identify other percentile values, compared... Read more

## How Far is xy From yx on Average for Quaternions?

Modeling, Statistics | posted by John Cook, July 16, 2018

Given two quaternions x and y, the product xy might equal the product yx, but in general the two results are different. How different are xy and yx on average? That is, if you selected quaternions x and y at random, how big would you expect the difference xy – yx to be? Since this difference would increase proportionately if you increased the length of x or y, we can just... Read more

## Low-Rank Matrix Perturbations

Modeling, Statistics | posted by John Cook, July 12, 2018

Here are a couple of linear algebra identities that can be very useful, but aren’t that widely known, somewhere between common knowledge and arcane. Neither result assumes any matrix has low rank, but their most common application, at least in my experience, is in the context of something of... Read more

## Linear Regression and Planet Spacing

Modeling, Statistics | posted by John Cook, July 6, 2018

A while back I wrote about how planets are evenly spaced on a log scale. I made a bunch of plots, based on our solar system and the extrasolar systems with the most planets, and noted that they’re all roughly straight lines. Here’s the... Read more