Key Takeaways: It’s important for data scientists to understand the so-called “gap” between statistics and machine learning, and how there actually is a lot of commonality between the two; it’s just a matter of how you look at things. PyMC3 is a very useful probabilistic programming... Read more

The Central Limit Theorem (CLT) is arguably the most important theorem in statistics. It’s certainly a concept that every data scientist should fully understand. In this article, we’ll go over some basic theory of the CLT, explain why it’s important for data scientists, and present some... Read more

Pattern mining is an incredibly simple but powerful technique for discovering co-occurrences in large datasets. The most common approach to finding those patterns is Market Basket Analysis, which is frequently pointed out as the method Amazon leverages for their “users also purchased” feature. Of course, that’s... Read more

First of all, I would like to point out that the skill of building MVPs and microservices is extremely useful for a data scientist! When you can build a prototype and test it in a working environment, it just feels so much better and allows you... Read more

Statistical methods are inarguably the hottest approach to evaluating datasets at scale right now. They’re not without their weaknesses though – they’re ultimately heuristic, and some methods like neural networks require tremendous amounts of data to create a well-fitted model. That’s where semantics come in. If... Read more

I like to call linear regression the data scientist’s “workhorse.” It may not be sexy, but it’s a tried and proven technique that can be very useful. When the problem you’re trying to solve requires the prediction of a numeric response variable using multiple continuous (numeric)... Read more

Joint probability, conditional probability, and marginal probability… These are three central terms when learning about probability, and they show up in Bayesian statistics as well. However… I never really could remember what they were, especially since we were usually taught them using formulas, rather than pictures.... Read more

How do you operate a data-driven application before you have any data? This is known as the cold start problem. We faced this problem all the time when I designed clinical trials at MD Anderson Cancer Center. We used Bayesian methods to create adaptive clinical trial designs, such... Read more

Symmetric Gaussian matrices: The previous post looked at the distribution of eigenvalues for very general random matrices. In this post we will look at the eigenvalues of matrices with more structure. Fill an n by n matrix A with values drawn from a standard normal distribution and let M be the average of A and its transpose, i.e. M =... Read more
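
The construction described in this excerpt can be sketched in a few lines of NumPy. This is only an illustration of the wording "the average of A and its transpose", not the original post's code; the function name, the matrix size of 200, and the fixed seed are assumptions chosen for the example.

```python
import numpy as np

def symmetric_gaussian_eigenvalues(n, seed=0):
    """Build a symmetric Gaussian matrix and return it with its eigenvalues.

    Hypothetical helper for illustration; n and seed are arbitrary choices.
    """
    rng = np.random.default_rng(seed)
    # Fill an n by n matrix A with draws from a standard normal distribution.
    A = rng.standard_normal((n, n))
    # M is the average of A and its transpose, so M is symmetric.
    M = (A + A.T) / 2
    # eigvalsh is meant for symmetric matrices and returns real
    # eigenvalues in ascending order.
    return M, np.linalg.eigvalsh(M)

M, eigenvalues = symmetric_gaussian_eigenvalues(200)
print(eigenvalues.min(), eigenvalues.max())
```

Because M is symmetric, all of its eigenvalues are real, which is why the symmetric-specific routine `np.linalg.eigvalsh` is used here instead of the general `np.linalg.eigvals`.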

This year’s World Cup in Russia was the most watched sporting event in history. GlobalWebIndex reports that up to 3.4 billion people – around half of the world’s population – watched some part of the tournament. As with past World Cups, a global prediction market emerged... Read more