No one can master every algorithm. However, there is a basic toolbox that every data scientist understands and uses. One of the algorithms in this toolbox is Principal Component Analysis, or PCA for short. It is an unsupervised learning technique used in many different fields to handle high-dimensional... Read more
Within soccer’s nascent analytics movement, one metric dominates most discussions. It’s called Expected Goals or xG. Models for calculating xG differ, but the underlying concept is the same. In a nutshell, xG takes a shot’s characteristics – distance from goal, angle from goal, root cause, etc. – and assigns... Read more
Data science is an interdisciplinary endeavor that extracts insight from varied sources of information. Various communities come together at data science conferences to share their knowledge and promote innovation. It is not surprising, then, that the tools showcased by data scientists at ODSC East... Read more
Amazon Machine Learning: Nice and Easy or Overly Simple?
Can the new Amazon Machine Learning help companies reap the benefits of predictive analytics? Machine Learning as a Service (MLaaS) promises to put data science within the reach of companies. In that context, Amazon Machine Learning is a predictive analytics service with binary/multiclass classification and linear regression features. The service... Read more
Riding on Large Data with Scikit-learn
What’s a Large Data Set? A data set is said to be large when it exceeds 20% of the available RAM on a single machine. For a standard MacBook Pro with 8 GB of RAM, that corresponds to a meager dataset of roughly 2 GB, a size that is becoming more and more... Read more
Learning to Love Bayesian Statistics – Allen Downey ODSC Boston 2015
http://tinyurl.com/lovebayes Bayesian statistical methods provide powerful tools for answering questions and making decisions. For example, the result of Bayesian analysis is a set of values and probabilities that can be fed directly into a cost-benefit analysis, which is not possible with conventional statistics. But there are several barriers to... Read more
Probabilistic Programming in Data Science – Thomas Wiecki ODSC Boston 2015
http://bit.ly/ThomasWieckiPresentation Many metrics exist for evaluating the performance-risk trade-off of a portfolio. Although those metrics have proven to be useful tools in practice, most of them require a large amount of data and implicitly assume returns to be normally distributed. Bayesian modeling is a statistical... Read more
An introduction to Bayesian Statistics using Python – Allen Downey ODSC Boston 2015
An introduction to Bayesian Statistics using Python from freshdatabos Read more