It feels good to be a data geek in 2017. Last year, we asked “Is Big Data Still a Thing?”, observing that since Big Data is largely “plumbing”, it has been subject to enterprise adoption cycles that are much slower than the hype cycle. As a result, it took several... Read more
Simulation of empirical Bayesian methods (using baseball statistics)
Previously in this series: The beta distribution; Empirical Bayes estimation; Credible intervals; The Bayesian approach to false discovery rates; Bayesian A/B testing; Beta-binomial regression; Understanding empirical Bayesian hierarchical modeling; Mixture models and expectation-maximization; The ebbr package. We’re approaching the end of this series on empirical Bayesian methods, and have... Read more
Come see Anshuman Guha, Data Scientist from Spark Cognition, speak at ODSC West. Standardizing Software Boundaries: Let’s imagine a scenario where a picture-sharing company has an app that allows users to purchase pictures and have them printed and shipped via a postal service. Each module of the app could... Read more
Web Scraping Indeed for Key Data Science Job Skills
Editor’s Note: Check out our 2017 State of Data Science Jobs Report to compare stats, sentiments, and POVs. *available in Spanish. As many of you probably know, being a data scientist requires a large skill set... Read more
The software engineering rule of 3
Here’s a dumb extremely accurate rule I’m postulating* for software engineering projects: you need at least 3 examples before you solve the right problem. This is what I’ve noticed: Don’t factor out shared code between two classes. Wait until you have at least three. The first two attempts to solve a problem... Read more
Actuaries are bringing Netflix-like predictive modeling to health care
I’m an actuary. That means I use numbers to try to understand human behavior, manage risk, and evaluate the likelihood that a particular thing will happen in the future. Most people associate my work with green eyeshades and the morbid business of predicting how long someone is likely to... Read more
Enhancing Customer Experience with Natural Language Processing
Processing language into actionable components is the future of communication. If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart. — Nelson Mandela I would venture to guess that most... Read more
Bundle Buddy
When building a complex JavaScript application, it is common to minify code and bundle files together to optimize network requests so the app loads faster. A common pattern for complex and large applications is code splitting. Typically this breaks up the bundles by each route in your application so... Read more
We’re all familiar with terms like first-world, third-world, and developing world when it comes to describing countries in relation to the world. “First-world” refers to the countries that are richer, healthier, and more educated, while impoverished nations fall under the label of third-world. In addition, we occasionally hear “second-world”... Read more
Dimensional Modeling and Kimball Data Marts in the Age of Big Data and Hadoop
Is dimensional modeling dead? Before I give you an answer to this question, let’s take a step back and first have a look at what we mean by dimensional data modeling. Why do we need to model our data? Contrary to a common misunderstanding, it is not the only... Read more