Image Processing in Python

Editor’s note: This post is part of the Silicon Valley Data Science Trainspotting series, a deep dive into the visual and audio detection components of the SVDS Caltrain project. Let them know your favorite part of the series and make comments on the original article here. The first step in developing our Caltrain project was creating a proof of […]

Introduction to Trainspotting

This was originally posted on the Silicon Valley Data Science blog. At Silicon Valley Data Science, we have a slight obsession with the Caltrain. Our interest stems from the fact that half of our employees rely on the Caltrain to get to work each day. We also want to give back to the community, and […]

Installing Jupyter with the PySpark and R kernels for Spark development

This is a quick tutorial on installing Jupyter and setting up the PySpark and R (IRkernel) kernels for Spark development. The prerequisite for following this tutorial is a deployed Hadoop/Spark cluster with the relevant services up and running (e.g. HDFS, YARN, Hive, Spark). In this tutorial I am using IBM’s Hadoop […]

Introducing Dask distributed #1

tl;dr: We analyze JSON data on a cluster using pure Python projects. Dask, a Python library for parallel computing, now works on clusters. During the past few months I and others have extended dask with a new distributed memory scheduler. This enables dask’s existing parallel algorithms to scale across 10s to 100s of nodes, and extends a subset […]

Probability is hard: part 4

This is the fourth part of a series of posts about conditional probability and Bayesian statistics. In the first article, I presented the Red Dice problem, which is a warm-up problem that might help us make sense of the other problems. In the second article, I presented the problem of interpreting medical tests when there is uncertainty about […]

Probability is hard: part three

This is the third part of a series of posts about conditional probability and Bayesian statistics. In the first article, I presented the Red Dice problem, which is a warm-up problem that might help us make sense of the other problems. In the second article, I presented the problem of interpreting medical tests when there is uncertainty […]

Probability is hard, part two

If you read the previous post, you know that my colleague Sanjoy Mahajan and I have been working on a series of problems related to conditional probability and Bayesian statistics. In the previous article, I presented the Red Dice problem, which is relatively simple. I posted it here because it presents four different versions of the […]

Probability is hard

For more than a month, my colleague Sanjoy Mahajan and I have been banging our heads on a series of problems related to conditional probability and Bayesian statistics. We knew when we started that this material is tricky, as demonstrated by veridical paradoxes like the Monty Hall problem, the Girl Named Florida, and so on. […]

ODSC East 2016 | Peter Bull – “#lifehacks for the Jupyter Data Scientist”

Abstract: Data Science is Software: While we don’t always think about it this way, the job of the data scientist is to build software. Often data scientists use only the most rudimentary of software engineering tools. It’s time we leverage the tools and best practices of software engineering that have been built over the last […]