Integrating Pandas, Django REST Framework and Bokeh

It’s no secret that we love Django REST Framework. We’ve written quite a few blog posts about it and it is our default framework for projects that require a web API. Another package that we use a lot is Pandas (and NumPy by extension). It is fast, flexible, well documented and it has a very […]

Ad Hoc Distributed Random Forests #4

When arrays and dataframes aren’t flexible enough. TL;DR: Dask.distributed lets you submit individual tasks to the cluster. We use this ability combined with scikit-learn to train and run a distributed random forest on distributed tabular NYC Taxi data. Our machine learning model does not perform well, but we do learn how to execute ad-hoc […]

Pandas on HDFS with Dask Dataframes #2

In this post we use Pandas in parallel across an HDFS cluster to read CSV data. We coordinate these computations with dask.dataframe. A screencast version of this blog post is available here and the previous post in this series is available here. This work was originally published at matthewrocklin.com and is supported by Continuum Analytics and the XDATA […]

Ad Hoc Distributed Random Forests

When arrays and dataframes aren’t flexible enough. TL;DR: Dask.distributed lets you submit individual tasks to the cluster. We use this ability combined with scikit-learn to train and run a distributed random forest on distributed tabular NYC Taxi data. Our machine learning model does not perform well, but we do learn how to execute ad-hoc computations easily. Motivation […]

Twitter Pandas

Thanks to some great help from contributors, we’ve just pushed the first release of twitter pandas, v0.0.1. The first release is aimed at replicating the data-providing functions (no create/update/delete) from the tweepy API with the git-pandas style pandas interface. To install twitterpandas, just use pip: pip install twitterpandas. And then you can use it right […]

Using Twitter-Pandas to Find Friends Who Don’t Follow You Back

Over the past couple of months we’ve been gradually working on twitter-pandas, a pandas-dataframe-based interface to Twitter data (powered by tweepy behind the scenes). I’ve posted about the first limited release previously here. The initial release was focused on just replicating the tweepy API as best as we could as a first building […]

Estimating the Time Spent on a Project with Git-Pandas

By: Will McGinnis, Mechanical Engineer – Prediko. I stumbled across a conversation recently on the Tech404 Slack channel (a pretty good public Slack group for Atlanta-area software folks) about mostly taxes, but nestled in the middle was this project: git_time_extractor. In the past I’ve noticed a kind of weird concentration of git-related open […]

Bot or Not: A Data Analysis Using Python

In this blog post Erin Shellman (@erinshellman) tries to detect which of her Twitter followers are real people versus automated bots. Twitter bot detection is a standard problem on which quite a few papers have been written. In this post Shellman uses NLTK, pandas, and scikit-learn, and also compares the results with those obtained with the R caret package. […]