Building an Interactive Web “mapp” with Shiny
The purpose of this post is to discuss the key elements in developing an interactive web application that displays data with a geographic component. I discuss developing an app using Shiny – a powerful R package. I briefly compare that process to building a similar product in... Read more
Iterating over hash sets quickly in Java
There are many ways in software to represent a set. The most common approach is to use a hash table. We define a “hash function” that takes as an input our elements and produces as an output an integer that “looks random”. Then your element is... Read more
While you wait for that to finish, can I interest you in parallel processing?
caret has been able to utilize parallel processing for some time (before it was on CRAN in October 2007) using slightly different versions of the package. Around September of 2011, caret started using the foreach package to “harmonize” the parallel processing technologies, thanks to a super smart guy named Steve Weston.... Read more
Word Vectors with Tidy Data Principles
Last week I saw Chris Moody’s post on the Stitch Fix blog about calculating word vectors from a corpus of text using word counts and matrix factorization, and I was so excited! This blog post illustrates how to implement that approach to find word vector representations in R... Read more
It’s been a couple of weeks since I got accepted in the closed beta testing programme for IBM Data Science Experience (DSX), and it is about time I share my thoughts on this offering. DSX is a new product, which IBM is positioning as a new generation Data Science... Read more
Plotting author statistics for Git repos using Git of Theseus
I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects and I have a couple of blog posts in the pipeline about various updates. First up, I made a number of fixes to Git of Theseus which is a tool (written... Read more
Happy, Healthy, Hungry. Mapping San Francisco Restaurant Cleanliness
Somewhat recently, Yelp announced that it is partnering with Code for America and the City of San Francisco to develop LIVES, an open data standard which allows municipalities to publish restaurant inspection data in a standardized format. This is a step towards allowing a much much... Read more
In a previous article, we discussed the origin story and history of the Python deep learning library TensorFlow. It has experienced a monumental rise like nothing seen before: just two years after its debut, it holds the title of the most forked repo on GitHub.... Read more
How Docker Can Help You Become A More Effective Data Scientist
For the past 5 years, I have heard lots of buzz about Docker containers. It seemed like all my software engineering friends were using them for developing applications. I wanted to figure out how this technology could make me more effective but I found tutorials online... Read more
On Machine Learning and Programming Languages
This article was co-written by Mike Innes (Julia Computing), David Barber (UCL), Tim Besard (UGent), James Bradbury (Salesforce Research), Valentin Churavy (MIT), Simon Danisch (MIT), Alan Edelman (MIT), Stefan Karpinski (Julia Computing), Jon Malmaud (MIT), Jarrett Revels (MIT), Viral Shah (Julia Computing), Pontus Stenetorp (UCL) and... Read more