Random Forest Classification of Mushrooms
There is a plethora of classification algorithms available to people who have a bit of coding experience and a set of data. A common machine learning method is the random forest, which is a good place to start. This is a use case in R of the randomForest package used on a data...
Jack Kwok is a Software Engineer with 15 years of professional experience. At Insight, he built a Deep Learning solution to automatically detect flooded roads during natural disasters. He is now a Software Engineer at Lyft working with Machine Learning and Deep Learning. Want to learn applied Artificial Intelligence...
Data Visualization – Part 3
What Type of Data Visualization Do You Choose (if any)? Determining whether or not you need a visualization is step one. While it seems silly, this is probably something everyone (including myself) should be doing more often. A lot of times, it seems like a great way to showcase the...
Scratch Viz – Documentation and Usage
Contents: Introduction Audience Getting Started Data Scratch Blocks Example Projects. Introduction: "If you have built castles in the air, your work need not be lost; that is where they should be. Now put the foundations under them." (Henry David Thoreau; source: Why's (Poignant) Guide to Ruby.) This experimental Scratch extension aims to...
This blog post is about topic modeling using data from this blog, opendatascience.com. From this, combined with the most visited articles of the year, we will generate the most popular topics of 2017. Last year, we did something similar with popular articles streamed through Twitter, using Non-Negative Matrix Factorization to determine topics, article...
Watermain Breaks in the City of Toronto
It has been a while since my last post due to the major transition of moving back to Canada. This post will be a bit shorter than my previous ones but hopefully it will give some insight on practically investigating and analyzing open data that are becoming more popular...
Plotting author statistics for Git repos using Git of Theseus
I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects and I have a couple of blog posts in the pipeline about various updates. First up, I made a number of fixes to Git of Theseus which is a tool (written in Python) that...
R, as I've pointed out before, has a package discovery problem. There's a new package, colorblindr, which lets you see the impact of various sorts of colour-blindness on a colour palette, a very useful thing for designing good graphics. When it's mentioned on Twitter, you see lots of people glad...
Happy, Healthy, Hungry. Mapping San Francisco Restaurant Cleanliness
Somewhat recently, Yelp announced that it is partnering with Code for America and the City of San Francisco to develop LIVES, an open data standard which allows municipalities to publish restaurant inspection data in a standardized format. This is a step towards allowing a much more transparent government,...
UNHCR Refugee Data Visualized
Where's the Data? The data I'm using is taken from the United Nations High Commissioner for Refugees (UNHCR) website – the UN Refugee Agency. You can read more on what they do and why they exist in the link above. Currently you can only download the mid-year statistics for 2015. You get a...