Watermain Breaks in the City of Toronto
It has been a while since my last post due to the major transition of moving back to Canada. This post will be a bit shorter than my previous ones but hopefully it will give some insight on practically investigating and analyzing open data that are becoming more popular... Read more
Plotting author statistics for Git repos using Git of Theseus
I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects and I have a couple of blog posts in the pipeline about various updates. First up, I made a number of fixes to Git of Theseus which is a tool (written in Python) that... Read more
R, as I’ve pointed out before, has a package discovery problem. There’s a new package, colorblindr, which lets you see the impact of various sorts of colour-blindness on a colour palette, a very useful thing for designing good graphics. When it’s mentioned on Twitter, you see lots of people glad... Read more
Happy, Healthy, Hungry. Mapping San Francisco Restaurant Cleanliness
Somewhat recently, Yelp announced that it is partnering with Code for America and the City of San Francisco to develop LIVES, an open data standard which allows municipalities to publish restaurant inspection data in a standardized format. This is a step towards allowing a much more transparent government,... Read more
UNHCR Refugee Data Visualized
Where’s the Data? The data I’m using is taken from the United Nations High Commissioner for Refugees (UNHCR) website – the UN Refugee Agency. You can read more on what they do and why they exist in the link above. Currently you can only download the mid-year statistics for 2015. You get a... Read more
Data Visualization for Situational Awareness
When you look at the image below, do you feel a sense of urgency? It would be surprising if you did. This is showing a user interface for a transmission control room where operators monitor and manage power grids. Although it may not seem like it, there’s a big... Read more
Data visualizations can have very different goals and functions depending on the area of application. Success must therefore be measured against different quality criteria, depending on the task. When used in data analysis, success means that a data scientist can identify the structures and patterns that she needs for... Read more
Visual Analytics of Instagram’s #gopro hashtag with AI
Images have become a very common medium of human expression on the internet with the rise of social networks. Facebook is the biggest repository of digital images ever. This trend is only going to intensify given the emergence of image-first platforms like Instagram and Snapchat, also called... Read more
Time Series Analysis with Generalized Additive Models
Whenever you spot a trend plotted against time, you are looking at a time series. The de facto choice for studying financial market performance and weather forecasts, time series analysis is one of the most pervasive analysis techniques because of its inextricable relation to time: we are always interested to foretell... Read more
The retreat from religion is accelerating
This is an extended version of my article in the Scientific American blog. The data I used and all of my code are available in this Jupyter notebook. Secularization in the United States For more than a century religion in the United States has defied gravity. According to the Theory of... Read more