Datasets for Building a Data Analysis Portfolio
I recently had the pleasure of attending the 2017 Association of Public Data Users (APDU) Conference. My favorite part of the conference was talking to people who work with federal data on a daily basis. Overall I found people to be passionate about their work and eager to share information... Read more
Beyond Computational Reproducibility, let us Aim for Reusability
Scientific progress calls for reproducing results. Due to limited resources, this is difficult even in computational sciences. Yet, reproducibility is only a means to an end. It is not enough by itself to enable new scientific results. Rather, new discoveries must build on reuse and modification of the state... Read more
It feels good to be a data geek in 2017. Last year, we asked “Is Big Data Still a Thing?”, observing that since Big Data is largely “plumbing”, it has been subject to enterprise adoption cycles that are much slower than the hype cycle. As a result, it took several... Read more
Web Scraping Indeed for Key Data Science Job Skills
Editor’s Note: Check out our 2017 State of Data Science Jobs Report to compare stats, sentiments, and POVs. (*Available in Spanish.) As many of you probably know, being a data scientist requires a large skill set... Read more
Scraping OpenStreetMap and exploring POI in Cloudant and Jupyter Notebooks
When working with data, the format of the raw data is not always user-friendly. For instance, the format could be one large binary file, or the data could be spread across hundreds of text files. An easy way to solve... Read more
How hard can it be to compute conversion rate? Take the total number of users that converted and divide it by the total number of users. Done. Except… it’s a lot more complicated when you have any sort of significant time lag. Prelude: a story. Fresh out of school I... Read more
Machine Learning: An In-Depth Guide – Data Selection, Preparation, and Modeling
Articles: Overview, goals, learning types, and algorithms; Data selection, preparation, and modeling; Model evaluation, validation, complexity, and improvement; Model performance and error analysis; Unsupervised learning, related fields, and machine learning in practice. Introduction: Welcome to the second article in a five-part series about machine learning. In this article, we... Read more
Machine Learning: An In-Depth Guide – Overview, Goals, Learning Types, and Algorithms
Articles: Overview, goals, learning types, and algorithms; Data selection, preparation, and modeling; Model evaluation, validation, complexity, and improvement; Model performance and error analysis; Unsupervised learning, related fields, and machine learning in practice. Introduction: Welcome! This is the first article of a five-part series about machine learning. Machine learning is... Read more
Prophet is Data Science not Statistics, and there is a Difference
Facebook’s Prophet forecasting tool illustrates the distinction between the traditional statistical approach and the newer machine learning/data science paradigm. This distinction is cultural: it seems that the motivation behind Prophet was to quickly make accurate forecasts (predictions), instead of getting bogged down in building models satisfying certain theoretical properties, which may or... Read more
This is the second article in our two-part series on using unsupervised and supervised machine learning techniques to analyze music data from Pandora and Spotify. Introduction: As you may recall from my previous post, where I applied dimensionality reduction and clustering techniques to a set of songs I liked on... Read more