Firing on All Cylinders: The 2017 Big Data Landscape, part 2
A walk through the 2017 Data Ecosystem Landscape. Infrastructure: A lot of themes from last year have continued to play out, such as the ever-increasing importance of streaming, with Spark reigning supreme for now and interesting contenders such as Flink emerging. In addition, a few interesting themes have kept coming back in... Read more
Datasets for Building a Data Analysis Portfolio
I recently had the pleasure of attending the 2017 Association of Public Data Users (APDU) Conference. My favorite part of the conference was talking to people who work with federal data on a daily basis. Overall I found people to be passionate about their work and eager to share information... Read more
Beyond Computational Reproducibility, let us Aim for Reusability
Scientific progress calls for reproducing results. Due to limited resources, this is difficult even in computational sciences. Yet, reproducibility is only a means to an end. It is not enough by itself to enable new scientific results. Rather, new discoveries must build on reuse and modification of the state... Read more
It feels good to be a data geek in 2017. Last year, we asked “Is Big Data Still a Thing?”, observing that since Big Data is largely “plumbing”, it has been subject to enterprise adoption cycles that are much slower than the hype cycle. As a result, it took several... Read more
Web Scraping Indeed for Key Data Science Job Skills
Editor’s Note: Check out our 2017 State of Data Science Jobs Report to compare stats, sentiments, and POVs. *Available in Spanish. As many of you probably know, being a data scientist requires a large skill set... Read more
Scraping OpenStreetMap and exploring POI in Cloudant and Jupyter Notebooks
When working with data, the format of the raw data is not always user-friendly. For instance, the format could be one large binary file, or the data could be spread across hundreds of text files. An easy way to solve... Read more
How hard can it be to compute conversion rate? Take the total number of users that converted and divide it by the total number of users. Done. Except… it’s a lot more complicated when you have any sort of significant time lag. Prelude: a story. Fresh out of school I... Read more
Machine Learning: An In-Depth Guide – Data Selection, Preparation, and Modeling
Articles: Overview, goals, learning types, and algorithms; Data selection, preparation, and modeling; Model evaluation, validation, complexity, and improvement; Model performance and error analysis; Unsupervised learning, related fields, and machine learning in practice. Introduction: Welcome to the second article in a five-part series about machine learning. In this article, we... Read more
Machine Learning: An In-Depth Guide – Overview, Goals, Learning Types, and Algorithms
Articles: Overview, goals, learning types, and algorithms; Data selection, preparation, and modeling; Model evaluation, validation, complexity, and improvement; Model performance and error analysis; Unsupervised learning, related fields, and machine learning in practice. Introduction: Welcome! This is the first article of a five-part series about machine learning. Machine learning is... Read more
This is the second article in our two-part series on using unsupervised and supervised machine learning techniques to analyze music data from Pandora and Spotify. Introduction: As you may recall from my previous post, where I applied dimensionality reduction and clustering techniques to a set of songs I liked on... Read more