Image Compression In 10 Lines of R Code
Principal Component Analysis (PCA) is a powerful machine learning tool. As an unsupervised learning technique, it excels at dimension reduction and feature extraction. However, did you know that we can also use PCA to compress images? In this post, I’ll walk through the process and explain how PCA can compress images...
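The excerpt does not include the post's R code, so what follows is only a minimal Python/NumPy sketch of the same idea. It assumes a grayscale image stored as a 2-D array, with rows treated as observations and columns as variables. The function names `pca_compress` and `storage_ratio` and the synthetic 64×64 test image are hypothetical stand-ins. Keeping the first k principal components means storing k score columns, k loading vectors, and the column means instead of every pixel.

```python
import numpy as np

def pca_compress(image: np.ndarray, k: int) -> np.ndarray:
    """Approximate a grayscale image from its first k principal components.

    Assumption: rows are observations and columns are variables
    (a prcomp-style setup); this is a sketch, not the post's code.
    """
    col_means = image.mean(axis=0)
    centered = image - col_means
    # SVD of the centered data: the rows of Vt are the principal axes (loadings)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    # Scores on the first k components (equal to centered @ Vt[:k].T)
    scores = U[:, :k] * s[:k]
    # Map the scores back to pixel space and restore the column means
    return scores @ Vt[:k, :] + col_means

def storage_ratio(shape, k):
    """Numbers stored (scores + loadings + means) relative to the raw pixel count."""
    n_rows, n_cols = shape
    compressed = n_rows * k + k * n_cols + n_cols
    return compressed / (n_rows * n_cols)

# Hypothetical 64x64 "image": a smooth gradient plus noise, clipped to [0, 1]
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 64)
image = np.clip(np.outer(x, x) + 0.05 * rng.standard_normal((64, 64)), 0, 1)

for k in (1, 5, 20, 64):
    approx = pca_compress(image, k)
    err = np.linalg.norm(image - approx) / np.linalg.norm(image)
    print(f"k={k:2d}  relative error={err:.3f}  "
          f"storage ratio={storage_ratio(image.shape, k):.2f}")
```

In this setup the storage ratio falls below 1 only when k is well below the image width. Keeping all 64 components reproduces the image but stores roughly twice as many numbers as the original.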
A Quick Look Into Bootstrapping
Executive Summary: As a resampling method, bootstrapping allows us to draw statistical inferences about a population from a single sample. Learn to bootstrap in R. Bootstrapping lays the foundation for several machine learning methods (e.g., bagging, which I’ll explain in a follow-up post).
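The post itself works in R. As a rough illustration of the resampling idea, here is a Python sketch (not the post's code). It draws many resamples with replacement from one simulated sample, then uses the spread of the resampled means to estimate the sampling variability of the mean. The helper names `bootstrap_statistic` and `percentile_ci`, the 5,000 resamples, and the percentile interval are illustrative assumptions.

```python
import numpy as np

def bootstrap_statistic(sample, statistic, n_resamples=5000, seed=0):
    """Return the bootstrap distribution of `statistic` for a 1-D sample.

    Each resample draws len(sample) observations with replacement.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = sample.size
    # One row of random indices (drawn with replacement) per resample
    idx = rng.integers(0, n, size=(n_resamples, n))
    return np.array([statistic(sample[row]) for row in idx])

def percentile_ci(boot_stats, level=0.95):
    """Percentile confidence interval from a bootstrap distribution."""
    alpha = (1 - level) / 2
    low, high = np.quantile(boot_stats, [alpha, 1 - alpha])
    return low, high

# Hypothetical single sample of 40 observations (simulated for illustration)
rng = np.random.default_rng(42)
sample = rng.exponential(scale=2.0, size=40)

boot_means = bootstrap_statistic(sample, np.mean)
low, high = percentile_ci(boot_means)
print(f"sample mean       = {sample.mean():.3f}")
print(f"bootstrap SE      = {boot_means.std(ddof=1):.3f}")
print(f"95% percentile CI = ({low:.3f}, {high:.3f})")
```

The same resampling loop underlies bagging: instead of recomputing a statistic, a model is refit on each resample and the predictions are aggregated.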
Implementing a Kernel Principal Component Analysis in Python
In this article, we discuss implementing a kernel Principal Component Analysis in Python, with a few examples. Many machine learning algorithms make assumptions about the linear separability of the input data. The perceptron even requires perfectly linearly separable training data to converge. Other algorithms that we have covered so...
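The article's own code is not shown in the excerpt, so the following is a minimal NumPy sketch of kernel PCA. It assumes a Gaussian (RBF) kernel. The kernel matrix is centered, which is equivalent to centering the data in the implicit feature space, and the training points are then projected onto the leading eigenvectors. The function name `rbf_kernel_pca`, the value `gamma=15.0`, and the synthetic two-circle data (which no line can separate in the original 2-D space) are illustrative assumptions.

```python
import numpy as np

def rbf_kernel_pca(X, gamma, n_components):
    """Project the rows of X onto the top RBF kernel principal components."""
    # Pairwise squared Euclidean distances between all rows of X
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    K = np.exp(-gamma * sq_dists)

    # Center the kernel matrix: K_c = K - 1K - K1 + 1K1, where 1 has entries 1/n
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(K_c)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Projection of training point j onto component k is sqrt(lambda_k) * v_k[j]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Hypothetical data: two noisy concentric circles (inner radius 0.2, outer 1.0)
rng = np.random.default_rng(1)
n_per = 100
angles = rng.uniform(0, 2 * np.pi, size=2 * n_per)
radii = np.repeat([0.2, 1.0], n_per)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X += 0.05 * rng.standard_normal(X.shape)
labels = np.repeat([0, 1], n_per)

Z = rbf_kernel_pca(X, gamma=15.0, n_components=2)
for c in (0, 1):
    print(f"circle {c}: mean of PC1 = {Z[labels == c, 0].mean():.3f}")
```

How well the first component separates the two circles depends on `gamma`, which is usually tuned. Plotting `Z` colored by `labels` is a quick way to check.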
XGBoost is Machine Learning’s Captain America
Captain America is great in many ways. He is not necessarily the strongest, nor the fastest. He cannot fly or shoot anything out of his hands. However, he is a consistent leader who is well understood, admired, and very effective. Also, when needed, he can even wield Thor’s hammer. So...
Are You Ready to Lead a Data Science Project?
What problem is compelling you to turn to data science? The power in data, and the mechanisms to harness that power, are now available to us. Identifying the right problem or use case is the first step. There are multiple use cases across the industry being...
Web Scraping News Articles in Python
This article is the second of a series in which I will cover the whole process of developing a machine learning project. If you have not read the first one, I strongly encourage you to read it here. The project involves the creation of a real-time web application that gathers data from several newspapers...
Balancing Interpretability and Predictive Power with Cubist Models in R
Machine learning models are powerful tools that excel at their purpose: prediction. In many business applications, the power of these models is quite beneficial. In any application of a machine learning model, the process of choosing a model involves determining which one performs best across a...
Local Regression in Python
I love data visualization makeovers (like this one I wrote a few months ago), but sometimes the tone can be too negative (like this one I wrote a few months ago). Sarah Leo, a data journalist at The Economist, has found the perfect solution: re-making your own visualizations. Here’s her tweet...
5 Steps to Implementing a Data Literacy-Driven DataOps Framework
DataOps is a new framework that has gained attention over the past year, since it first appeared on the Gartner Hype Cycle. DataOps is defined as a new way of thinking about data that encompasses people, processes, and technology, resulting in improved collaboration and streamlined decision-making...
Text Classification in Python
This article is the first of a series in which I will cover the whole process of developing a machine learning project. This one focuses on training a supervised learning text classification model in Python. The motivation behind writing these articles is the following: as a learning data scientist who has...