R
Here is an R tip. Need to quote a lot of names at once? Use qc(). This is particularly useful when selecting columns from data.frames:

    library("wrapr")  # get qc() definition
    head(mtcars)[, qc(mpg, cyl, wt)]
    #                mpg cyl    wt
    # Mazda RX4     21.0   6 2.620
    # Mazda RX4 Wag 21.0   6 2.875
    # Datsun...

Read more
SEINFELD CHARACTERS – A POST ABOUT NOTHING
This post is dedicated to my mother – Seinfeld’s greatest fan. Seinfeld is a classic TV sitcom. It featured four main characters surrounded by relatively normal, everyday, run-of-the-mill scenarios. In the spirit of Seinfeld, this post will also “be about nothing.” Load Required Libraries library(scales) library(RMySQL)... Read more
WHAT PROGRAMMING LANGUAGES ARE USED MOST ON WEEKENDS?
Note: Cross-posted with the Stack Overflow blog. Check out the code for this analysis on Kaggle. For me, the weekends are mostly about spending time with my family, reading for leisure, and working on the open-source projects I am involved in. These weekend projects overlap with the work that I do... Read more
NAVIGATING THE R PACKAGE UNIVERSE
Earlier this month, I, along with John Nash, Spencer Graves, and Ludovic Vannoorenberghe, organized a session at useR!2017 focused on discovering, learning about, and evaluating R packages. You can check out the recording of the session. There are more than 11,000 packages on CRAN, and R users must approach this abundance of packages... Read more
xray: The R Package to Have X Ray Vision on Your Datasets
This package lets you analyze the variables of a dataset to evaluate how the data is shaped. Consider this the first step once you have your data for modeling: you can use this package to analyze all variables and check whether there is anything weird worth transforming or even... Read more
Anomaly Detection in R
Introduction: Inspired by this Netflix post, I decided to write a post on this topic using R. There are several nice packages to achieve this goal; the one we're going to review is AnomalyDetection. Download the full (and tiny) R code of this post here. Normal vs. Abnormal: The definition for abnormal,... Read more
Word Vectors with Tidy Data Principles
Last week I saw Chris Moody’s post on the Stitch Fix blog about calculating word vectors from a corpus of text using word counts and matrix factorization, and I was so excited! This blog post illustrates how to implement that approach to find word vector representations in R using tidy data... Read more
rquery: Fast Data Manipulation in R
Win-Vector LLC recently announced the rquery R package, an operator-based query generator. In this note I want to share some exciting and favorable initial rquery benchmark timings. Note we have now (1-16-2018) re-run this benchmark with a faster, better-tuned version of the data.table solution (same package, just better use of it). Let’s take a look at... Read more
R, as I’ve pointed out before, has a package discovery problem. There’s a new package, colorblindr, which lets you see the impact of various sorts of colour-blindness on a colour palette, a very useful thing for designing good graphics. When it’s mentioned on Twitter, you see lots of people glad... Read more
Making a machine learning model usually takes a lot of crying, pain, feature engineering, suffering, training, debugging, validation, desperation, testing and a little bit of agony due to the infinite pain. After all that, we deploy the model and use it to make predictions for future data. We can run our little devil on a batch... Read more