R Tip: Introduce Indices to Avoid for() Class Loss Issues
Here is an R tip. Use loop indices to avoid for()-loops damaging classes. Below is an R annoyance that occurs again and again: vectors lose class attributes when you iterate over them in a for()-loop. d <- c(Sys.time(), Sys.time()) print(d) #> "2018-02-18 10:16:16 PST" "2018-02-18 10:16:16 PST" for(di in d) { print(di)... Read more
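A minimal sketch of the tip above, not the post's own code: iterating over the values of a date-time vector hands the loop body bare numbers, while iterating over indices and extracting with `[[` keeps the class. The variable names `value_classes` and `index_classes` are illustrative assumptions.

```r
# Two timestamps; the vector carries the POSIXct class.
d <- c(Sys.time(), Sys.time())

# Iterating over the values directly: for() strips attributes,
# so each di arrives as a plain number.
value_classes <- character(0)
for (di in d) {
  value_classes <- c(value_classes, class(di)[1])
}

# Iterating over indices: d[[i]] extracts one element with its class intact.
index_classes <- character(0)
for (i in seq_along(d)) {
  di <- d[[i]]
  index_classes <- c(index_classes, class(di)[1])
}

print(value_classes)
print(index_classes)
```

Using `seq_along(d)` rather than `1:length(d)` also behaves correctly when `d` is empty.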
Is R base::subset() really that bad?
Notes discussing subset() often refer to the following text (from help(subset), referred to in examples: 1, 2): "Warning: This is a convenience function intended for use interactively. For programming it is better to use the standard sub-setting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences." Is... Read more
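To make the quoted warning concrete, here is a small sketch comparing subset() with the standard `[` form that the help text recommends for programming. The data frame `d` and its columns are made up for illustration. Note that subset() silently drops rows where the condition is NA, so the `[` version has to do that explicitly.

```r
# Hypothetical example data with one missing value.
d <- data.frame(x = c(1, 2, 3, NA), y = c("a", "b", "c", "d"))

# Interactive convenience: the condition is evaluated inside d
# via non-standard evaluation, and NA conditions are treated as FALSE.
s1 <- subset(d, x > 1)

# Standard sub-setting: columns are referenced explicitly and the
# NA handling is spelled out; drop = FALSE keeps a data.frame result.
s2 <- d[!is.na(d$x) & d$x > 1, , drop = FALSE]

print(s1)
print(s2)
```

The `[` form is longer, but every name in it is resolved by ordinary R scoping, which is what makes it the more predictable choice inside functions.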
Here is an R tip. Need to quote a lot of names at once? Use qc(). This is particularly useful in selecting columns from data.frames: library("wrapr") # get qc() definition head(mtcars) # mpg cyl wt # Mazda RX4 21.0 6 2.620 # Mazda RX4 Wag 21.0 6 2.875 # Datsun... Read more
This post is dedicated to my mother – Seinfeld’s greatest fan. Seinfeld is a classic TV sitcom. It featured four main characters surrounded by relatively normal, everyday, run of the mill scenarios. In the spirit of Seinfeld, this post will also “be about nothing.” Load Required Libraries library(scales) library(RMySQL)... Read more
Note: Cross-posted with the Stack Overflow blog. Check out the code for this analysis on Kaggle. For me, the weekends are mostly about spending time with my family, reading for leisure, and working on the open-source projects I am involved in. These weekend projects overlap with the work that I do... Read more
Earlier this month, I, along with John Nash, Spencer Graves, and Ludovic Vannoorenberghe, organized a session at useR!2017 focused on discovering, learning about, and evaluating R packages. You can check out the recording of the session. There are more than 11,000 packages on CRAN, and R users must approach this abundance of packages... Read more
xray: The R Package to Have X Ray Vision on Your Datasets
This package lets you analyze the variables of a dataset to evaluate how the data is shaped. Consider this the first step when you have your data for modeling: you can use this package to analyze all variables and check if there is anything weird worth transforming or even... Read more
Anomaly Detection in R
Introduction: Inspired by this Netflix post, I decided to write a post based on this topic using R. There are several nice packages to achieve this goal; the one we're going to review is AnomalyDetection. Download the full (and tiny) R code of this post here. Normal vs. Abnormal: The definition for abnormal,... Read more
Word Vectors with Tidy Data Principles
Last week I saw Chris Moody’s post on the Stitch Fix blog about calculating word vectors from a corpus of text using word counts and matrix factorization, and I was so excited! This blog post illustrates how to implement that approach to find word vector representations in R using tidy data... Read more
rquery: Fast Data Manipulation in R
Win-Vector LLC recently announced the rquery R package, an operator-based query generator. In this note I want to share some exciting and favorable initial rquery benchmark timings. Note: we have now (1-16-2018) re-run this benchmark with a faster, better-tuned version of the data.table solution (same package, just better use of it). Let’s take a look at... Read more