Testing Probability Distribution Generators
In the ‘regression tests’ that are part of any change to the base-R source code, there’s a file called p-r-random-tests.R. People notice it from time to time because the tests sometimes fail. That’s what is supposed to happen. Testing random number generators is hard, because it’s hard to specify what... Read more
Exploratory Data Analysis in R
Hi there! tl;dr: Exploratory data analysis (EDA) is the very first step in a data project. We will create a code template to achieve this with one function. Introduction EDA consists of univariate (1-variable) and bivariate (2-variable) analysis. In this post we will review some functions that lead us to the... Read more
R Tip: Be Wary of “…”
The following code example contains an easy error in using the R function unique(). vec1 <- c("a", "b", "c") vec2 <- c("c", "d") unique(vec1, vec2) # "a" "b" "c" Notice none of the novel values from vec2 are present in the result. Our mistake was: we (improperly) tried to use unique() with multiple value arguments,... Read more
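A minimal sketch of the pitfall described above, reusing the excerpt's own `vec1` and `vec2`. The names `wrong` and `right` are illustrative only. It relies on the documented signature `unique(x, incomparables = FALSE, ...)`, where a second positional argument is taken as `incomparables` rather than as more data.

```r
vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")

# vec2 is matched to the `incomparables` argument, so it is never
# treated as data and none of its values reach the result.
wrong <- unique(vec1, vec2)

# Combine the vectors first, then de-duplicate the single argument.
right <- unique(c(vec1, vec2))

print(wrong)  # "a" "b" "c"
print(right)  # "a" "b" "c" "d"
```

For two vectors, `union(vec1, vec2)` is a base-R alternative that returns the same de-duplicated values.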
R Tip: Use isTRUE()
R Tip: use isTRUE(). A lot of R functions are type unstable, which means they return different types or classes depending on details of their values. For example consider all.equal(), it returns the logical value TRUE when the items being compared are equal: all.equal(1:3, c(1, 2, 3)) # TRUE However, when the items being compared... Read more
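A short sketch of why `isTRUE()` helps here, assuming the documented behaviour of `all.equal()`: when the items differ it returns a character description of the difference rather than `FALSE`. The variable names are illustrative and not taken from the original post.

```r
a <- 1:3
b <- c(1, 2, 3)
d <- c(1, 2, 4)

res_same <- all.equal(a, b)  # logical TRUE
res_diff <- all.equal(a, d)  # character description of the difference

# Using res_diff directly as an if() condition would raise an error,
# because it is a character string, not a logical value.
# isTRUE() reduces either result to a single TRUE or FALSE.
same_ab <- isTRUE(all.equal(a, b))
same_ad <- isTRUE(all.equal(a, d))

print(same_ab)  # TRUE
print(same_ad)  # FALSE
```

Wrapping the call in `isTRUE()` makes the result safe to use in `if()` and `stopifnot()`, at the cost of discarding the message that describes the difference.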
New Version of ggplot2
I just received a note from Hadley Wickham that a new version of ggplot2 is scheduled to be submitted to CRAN on June 25. Here’s what choroplethr users need to know about this new version of ggplot2. Choroplethr Update Required The new version of ggplot2 introduces bugs into choroplethr.... Read more
rqdatatable: rquery Powered by data.table
rquery is an R package for specifying data transforms using piped Codd-style operators. It has already shown great performance on PostgreSQL and Apache Spark. rqdatatable is a new package that supplies a screaming fast implementation of the rquery system in-memory using the data.table package. rquery is already one of the fastest and most teachable (due to deliberate conformity to Codd’s influential work) tools to wrangle data on databases and... Read more
WVPlots now at version 1.0.0 on CRAN!
Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. We are excited to announce that WVPlots is now at version 1.0.0 on CRAN! The idea is: we sacrifice some of... Read more
wrapr 1.4.1 now up on CRAN
wrapr 1.4.1 is now available on CRAN. wrapr is a really neat R package for organizing, meta-programming, and debugging R code. This update generalizes the dot-pipe feature’s dot S3 features. Please give it a try! wrapr is an R package that supplies powerful tools for writing and debugging R code. Introduction Primary wrapr services include: let() (let block) %.>% (dot arrow pipe) build_frame()/draw_frame()... Read more
Exploratory Data Analysis and Data Preparation with ‘funModeling’
funModeling quick-start This package contains a set of functions related to exploratory data analysis, data preparation, and model performance. It is used by people coming from business, research, and teaching (professors and students). funModeling is intimately related to the Data Science Live Book -Open Source- (2017) in the sense that most of... Read more


R, Tools & Languages. Posted by Thomas Lumley, May 16, 2018

I’m working on an R package for mixed models under complex sampling.  It’s here. At the moment, it only tries to fit two-level linear mixed models to two-stage samples – for example, if you sample schools then students within schools, and want a model with school-level random effects. Also, it’s... Read more