## Spark and The Art of Data Science

R | Tools | Tools & Languages | Apache Spark | posted by Diego Arenas, ODSC, August 21, 2018

Apache Spark, or simply “Spark,” is a highly distributed, fault-tolerant, scalable framework that processes massive amounts of data. As it processes data, Spark abstracts the distribution of the computations across a machine cluster, thus enabling you to create applications using Java, Scala, Python, R, and... Read more

## Quoting and Macros in R

R | Tools & Languages | posted by Thomas Lumley, August 14, 2018

Miles McBain has a nice post about quoting in R and the tidyeval procedure. In it, there’s this footnote: “In truth there are other types of calls, and the ones Lisp nuts really bang on about are macro calls.” In this post I want to talk... Read more

## Testing Probability Distribution Generators

R | Tools & Languages | posted by Thomas Lumley, August 7, 2018

In the ‘regression tests’ that are part of any change to the base-R source code, there’s a file called p-r-random-tests.R. People notice it from time to time because the tests sometimes fail. That’s what is supposed to happen. Testing random number generators is hard, because it’s hard... Read more

## Exploratory Data Analysis in R

R | Tools & Languages | posted by Pablo Casas, August 3, 2018

Hi there! tl;dr: Exploratory data analysis (EDA) is the very first step in a data project. We will create a code template to achieve this with one function. Introduction: EDA consists of univariate (1-variable) and bivariate (2-variable) analysis. In this post we will review some functions that lead... Read more

## R Tip: Be Wary of “…”

R | Tools & Languages | posted by John Mount, July 18, 2018

The following code example contains an easy error in using the R function `unique()`: `vec1 <- c("a", "b", "c")`, `vec2 <- c("c", "d")`, `unique(vec1, vec2) # "a" "b" "c"`. Notice that none of the novel values from `vec2` are present in the result. Our mistake was: we (improperly) tried to use `unique()` with... Read more

## R Tip: Use isTRUE()

R | Tools & Languages | posted by John Mount, June 29, 2018

R Tip: use `isTRUE()`. A lot of R functions are type unstable, which means they return different types or classes depending on details of their values. For example, consider `all.equal()`: it returns the logical value `TRUE` when the items being compared are equal: `all.equal(1:3, c(1, 2, 3)) # TRUE`. However, when the... Read more

## New Version of ggplot2

R | Tools & Languages | posted by Ari Lamstein, June 20, 2018

I just received a note from Hadley Wickham that a new version of ggplot2 is scheduled to be submitted to CRAN on June 25. Here’s what choroplethr users need to know about this new version of ggplot2. Choroplethr Update Required: The new version of ggplot2 introduces... Read more

R | Tools & Languages | posted by John Mount, June 14, 2018

rquery is an R package for specifying data transforms using piped Codd-style operators. It has already shown great performance on PostgreSQL and Apache Spark. rqdatatable is a new package that supplies a screaming fast in-memory implementation of the rquery system using the data.table package. rquery is already one of the fastest and most teachable (due to deliberate conformity to Codd’s influential work) tools to wrangle data... Read more

## WVPlots now at version 1.0.0 on CRAN!

R | Tools & Languages | posted by John Mount, June 8, 2018

Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. We are excited to announce that WVPlots is now at version 1.0.0 on CRAN! The idea is: we... Read more

## wrapr 1.4.1 now up on CRAN

R | Tools & Languages | posted by John Mount, May 31, 2018

wrapr 1.4.1 is now available on CRAN. wrapr is a really neat R package for organizing, meta-programming, and debugging R code. This update generalizes the dot-pipe feature’s dot S3 features. Please give it a try! wrapr is an R package that supplies powerful tools for writing and debugging R code. Introduction: Primary wrapr services include: `let()` (let block), `%.>%` (dot... Read more