Spark and The Art of Data Science
R | Tools | Tools & Languages | Apache Spark | posted by Diego Arenas, ODSC, August 21, 2018
Apache Spark, or simply “Spark,” is a highly distributed, fault-tolerant, scalable framework that processes massive amounts of data. As it processes data, Spark abstracts away the distribution of the computations across a cluster of machines, enabling you to create applications using Java, Scala, Python, R, and... Read more
Quoting and Macros in R
R | Tools & Languages | posted by Thomas Lumley, August 14, 2018
Miles McBain has a nice post about quoting in R and the tidyeval procedure. In it, there’s this footnote: “In truth there are other types of calls, and the ones Lisp nuts really bang on about are macro calls.” In this post I want to talk... Read more
Testing Probability Distribution Generators
R | Tools & Languages | posted by Thomas Lumley, August 7, 2018
In the ‘regression tests’ that are part of any change to the base-R source code, there’s a file called p-r-random-tests.R. People notice it from time to time because the tests sometimes fail. That’s what is supposed to happen. Testing random number generators is hard, because it’s hard... Read more
Exploratory Data Analysis in R
R | Tools & Languages | posted by Pablo Casas, August 3, 2018
Hi there! tl;dr: Exploratory data analysis (EDA) is the very first step in a data project. We will create a code template to achieve this with one function. Introduction: EDA consists of univariate (1-variable) and bivariate (2-variable) analysis. In this post we will review some functions that lead... Read more
R Tip: Be Wary of “…”
R | Tools & Languages | posted by John Mount, July 18, 2018
The following code example contains an easy error in using the R function unique().
    vec1 <- c("a", "b", "c")
    vec2 <- c("c", "d")
    unique(vec1, vec2)
    # "a" "b" "c"
Notice none of the novel values from vec2 are present in the result. Our mistake was: we (improperly) tried to use unique() with... Read more
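As a minimal sketch of what is going on in the example above: in base R, the second positional argument of unique() is `incomparables`, not a second vector to merge, so vec2 is never treated as data. One common remedy, offered here as an illustration and not necessarily the one the post goes on to recommend, is to combine the vectors with c() before calling unique().

```r
vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")

# vec2 is matched to the `incomparables` argument, so only vec1 is de-duplicated.
wrong <- unique(vec1, vec2)

# Combine the vectors first, then de-duplicate (union(vec1, vec2) also works).
right <- unique(c(vec1, vec2))

print(wrong)
print(right)
```

Here `wrong` and `right` are illustrative names only.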
R Tip: Use isTRUE()
R | Tools & Languages | posted by John Mount, June 29, 2018
R Tip: use isTRUE(). A lot of R functions are type unstable, which means they return different types or classes depending on details of their values. For example, consider all.equal(): it returns the logical value TRUE when the items being compared are equal:
    all.equal(1:3, c(1, 2, 3))
    # TRUE
However, when the... Read more
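A small sketch of the type instability described above, using only base R. When the inputs differ, all.equal() returns a character description of the difference, not FALSE, so wrapping the call in isTRUE() is one way to get a plain logical back. The variable names are illustrative.

```r
same    <- all.equal(1:3, c(1, 2, 3))
differs <- all.equal(1:3, c(1, 2.5, 3))

print(class(same))     # logical
print(class(differs))  # character: a message describing the difference

# if (differs) ... would raise an error, because `differs` is not a logical value.
# isTRUE() returns FALSE for anything that is not a single TRUE.
safe_same    <- isTRUE(same)
safe_differs <- isTRUE(differs)
print(c(safe_same, safe_differs))
```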
New Version of ggplot2
R | Tools & Languages | posted by Ari Lamstein, June 20, 2018
I just received a note from Hadley Wickham that a new version of ggplot2 is scheduled to be submitted to CRAN on June 25. Here’s what choroplethr users need to know about this new version of ggplot2. Choroplethr Update Required: The new version of ggplot2 introduces... Read more
rqdatatable: rquery Powered by data.table
R | Tools & Languages | posted by John Mount, June 14, 2018
rquery is an R package for specifying data transforms using piped Codd-style operators. It has already shown great performance on PostgreSQL and Apache Spark. rqdatatable is a new package that supplies a screaming fast implementation of the rquery system in-memory using the data.table package. rquery is already one of the fastest and most teachable (due to deliberate conformity to Codd’s influential work) tools to wrangle data... Read more
WVPlots now at version 1.0.0 on CRAN!
R | Tools & Languages | posted by John Mount, June 8, 2018
Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. We are excited to announce that WVPlots is now at version 1.0.0 on CRAN! The idea is: we... Read more
wrapr 1.4.1 now up on CRAN
R | Tools & Languages | posted by John Mount, May 31, 2018
wrapr 1.4.1 is now available on CRAN. wrapr is a really neat R package for organizing, meta-programming, and debugging R code. This update generalizes the dot-pipe feature’s dot S3 features. Please give it a try! wrapr is an R package that supplies powerful tools for writing and debugging R code. Introduction: Primary wrapr services include: let() (let block), %.>% (dot... Read more