Monthly Summary of Selected Trends, Activities and Insights for R – August 2018
Data for the trends and activities summarized here were obtained from popular websites used by the R community, such as Google, GitHub, StackOverflow, RStudio, METACRAN and R-Bloggers. StackOverflow: Number of StackOverflow questions tagged R: 4,565 (8% down from July). Number of answers to R questions: 4,630 (3% up from...
Snakes in a Package: Combining Python and R with Reticulate
When I first started working as a data scientist (or something like it), I was told to program in C++ and Java. Then R came along and it was liberating; my ability to do data analysis increased substantially. As my applications grew in size and complexity, I started to...
SQL Equivalents in R
When I teach introductory courses in data science using the R language, I often encounter students who already use a different language, such as Python or Julia, and still others who are transitioning into data science from other fields and don’t know any data science language at all. The common thread...
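As a rough illustration of the kind of mapping this post is about, a few common SQL clauses have direct base-R counterparts. This is our own sketch, not the post's examples. It uses the built-in mtcars data set, and the names cars_df and cyl_labels are arbitrary.

```r
# Built-in data set; its row names hold the car model names
cars_df <- mtcars
cars_df$model <- rownames(mtcars)

# SELECT model, mpg, cyl FROM cars_df WHERE mpg > 25
fast <- subset(cars_df, mpg > 25, select = c(model, mpg, cyl))

# SELECT cyl, AVG(mpg) AS mpg FROM cars_df GROUP BY cyl
by_cyl <- aggregate(mpg ~ cyl, data = cars_df, FUN = mean)

# SELECT * FROM cars_df ORDER BY mpg DESC
sorted <- cars_df[order(-cars_df$mpg), ]

# SELECT c.model, c.cyl, l.label
# FROM cars_df c JOIN cyl_labels l ON c.cyl = l.cyl
cyl_labels <- data.frame(cyl = c(4, 6, 8),
                         label = c("small", "medium", "large"))
joined <- merge(cars_df[, c("model", "cyl")], cyl_labels, by = "cyl")

print(fast)
print(by_cyl)
```

dplyr offers similar verbs (filter(), group_by(), arrange(), inner_join()). Base R is used here only so the sketch has no package dependencies.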
Monthly Summary of Selected Trends, Activities, and Insights for R – July 2018
R is a leading language in the data science domain. This article presents a summary of selected trends, activities, and insights around the R language from July 2018. Data for the trends and activities summarized here were obtained from popular websites used by the R community such...
Spark and The Art of Data Science
Apache Spark, or simply “Spark,” is a distributed, fault-tolerant, scalable framework for processing massive amounts of data. Spark abstracts away how computations are distributed across a cluster of machines, enabling you to create applications using Java, Scala, Python, R, and SQL. Spark has...
Quoting and Macros in R
Miles McBain has a nice post about quoting in R and the tidyeval framework. In it, there’s this footnote: “In truth there are other types of calls, and the ones Lisp nuts really bang on about are macro calls.” In this post I want to talk about the similarities...
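For readers new to quoting, here is a minimal base-R sketch of the underlying idea. quote() captures an expression without evaluating it, substitute() lets a function capture the expression its caller wrote, bquote() splices values into a template, and eval() runs the result. This illustrates quoting in general only. It is neither the tidyeval machinery nor the macro comparison the post develops, and all variable names are arbitrary.

```r
# quote() returns an unevaluated call
expr <- quote(x + y)
class(expr)                 # "call"

# A call can be inspected and modified like a list
expr[[1]]                   # the function symbol `+`
expr[[1]] <- as.name("*")   # expr is now x * y

# Evaluate the modified call in an environment that supplies x and y
env <- list2env(list(x = 3, y = 4))
result <- eval(expr, env)   # 12

# bquote() splices values into a template with .()
n <- 10
template <- bquote(x + .(n))   # the call x + 10
eval(template, env)            # 13

# substitute() inside a function captures the caller's expression
capture <- function(arg) substitute(arg)
captured <- capture(a * b + 1)  # the call a * b + 1, not its value
```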
Testing Probability Distribution Generators
Among the ‘regression tests’ that are run for any change to the base-R source code, there’s a file called p-r-random-tests.R. People notice it from time to time because its tests sometimes fail. That’s what is supposed to happen. Testing random number generators is hard, because it’s hard to specify what...
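The following minimal sketch shows why such tests are expected to fail occasionally. It is not the contents of p-r-random-tests.R. It applies a one-sample Kolmogorov-Smirnov test to draws from rnorm(). Even when the generator is correct, the p-value is approximately uniform under the null hypothesis, so a check at level alpha = 0.01 should report a false failure in roughly 1% of runs. The helper name check_generator, the seed, the sample size, and the threshold are all arbitrary choices.

```r
# Hypothetical helper: does a sample from rnorm() look like draws from
# the standard normal distribution? Uses the one-sample K-S test.
check_generator <- function(n = 1000, alpha = 0.01) {
  x <- rnorm(n)
  p <- ks.test(x, "pnorm")$p.value
  p >= alpha  # TRUE means "no evidence of a problem at level alpha"
}

set.seed(2018)
# Repeat the check many times: a correct generator should still
# "fail" in roughly a fraction alpha of the runs
results <- replicate(500, check_generator())
failure_rate <- mean(!results)
print(failure_rate)
```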
Exploratory Data Analysis in R
Hi there! tl;dr: Exploratory data analysis (EDA) is the very first step in a data project. We will create a code template to achieve this with one function. Introduction: EDA consists of univariate (1-variable) and bivariate (2-variable) analysis. In this post we will review some functions that lead us to the...
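Here is a minimal one-function sketch in the spirit of this post, using only base R. The function name eda_overview and the statistics it reports are our own choices, not the post's template. The univariate part gives each column's type, missing-value count, and distinct-value count. The bivariate part is a correlation matrix of the numeric columns. The example runs on the built-in airquality data set.

```r
# Hypothetical helper: a one-call overview of a data frame
eda_overview <- function(df) {
  # Univariate: one row per column of df
  univariate <- data.frame(
    variable  = names(df),
    type      = vapply(df, function(col) class(col)[1], character(1)),
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1)),
    n_unique  = vapply(df, function(col) length(unique(col)), integer(1)),
    row.names = NULL
  )
  # Bivariate: pairwise correlations among the numeric columns,
  # using all complete pairs so missing values don't drop whole rows
  numeric_cols <- vapply(df, is.numeric, logical(1))
  correlations <- cor(df[, numeric_cols, drop = FALSE],
                      use = "pairwise.complete.obs")
  list(univariate = univariate, correlations = correlations)
}

res <- eda_overview(airquality)
print(res$univariate)
print(round(res$correlations, 2))
```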
R Tip: Be Wary of “…”
The following code example contains an easy-to-make error in using the R function unique():

vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")
unique(vec1, vec2)
# "a" "b" "c"

Notice that none of the novel values from vec2 are present in the result. Our mistake was that we (improperly) tried to use unique() with multiple value arguments,...
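Here is a short sketch of one way to get the intended result. In the call above, vec2 is matched positionally to unique()'s second parameter, incomparables, rather than being treated as more values. Combining the vectors with c() first, or using union(), gives the de-duplicated set. The variable names follow the excerpt.

```r
vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")

# Wrong: vec2 is silently taken as the `incomparables` argument
wrong <- unique(vec1, vec2)
print(wrong)
# [1] "a" "b" "c"

# Right: combine the vectors first, then de-duplicate
combined <- unique(c(vec1, vec2))
print(combined)
# [1] "a" "b" "c" "d"

# union() performs the same combine-then-unique step
print(union(vec1, vec2))
# [1] "a" "b" "c" "d"
```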
R Tip: Use isTRUE()
R Tip: use isTRUE(). A lot of R functions are type unstable, which means they return different types or classes depending on the details of their inputs. For example, consider all.equal(), which returns the logical value TRUE when the items being compared are equal:

all.equal(1:3, c(1, 2, 3))
# TRUE

However, when the items being compared...
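Here is a minimal sketch of the pattern the tip recommends. When all.equal() finds a difference, it returns a character description instead of FALSE. Using it directly as an if() condition therefore raises an error. Wrapping the call in isTRUE() always yields a single TRUE or FALSE. The compared vectors are arbitrary examples.

```r
all.equal(1:3, c(1, 2, 3))
# [1] TRUE

# When the items differ, all.equal() returns a description, not FALSE
msg <- all.equal(1:3, c(1, 2, 4))
print(msg)
is.character(msg)  # TRUE

# if (all.equal(1:3, c(1, 2, 4))) would stop with an error here,
# because the condition is a string rather than a logical value.
# isTRUE() collapses the result to a single TRUE or FALSE.
same <- isTRUE(all.equal(1:3, c(1, 2, 4)))
print(same)
# [1] FALSE
```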