

R Tip: Break up Function Nesting for Legibility
Posted by John Mount, March 30, 2018

There are a number of easy ways to avoid illegible code nesting problems in R.
In this R tip we will expand upon the above statement with a simple example.
At some point it becomes illegible and undesirable to compose operations by nesting them, such as in the following code.
    head(mtcars[with(mtcars, cyl == 8), c("mpg", "cyl", "wt")])
    #                     mpg cyl   wt
    # Hornet Sportabout  18.7   8 3.44
    # Duster 360         14.3   8 3.57
    # Merc 450SE         16.4   8 4.07
    # Merc 450SL         17.3   8 3.73
    # Merc 450SLC        15.2   8 3.78
    # Cadillac Fleetwood 10.4   8 5.25
One popular way to break up nesting is to use magrittr’s “%>%” in combination with dplyr transform verbs, as we show below.
    library("dplyr")

    mtcars %>%
      filter(cyl == 8) %>%
      select(mpg, cyl, wt) %>%
      head
    #    mpg cyl   wt
    # 1 18.7   8 3.44
    # 2 14.3   8 3.57
    # 3 16.4   8 4.07
    # 4 17.3   8 3.73
    # 5 15.2   8 3.78
    # 6 10.4   8 5.25
Note: the above code lost (without warning) the row names that are part of mtcars. We also pass over the details of how pipe notation works. It is sufficient to say the notational convention is: each stage is treated approximately as a function call altered by inserting a new first argument, set to the value of the pipeline up to the current point.
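As a rough illustration of that convention, the sketch below writes the same small calculation once with the pipe and once as the nested call the pipe stands for. It uses base R functions (subset and head) rather than dplyr verbs so that the only package needed is magrittr; the names piped and nested are introduced here purely for illustration.

```r
library("magrittr")  # provides %>%

# Piped form: the value on the left is inserted as the first argument
# of the call on the right.
piped <- mtcars %>% subset(cyl == 8) %>% head(n = 3)

# The same calculation with the first arguments written in by hand.
nested <- head(subset(mtcars, cyl == 8), n = 3)

# The two forms should agree.
all.equal(piped, nested)
```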
Many R users already routinely avoid nested notation problems through a convention I call “name re-use.” Such code looks like the following.
    result <- mtcars
    result <- filter(result, cyl == 8)
    result <- select(result, mpg, cyl, wt)
    head(result)
The above convention is enough to get around all problems of nesting. It also has the great advantage that it is step-debuggable. I recommend introducing and re-using a result name (in this case “result”), and not re-using the starting data name (in this case “mtcars”). This extra care makes the entire block restartable, which is another benefit when developing and debugging.
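To make the restartability point concrete, here is a minimal sketch in base R. The names first_run and d are hypothetical, introduced only for this illustration; d stands in for a starting data name that gets overwritten.

```r
# Separate result name: mtcars is never modified, so the block can be
# re-run from its first line at any time.
result <- mtcars
result <- subset(result, cyl == 8)
result <- result[, c("mpg", "cyl", "wt")]
first_run <- result

# Running the block again reproduces the same value.
result <- mtcars
result <- subset(result, cyl == 8)
result <- result[, c("mpg", "cyl", "wt")]
identical(first_run, result)

# Contrast: overwriting the starting name (d is a stand-in for the data).
d <- mtcars
d <- subset(d, cyl == 8)
d <- d[, c("mpg", "cyl", "wt")]

# d no longer holds the starting data. Editing the filter and re-running
# from the subset step now quietly works on the already-filtered rows.
nrow(subset(d, cyl == 6))  # 0 rows: the 6-cylinder cars are already gone
```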
I like a variation I call “dot intermediates”, which looks like the code below (notice we are switching back from dplyr verbs to base R operators).
    . <- mtcars
    . <- subset(., cyl == 8)
    . <- .[, c("mpg", "cyl", "wt")]
    result <- .
    head(result)
    #                     mpg cyl   wt
    # Hornet Sportabout  18.7   8 3.44
    # Duster 360         14.3   8 3.57
    # Merc 450SE         16.4   8 4.07
    # Merc 450SL         17.3   8 3.73
    # Merc 450SLC        15.2   8 3.78
    # Cadillac Fleetwood 10.4   8 5.25
The dot intermediates convention is very succinct, and we can use it with base R transforms to get a correct (and performant) result. It is particularly neat when you don’t intend to take the result further into your calculation (such as when you only want to print it), as it does not require you to think up an evocative result name. Like all conventions, it is just a matter of teaching, learning, and repetition to make this seem natural, familiar, and legible.
Also, contrary to what many repeat, base R is often faster than the dplyr alternative.
    library("dplyr")
    library("microbenchmark")
    library("ggplot2")

    timings <- microbenchmark(
      base = {
        . <- mtcars
        . <- subset(., cyl == 8)
        . <- .[, c("mpg", "cyl", "wt")]
        nrow(.)
      },
      dplyr = {
        mtcars %>%
          filter(cyl == 8) %>%
          select(mpg, cyl, wt) %>%
          nrow
      })

    print(timings)
    ## Unit: microseconds
    ##   expr      min       lq      mean   median       uq       max neval
    ##   base  122.948  136.948  167.2253  159.688  179.924   349.328   100
    ##  dplyr 1570.188 1654.700 2537.2912 1699.744 1785.611 50759.770   100

    autoplot(timings)

In this example base R is about 15 times faster, comparing mean times (possibly due to magrittr overhead and the small size of this example). We also see that, with some care, base R can be quite legible. dplyr is a useful tool and convention; however, it is not the only allowed tool or the only allowed convention.
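For readers who want to check the speed ratio, the short sketch below recomputes it from the numbers printed in the microbenchmark output above. The variable names are introduced here for illustration; the timings themselves are copied from that output and will differ from run to run and machine to machine.

```r
# Timings in microseconds, copied from the printed microbenchmark output.
base_mean    <- 167.2253
dplyr_mean   <- 2537.2912
base_median  <- 159.688
dplyr_median <- 1699.744

# How many times slower the dplyr pipeline was than the base R version.
mean_ratio   <- dplyr_mean / base_mean
median_ratio <- dplyr_median / base_median

round(c(mean = mean_ratio, median = median_ratio), 1)
```

The mean is pulled upward by the single slow dplyr run visible in the max column, so the median ratio (a little under 11) is the more conservative summary of the same data.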