oshka - Recursive Quoted Language Expansion

Brodie Gaslam

Programmable Non-Standard Evaluation

Non-Standard Evaluation (NSE hereafter) occurs when R expressions are captured and evaluated in a manner different than if they had been executed without intervention. subset is a canonical example, which we use here with the built-in iris data set:

subset(iris, Sepal.Width > 4.1)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
16          5.7         4.4          1.5         0.4  setosa
34          5.5         4.2          1.4         0.2  setosa

Sepal.Width does not exist in the global environment, yet this works because subset captures the expression and evaluates it within iris.
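
The capture-and-evaluate mechanism can be sketched in base R. This is only an illustration: capture_and_eval is a hypothetical helper, not part of base R or oshka, and it leaves out the row subsetting that subset performs afterwards.

```r
# Hypothetical helper (an assumption for illustration only).
capture_and_eval <- function(data, expr) {
  captured <- substitute(expr)          # the unevaluated expression, as typed
  eval(captured, data, parent.frame())  # look in `data` first, then the caller
}

res <- capture_and_eval(iris, Sepal.Width > 4.1)
which(res)  # positions of the rows where the condition holds
```

Because the columns of iris are searched before the calling frame, Sepal.Width resolves to the column even though no such variable exists globally.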

A limitation of NSE is that it is difficult to use programmatically:

exp.a <- quote(Sepal.Width > 4.1)
subset(iris, exp.a)
Error in subset.data.frame(iris, exp.a): 'subset' must be logical
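
The error arises because subset captures the symbol exp.a itself, and evaluating that symbol yields the quoted call rather than a logical vector. A minimal base R sketch of that single step (the variable names captured and val are ours):

```r
exp.a <- quote(Sepal.Width > 4.1)

# Inside subset(), substitute(subset) returns the symbol `exp.a`
captured <- quote(exp.a)

# Evaluating the symbol just retrieves the stored language object
val <- eval(captured, iris, environment())

is.call(val)     # the value is still an unevaluated call
is.logical(val)  # so the "must be logical" check fails
```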

oshka::expand facilitates programmable NSE, as with this simplified version of subset:

subset2 <- function(x, subset) {
  sub.exp <- expand(substitute(subset), x, parent.frame())
  sub.val <- eval(sub.exp, x, parent.frame())
  x[!is.na(sub.val) & sub.val, ]
}
subset2(iris, exp.a)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
16          5.7         4.4          1.5         0.4  setosa
34          5.5         4.2          1.4         0.2  setosa

expand is recursive:

exp.b <- quote(Species == 'virginica')
exp.c <- quote(Sepal.Width > 3.6)
exp.d <- quote(exp.b & exp.c)

subset2(iris, exp.d)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
118          7.7         3.8          6.7         2.2 virginica
132          7.9         3.8          6.4         2.0 virginica

We abide by R semantics so that programmable NSE functions are almost identical to normal NSE functions, with programmability as a bonus.
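
To make the idea of recursive expansion concrete, here is a toy expander in base R. It is a rough sketch under simplifying assumptions and not oshka's implementation: toy_expand is a hypothetical name, it only looks symbols up in a single environment, it never touches the function position of a call, and it does not guard against self-referential symbols or empty arguments.

```r
# Hypothetical toy expander: NOT how oshka::expand is implemented.
toy_expand <- function(expr, envir = parent.frame()) {
  if (is.name(expr)) {
    nm <- as.character(expr)
    if (nzchar(nm) && exists(nm, envir = envir)) {
      val <- get(nm, envir = envir)
      # Replace only symbols bound to language objects, then expand the result
      if (is.language(val)) return(toy_expand(val, envir))
    }
    expr
  } else if (is.call(expr)) {
    # Expand the arguments; leave the function name (element 1) untouched
    for (i in seq_along(expr)[-1]) expr[[i]] <- toy_expand(expr[[i]], envir)
    expr
  } else {
    expr  # constants are returned unchanged
  }
}

exp.b <- quote(Species == 'virginica')
exp.c <- quote(Sepal.Width > 3.6)
exp.d <- quote(exp.b & exp.c)

expanded <- toy_expand(exp.d)
expanded
which(eval(expanded, iris))
```

The symbols exp.b and exp.c are replaced by the calls they point to, while Species and Sepal.Width are left alone because they are not bound to language objects in the lookup environment.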

Forwarding NSE Arguments to NSE Functions

If you wish to write a function that uses a programmable NSE function and forwards its NSE arguments to it, you must ensure the NSE expressions are evaluated in the correct environment, typically the parent.frame(). This is no different than with normal NSE functions. An example:

subset3 <- function(x, subset, select, drop=FALSE) {
  frm <- parent.frame()  # as per note in ?parent.frame, better to call here
  sub.q <- expand(substitute(subset), x, frm)
  sel.q <- expand(substitute(select), x, frm)
  eval(bquote(base::subset(.(x), .(sub.q), .(sel.q), drop=.(drop))), frm)
}

We use bquote to assemble our substituted call and eval to evaluate it in the correct frame. The parts of the call that should evaluate in subset3 are escaped with .(). This requires some work from the programmer, but the user reaps the benefits:

col <- quote(Sepal.Length)
sub <- quote(Species == 'setosa')

subset3(iris, sub & col > 5.5, col:Petal.Length)
   Sepal.Length Sepal.Width Petal.Length
15          5.8         4.0          1.2
16          5.7         4.4          1.5
19          5.7         3.8          1.7

Notice that we used expand with the base NSE function subset. Because expand just generates language objects, you can use it with any NSE function.
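
To see what the bquote step assembles, it can help to build a call by hand and inspect it before evaluating. The sketch below uses only base R; unlike subset3 it splices in the symbol iris rather than the data frame itself so that the printed call stays short, and the names x.sym and cl are ours.

```r
x.sym <- quote(iris)  # a symbol, not the data itself
sub.q <- quote(Species == 'setosa' & Sepal.Length > 5.5)
drop  <- FALSE

# .() splices in the *values* of x.sym, sub.q and drop; everything else is quoted
cl <- bquote(base::subset(.(x.sym), .(sub.q), drop = .(drop)))
cl

# The assembled call is an ordinary language object that eval() can run
res <- eval(cl)
nrow(res)
```

Printing cl shows an ordinary call to base::subset with the expanded condition in place, which is exactly what the NSE function expects to receive.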

The forwarding is robust to unusual evaluation:

col.a <- quote(I_dont_exist)
col.b <- quote(Sepal.Length)
sub.a <- quote(stop("all hell broke loose"))
threshold <- 3.35

local({
  col.a <- quote(Sepal.Width)
  sub.a <- quote(Species == 'virginica')
  subs <- list(sub.a, quote(Species == 'versicolor'))

  lapply(
    subs,
    function(x) subset3(iris, x & col.a > threshold, col.b:Petal.Length)
  )
})
[[1]]
    Sepal.Length Sepal.Width Petal.Length
110          7.2         3.6          6.1
118          7.7         3.8          6.7
132          7.9         3.8          6.4
137          6.3         3.4          5.6
149          6.2         3.4          5.4

[[2]]
   Sepal.Length Sepal.Width Petal.Length
86            6         3.4          4.5

Other Considerations

One drawback of the eval/bquote/.() pattern is that the actual objects inside .() are placed on the call stack. This is not an issue with symbols, but can be bothersome with data or functions. For example, in:

my_fun_inner <- function(x) {
  # ... bunch of code
}
my_fun_outer <- function(x) {
  eval(bquote(.(my_fun_inner)(.(x))), parent.frame())
}

The entire deparsed function definition and data frame will be displayed in the traceback, which makes it difficult to see what is happening. A simple work-around is to use:

sapply(.traceback(), head, 1)
sapply(sys.calls(), head, 1)  # sys.calls is similarly affected

Versus rlang

oshka is simple in design and purpose. It exports a single function that substitutes expressions into other expressions. It hews closely to R semantics. rlang is more ambitious and more complex as a result. To use it you must learn new concepts and semantics.

One manifestation of the additional complexity in rlang is that you must unquote expressions to use them:

rlang.b <- quo(Species == 'virginica')
rlang.c <- quo(Sepal.Width > 3.6)
rlang.d <- quo(!!rlang.b & !!rlang.c)

dplyr::filter(iris, !!rlang.d)

As shown earlier, the expand version is more straightforward as it uses the standard quote function and does not require unquoting:

exp.b <- quote(Species == 'virginica')
exp.c <- quote(Sepal.Width > 3.6)
exp.d <- quote(exp.b & exp.c)

subset2(iris, exp.d)

On the other hand, forwarding NSE arguments to NSE functions is simpler in rlang because quosures capture their environment:

rlang_virginica <- function(subset) {
  subset <- enquo(subset)
  dplyr::filter(iris, Species == 'virginica' & !!subset)
}

Because oshka does not capture environments, we must resort to the eval/bquote pattern:

oshka_virginica <- function(subset) {
  subset <- bquote(Species == 'virginica' & .(substitute(subset)))
  eval(bquote(.(subset2)(iris, .(subset))), parent.frame())
}

oshka minimizes the complexity in what we see as the most common use case, and sticks to R semantics for the more complicated ones.

For additional discussion on rlang see the following presentations: