Exploring the Nuances of R's Labeling Functions and the Tidyverse

In a recent blog post, I found myself reflecting on how I approach reading code across different programming languages, particularly R, and ended up contemplating a feature that I wish R included but currently does not. This piece serves as a thought exercise, examining some code without intending to criticize any design choices made by the tidyverse team. It's clear that they have written far more complex code than I have and have thoroughly considered their design approach. However, I believe it's beneficial to challenge our own assumptions and analyze our reactions to the code we encounter.
The blog post announcing the latest updates to the tidyverse's {scales} package does a good job of illustrating the new functionality, but one feature in particular caught my attention. The label_glue function takes a {glue} string as its argument and returns a labelling function. That function is then passed a vector of penguin species, each of which is interpolated into the {glue} string, producing the following output:
label_glue("The {x} penguin")(c("Gentoo", "Chinstrap", "Adelie"))
# The Gentoo penguin
# The Chinstrap penguin
# The Adelie penguin
Here, {glue} plays much the same role in R that f-strings play in Python. For those more familiar with Python, both allow straightforward string interpolation:
## R:
name <- "Jonathan"
glue::glue("My name is {name}")
# My name is Jonathan

## Python:
>>> name = 'Jonathan'
>>> f"My name is {name}"
# 'My name is Jonathan'
While the label_glue()() call may not seem particularly magical (functions are simply being applied to their arguments), it's always insightful to dig deeper when code behaves in a way you didn't expect.
To clarify, let's consider a simplified version of what label_glue does. A basic implementation might look like this:
tmp_label_glue <- function(pattern = "{x}") {
  function(x) {
    glue::glue_data(list(x = x), pattern)
  }
}
This function returns another function that takes one argument. By evaluating it, we can observe the following:
tmp_label_glue("The {x} penguin")
# function(x) {
#   glue::glue_data(list(x = x), pattern)
# }
This approach has the advantage of allowing us to store the result as a named function. For instance:
penguin_label <- tmp_label_glue("The {x} penguin")
penguin_label
# function(x) {
#   glue::glue_data(list(x = x), pattern)
# }
The penguin_label function can then be used like this:
penguin_label(c("Gentoo", "Chinstrap", "Adelie"))
# The Gentoo penguin
# The Chinstrap penguin
# The Adelie penguin
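One way to see why the returned function still knows its pattern is to look inside the closure. The sketch below uses a base R stand-in for the factory so that it runs without {glue}; the name base_label_glue and the sprintf-based interpolation are assumptions made for illustration, and it only handles a single {x} placeholder.

```r
# Hypothetical base R stand-in for tmp_label_glue(); only handles "{x}".
base_label_glue <- function(pattern = "{x}") {
  force(pattern)  # evaluate the argument now so the closure holds the string
  function(x) {
    # Turn "{x}" into "%s" and let sprintf() vectorise over x
    sprintf(sub("{x}", "%s", pattern, fixed = TRUE), x)
  }
}

penguin_label_base <- base_label_glue("The {x} penguin")

# The pattern lives in the environment where the inner function was created
stored_pattern <- environment(penguin_label_base)$pattern

labels <- penguin_label_base(c("Gentoo", "Chinstrap", "Adelie"))
```

Each call to the factory creates a fresh environment, which is why two patterns give two independent labelling functions.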
This flexibility is useful because different {glue} strings can generate different functions. However, if you only plan to work with a single pattern, calling the generated function inline without ever naming it can look unconventional:
label_glue("The {x} penguin")(c("Gentoo", "Chinstrap", "Adelie"))
One might wonder why we cannot consolidate all of these arguments into a single function call:
label_glue("The {x} penguin", c("Gentoo", "Chinstrap", "Adelie"))
However, that won't work: label_glue does not accept the labels as an argument; it returns a function that does. The way these functions are designed to be passed as arguments suggests a deliberate choice to allow lazy evaluation. The values are passed in dynamically during plotting, so you don't have to extract them and compute the labels by hand.
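A rough sketch of that deferred behaviour, in base R. The helper resolve_labels is hypothetical and is not how ggplot2 or {scales} implement it internally; it only illustrates the idea that a labels argument can be either a fixed vector or a function that is called once the values are known.

```r
# Hypothetical helper: a stand-in for what a scale might do at draw time.
resolve_labels <- function(breaks, labels) {
  if (is.function(labels)) labels(breaks) else labels
}

# A labelling function built ahead of time; it has not seen any data yet
shout <- function(x) toupper(x)

# The values only become known later, once the data are available
breaks_from_data <- sort(unique(c("Gentoo", "Adelie", "Gentoo", "Chinstrap")))

resolved <- resolve_labels(breaks_from_data, shout)          # function: called now
fixed <- resolve_labels(breaks_from_data, c("A", "C", "G"))  # vector: used as-is
```

Passing a function means the caller never has to work out the breaks themselves.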
Exactly what gets passed to the generated function depends on where it is used. For example, with scale_y_discrete it might look like this:
library(ggplot2)
library(palmerpenguins)
p <- ggplot(penguins[complete.cases(penguins), ]) + aes(bill_length_mm, species) + geom_point()
p + scale_y_discrete(labels = penguin_label)
This shows that the labels argument does accept a function, in this case the penguin_label function defined earlier. Alternatively, you could skip the intermediate name:
p + scale_y_discrete(labels = label_glue("The {x} penguin"))
What ultimately gets passed to that generated function can be tricky to identify directly from the source code. However, one can reasonably anticipate that the supplied function will eventually receive the applicable labels as an argument, depending on the context in which it's employed.
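If reading the source doesn't settle it, one option is to wrap the labelling function so that it reports whatever it receives. This is a minimal sketch; spy_on and add_suffix are hypothetical names introduced here, not part of {scales} or ggplot2.

```r
# Hypothetical wrapper: reports its input, then defers to the real labeller.
spy_on <- function(labeller) {
  function(x) {
    message("labeller received: ", paste(x, collapse = ", "))
    labeller(x)
  }
}

# A simple base R labeller to wrap
add_suffix <- function(x) paste(x, "penguin")
spied <- spy_on(add_suffix)

out <- spied(c("Adelie", "Gentoo"))
```

Passing labels = spy_on(penguin_label) to scale_y_discrete should print whatever the scale hands over when the plot is drawn; when and how often that happens is up to ggplot2.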
I can't help but feel that my understanding of how this function is called:
label_glue("The {x} penguin")(c("Gentoo", "Chinstrap", "Adelie"))
conflicts with my more recent comprehension of Haskell and its handling of partial application. In Haskell, all functions are inherently designed to accept a single argument, even if they appear to accept multiple. For instance, the following function:
do_thing x y z = x + y + z
gives the impression of taking three arguments, yet it behaves more like this R function:
do_thing <- function(x) { function(y) { function(z) { x + y + z } } }
do_thing(2)(3)(4) # [1] 9
What's crucial here is that we can peel off some of the layers to get back a function that takes the remaining arguments.
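Peeling off the layers one at a time might look like the sketch below. It reuses the nested do_thing from above; the intermediate names add_to_two and add_to_five are made up for illustration.

```r
do_thing <- function(x) {
  function(y) {
    function(z) {
      x + y + z
    }
  }
}

add_to_two <- do_thing(2)      # a function still waiting for y and z
add_to_five <- add_to_two(3)   # a function still waiting for z
result <- add_to_five(4)       # 2 + 3 + 4
```

Each intermediate value is an ordinary function that can be stored, passed around, or called later.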
The same pattern works in R, yet at first glance R code can lead you to misread what the parentheses mean. For instance:
calc_mean <- function() { function(x) { mean(x) } }
sapply(mtcars[, 1:4], calc_mean())
Although it reads awkwardly, this shows that calc_mean() returns a function that computes the mean, and it is that returned function which sapply applies to each column.
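To make the role of the parentheses concrete, here is a small comparison using the same calc_mean. The variable names are illustrative only.

```r
calc_mean <- function() {
  function(x) {
    mean(x)
  }
}

# With parentheses: calc_mean() runs first and yields function(x) mean(x)
with_parens <- sapply(mtcars[, 1:4], calc_mean())
direct <- sapply(mtcars[, 1:4], mean)

# Without parentheses: sapply() would call calc_mean(column), but calc_mean
# takes no arguments, so this errors; the message is captured as a string
without_parens <- tryCatch(
  sapply(mtcars[, 1:4], calc_mean),
  error = function(e) conditionMessage(e)
)
```

The first two results agree, while the third is just the captured error message.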
In light of this, I personally prefer to name labeling functions explicitly, as it makes the code clearer:
p + scale_y_discrete(labels = penguin_label)
This way, no parentheses appear at the point where the function is merely passed along rather than called, which I find easier to read.
Ultimately, this reflection is not a critique of the {scales} design; rather, it's an exploration of my own perceptions of the arg = fun() model. It's a model that makes sense, but its appearance can be misleading. Am I alone in feeling this way?
Feel free to share your thoughts with me on Mastodon or in the comments section below.