dplyr programming

R
tidyverse
Published: June 30, 2023

I wanted to use some dynamic variable names in dplyr and had to look up dplyr programming again.

This is something I have to look up regularly, and I am still not comfortable with the terminology. I find the tidyverse documentation on the topic rather challenging, although it does not help that I rarely need dplyr programming.

So here is a brief summary of my recent exploration of the topic.

library(tidyverse)

cars <- as_tibble(datasets::mtcars)
var_select <- "cyl"

Pronoun .data

Use the .data pronoun to refer to a column whose name is stored in a character object (or given as a string) in a dplyr workflow.

cars |> count(.data[[var_select]])
cyl n
4 11
6 7
8 14

A string literal works in the same way:

cars |> count(.data[["cyl"]])
cyl n
4 11
6 7
8 14
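
The pronoun is not limited to count(). As a minimal sketch, assuming two hypothetical character objects var_group and var_value (not part of the example above), .data can also be used inside filter() and summarise(). For the tidy-select argument .by the sketch uses all_of() instead, since, as far as I can tell, .data is deprecated in selection contexts.

```r
library(dplyr)

cars <- as_tibble(datasets::mtcars)

# Hypothetical inputs: column names that are only known at run time.
var_group <- "gear"
var_value <- "mpg"

result <- cars |>
  # .data[[...]] looks the column up by its name stored in the string
  filter(.data[[var_value]] > 20) |>
  summarise(
    mean_value = mean(.data[[var_value]]),
    n = n(),
    # .by is a tidy-select argument, so the string goes through all_of()
    .by = all_of(var_group)
  )

result
```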

Embrace argument

A function argument can be embraced with double braces ({{ }}) to pass it on to a dplyr verb. In this example the argument holds the column names as a character vector, so it is also wrapped in all_of().

count_var <- function(.dt, .var) {
  summarise(.dt, n = n(), .by = all_of({{ .var }}))
}

count_var(cars, var_select)
cyl n
6 7
4 11
8 14

Because all_of() accepts a character vector, several grouping columns can be passed at once:

count_var(cars, c(var_select, "vs"))
cyl vs n
6 0 3
4 1 10
6 1 4
8 0 14
4 0 1
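
Embracing is probably more common with unquoted column names than with strings. The following is a sketch of that variant, using a hypothetical helper mean_by() that is not part of the examples above; it assumes dplyr 1.1.0 or later for the .by argument.

```r
library(dplyr)

cars <- as_tibble(datasets::mtcars)

# Hypothetical helper: `group` and `value` are unquoted column names.
mean_by <- function(data, group, value) {
  data |>
    summarise(
      mean = mean({{ value }}), # embraced data-masking argument
      n = n(),
      .by = {{ group }}         # embraced tidy-select argument
    )
}

by_cyl <- mean_by(cars, cyl, mpg)
by_cyl_vs <- mean_by(cars, c(cyl, vs), mpg)

by_cyl
by_cyl_vs
```

Here no all_of() is needed, because the embraced argument is a column selection rather than a character vector.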

Name injection

Name injection builds new variable names dynamically: the name is written as a string with glue syntax on the left-hand side of the assignment.

Such dynamic names have to be assigned with := instead of =.

cars |>
  summarise("{var_select}_count" := n(),
    # var_select is a plain string outside of a function, so no embrace is needed
    .by = all_of(var_select)
  )
cyl cyl_count
6 7
4 11
8 14

The same pattern works inside a function when the argument holds the column name as a string:

count_var2 <- function(dt, var) {
  summarise(dt, "{var}_count" := n(),
    .by = all_of({{ var }})
  )
}

count_var2(cars, "cyl")
cyl cyl_count
6 7
4 11
8 14

With double braces in the name string, the name is built from the expression supplied for the argument rather than from its value, so passing var_select gives var_select_count instead of cyl_count:

count_var3 <- function(dt, var) {
  summarise(dt, "{{var}}_count" := n(),
    .by = all_of({{ var }})
  )
}

count_var3(cars, var_select)
cyl var_select_count
6 7
4 11
8 14

The double-brace form is meant for arguments that take an unquoted column name, which is then embraced on both sides of :=:

summarise_dt <- function(data, var) {
  data |>
    summarise(
      "mean_{{var}}" := mean({{ var }}),
      "sum_{{var}}" := sum({{ var }}),
      "n_{{var}}" := n()
    )
}

summarise_dt(cars, cyl)
mean_cyl sum_cyl n_cyl
6.1875 198 32
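
Both glue forms can be mixed in one name. As a rough sketch, the hypothetical helper add_scaled() below (my own illustration, not from the examples above) takes an unquoted column for var and a plain string for suffix, so the name uses double braces for the former and single braces for the latter.

```r
library(dplyr)

cars <- as_tibble(datasets::mtcars)

# Hypothetical helper: `var` is an unquoted column, `suffix` is a string.
add_scaled <- function(data, var, suffix = "scaled") {
  data |>
    # "{{ var }}" injects the column name, "{suffix}" injects the string value
    mutate("{{ var }}_{suffix}" := {{ var }} / max({{ var }}))
}

scaled <- add_scaled(cars, mpg)
scaled |> select(mpg, mpg_scaled)

# A different suffix changes only the new column's name
rel <- add_scaled(cars, hp, suffix = "rel")
rel |> select(hp, hp_rel)
```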