rostrum.blog - Start an argument with R

tl;dr

Four useful (lesser-known?) arguments to four common R functions.

Getting argumentative

There’s been a recent glut of posts about useful base-R functions, like the ones by Maëlle, Isabella and Yihui.

I’m not going to show you three useful base R functions. Instead, a twist. Four useful arguments from four everyday functions:

max.level in str()
n in print()
include.only in library()
drop in `[`

Structural integrity

str() prints an object’s structure. It’s especially helpful for viewing lists in a compact hierarchical fashion. Consider this nested list:

nested_list <- list(
  x = list(x1 = 1:3, x2 = list(x3 = 4:6, x4 = 7:9)),
  y = list(y1 = list(y2 = list(y3 = mtcars))),
  z = list(z1 = CO2, z2 = list(z4 = 100, z5 = chickwts), z3 = list(z5 = 1))
)

Here’s the output we get from a simple str() call:

str(nested_list)

List of 3
 $ x:List of 2
  ..$ x1: int [1:3] 1 2 3
  ..$ x2:List of 2
  .. ..$ x3: int [1:3] 4 5 6
  .. ..$ x4: int [1:3] 7 8 9
 $ y:List of 1
  ..$ y1:List of 1
  .. ..$ y2:List of 1
  .. .. ..$ y3:'data.frame':    32 obs. of  11 variables:
  .. .. .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
  .. .. .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
  .. .. .. ..$ disp: num [1:32] 160 160 108 258 360 ...
  .. .. .. ..$ hp  : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
  .. .. .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
  .. .. .. ..$ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
  .. .. .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
  .. .. .. ..$ vs  : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
  .. .. .. ..$ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
  .. .. .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
  .. .. .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
 $ z:List of 3
  ..$ z1:Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 84 obs. of  5 variables:
  .. ..$ Plant    : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
  .. ..$ Type     : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ conc     : num [1:84] 95 175 250 350 500 675 1000 95 175 250 ...
  .. ..$ uptake   : num [1:84] 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
  .. ..- attr(*, "formula")=Class 'formula'  language uptake ~ conc | Plant
  .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
  .. ..- attr(*, "outer")=Class 'formula'  language ~Treatment * Type
  .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
  .. ..- attr(*, "labels")=List of 2
  .. .. ..$ x: chr "Ambient carbon dioxide concentration"
  .. .. ..$ y: chr "CO2 uptake rate"
  .. ..- attr(*, "units")=List of 2
  .. .. ..$ x: chr "(uL/L)"
  .. .. ..$ y: chr "(umol/m^2 s)"
  ..$ z2:List of 2
  .. ..$ z4: num 100
  .. ..$ z5:'data.frame':   71 obs. of  2 variables:
  .. .. ..$ weight: num [1:71] 179 160 136 227 217 168 108 124 143 140 ...
  .. .. ..$ feed  : Factor w/ 6 levels "casein","horsebean",..: 2 2 2 2 2 2 2 2 2 2 ...
  ..$ z3:List of 1
  .. ..$ z5: num 1

Oof, that’s a little bit too much information to flood my console with.

Luckily you can use the max.level argument to restrict the depth to which the list is printed. Here’s the top layer only, which has a depth of 1:

str(nested_list, max.level = 1)

List of 3
 $ x:List of 2
 $ y:List of 1
 $ z:List of 3

Now I have a very high-level overview: this is a list containing three lists of lengths 2, 1 and 3.

Let’s go deeper.

str(nested_list, max.level = 2)

List of 3
 $ x:List of 2
  ..$ x1: int [1:3] 1 2 3
  ..$ x2:List of 2
 $ y:List of 1
  ..$ y1:List of 1
 $ z:List of 3
  ..$ z1:Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 84 obs. of  5 variables:
  .. ..- attr(*, "formula")=Class 'formula'  language uptake ~ conc | Plant
  .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
  .. ..- attr(*, "outer")=Class 'formula'  language ~Treatment * Type
  .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
  .. ..- attr(*, "labels")=List of 2
  .. ..- attr(*, "units")=List of 2
  ..$ z2:List of 2
  ..$ z3:List of 1

Now we’ve unpacked the next layer of the onion and can see that the contained objects are made up of vectors and yet more lists.

For me, this is a nice way to get the sense of structure without seeing the entire content. I think it beats the interactive list View() in RStudio as well, which can’t be opened to an arbitrary depth in one go.
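If you want to convince yourself how much max.level trims, here’s a minimal sketch. It assumes a small made-up list (small_list is my own stand-in, not the nested_list above) and uses capture.output() to count the lines str() would have printed. The default is max.level = NA, which means ‘show every level’.

```r
# Hypothetical small list, standing in for nested_list
small_list <- list(a = list(b = list(c = 1:3)), d = "text")

# capture.output() collects the lines that str() would print
full_lines <- capture.output(str(small_list))
top_lines  <- capture.output(str(small_list, max.level = 1))

length(full_lines)  # 5 lines: every level is shown
length(top_lines)   # 3 lines: the header plus the two top-level elements
```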

Carriage feed

print() is a ubiquitous function across most programming languages. In R, you might just type an object’s name to show it. Here’s a tibble with 21 rows to demonstrate.

chick_tbl <- tibble::as_tibble(ChickWeight[1:21, ])
chick_tbl

# A tibble: 21 × 4
   weight  Time Chick Diet 
    <dbl> <dbl> <ord> <fct>
 1     42     0 1     1    
 2     51     2 1     1    
 3     59     4 1     1    
 4     64     6 1     1    
 5     76     8 1     1    
 6     93    10 1     1    
 7    106    12 1     1    
 8    125    14 1     1    
 9    149    16 1     1    
10    171    18 1     1    
# ℹ 11 more rows

You might use head() on a data.frame, which shows 6 rows by default, to avoid printing the whole thing. Tibbles are truncated by default to 10 rows, but a nice feature is that they’ll show a few more if there are only slightly more than 10 rows in total. But what if you want more control?

Well, head() and the tibble print() method both take an n argument. No surprise: it lets you select how many rows get shown in the console.

I particularly like this when I have a tibble I’d like to inspect the entirety of, but it gets truncated by default. I’ll often find myself doing this:

print(chick_tbl, n = Inf)

# A tibble: 21 × 4
   weight  Time Chick Diet 
    <dbl> <dbl> <ord> <fct>
 1     42     0 1     1    
 2     51     2 1     1    
 3     59     4 1     1    
 4     64     6 1     1    
 5     76     8 1     1    
 6     93    10 1     1    
 7    106    12 1     1    
 8    125    14 1     1    
 9    149    16 1     1    
10    171    18 1     1    
11    199    20 1     1    
12    205    21 1     1    
13     40     0 2     1    
14     49     2 2     1    
15     58     4 2     1    
16     72     6 2     1    
17     84     8 2     1    
18    103    10 2     1    
19    122    12 2     1    
20    138    14 2     1    
21    162    16 2     1
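The n argument of head() works on a plain data.frame too, so no tibble is needed. A quick sketch, using the built-in mtcars (my choice of example data) and showing that a negative n means ‘everything except the last n rows’:

```r
# mtcars has 32 rows
first_three     <- head(mtcars, n = 3)    # the first 3 rows
all_but_last_29 <- head(mtcars, n = -29)  # drop the last 29 rows, leaving 3

nrow(first_three)
nrow(all_but_last_29)
```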

You can use options() to see more tibble rows by default, but I’m usually okay with its normal truncating behaviour.
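For the record, here’s roughly what that looks like. This is a sketch that assumes a recent version of {tibble}, where printing is handled by {pillar} and the relevant option names are pillar.print_min and pillar.print_max (older versions used tibble.print_min and tibble.print_max instead). The values 15 and 30 are arbitrary.

```r
# Assumed option names from recent {pillar}:
# tibbles with up to print_max rows are shown in full,
# anything longer is cut to print_min rows
old_opts <- options(pillar.print_min = 15, pillar.print_max = 30)

getOption("pillar.print_max")

# Restore whatever was set before
options(old_opts)
```

Capturing the return value of options() makes it easy to put things back how they were.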

Library check out

library() calls are a staple of R scripts. Let’s say I’m attaching the {lme4} package because I want to use the famous cake dataset¹.

library(lme4, quietly = TRUE)

Aha, no, it’s not the quietly argument I want to talk about², though it is handy for stopping messages from being printed.

Of course, what library() does is let you access objects, like functions and datasets, from a named package. How many objects did we attach from {lme4}?

length(ls("package:lme4"))

[1] 102

Blimey, all we wanted was cake. But actually, we can be more selective with library() using the include.only argument.

detach("package:lme4")
library(lme4, include.only = "cake")
ls("package:lme4")

[1] "cake"

Conversely, you can use the exclude argument to attach everything except the objects you name.
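A minimal sketch of excluding, with {splines} as a stand-in for {lme4}. That swap is my assumption, made only because {splines} ships with R and isn’t attached by default. It needs R 3.6.0 or later.

```r
# Attach {splines} but leave out its bs() function
library(splines, exclude = "bs")

"bs" %in% ls("package:splines")  # FALSE: bs() was excluded
"ns" %in% ls("package:splines")  # TRUE: the other exports are attached
```

The excluded function is still reachable as splines::bs() if you need it.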

Why would you want to do this? This can keep your environment tidy—if that’s something you care about—but also helps prevent conflicts between objects that might already exist in your environment.

It also means you’ve been explicit about the origin of any objects used in your script. If I see cake referenced in your script but can’t see how it was derived, I can take a look at that library call to see that you imported it from {lme4}.

At worst, this might be a nice thing for Python users, who love to from x import y.

Score a drop goal

The square bracket, `[`, is a function³ for extracting elements out of objects, like rows and columns of a data.frame.

So the following will give you the first three rows of the cake data.frame for the columns temp and angle.

cake[1:3, c("temp", "angle")]

  temp angle
1  175    42
2  185    46
3  195    47

So what happens when you select a single column? You get one column back, right?

cake[1:3, "temp"]

[1] 175 185 195

Ha, lol, no. You get a vector. This might be a problem if you’re programmatically passing column names into `[` and you’re always expecting a data.frame as output.

Luckily, you can guard against this by ensuring the returned object doesn’t drop to its simplest dimension.

cake[1:3, "angle", drop = FALSE]

A third argument inside the square brackets may look spooky to people who expect only to pass indices for i (rows) and j (columns), but it’s allowed!⁴
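To see the difference for yourself, here’s a small sketch that uses the built-in mtcars instead of cake (my substitution, so it runs without {lme4} installed):

```r
# Default behaviour: a single column is dropped to a plain vector
as_vector <- mtcars[1:3, "cyl"]

# drop = FALSE keeps the result as a one-column data.frame
as_df <- mtcars[1:3, "cyl", drop = FALSE]

is.data.frame(as_vector)  # FALSE
is.data.frame(as_df)      # TRUE
dim(as_df)                # 3 rows, 1 column
```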

Coming to an agreement

These were unlikely to have blown your mind, especially if you’re a seasoned user. But I’ve live-coded with some folks who hadn’t seen them before.

At worst I hope you come away with a distinct sense of ‘yeah, sure, I guess’.

Let me know if you want to argue your case for some other underappreciated arguments.

Environment

Session info

Last rendered: 2024-02-03 16:53:13 GMT

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lme4_1.1-35.1 Matrix_1.6-0 

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       nlme_3.1-162      cli_3.6.2         knitr_1.45       
 [5] rlang_1.1.3       xfun_0.41         minqa_1.2.6       jsonlite_1.8.7   
 [9] glue_1.7.0        htmltools_0.5.6.1 fansi_1.0.6       rmarkdown_2.25   
[13] grid_4.3.1        evaluate_0.23     tibble_3.2.1      MASS_7.3-60      
[17] fastmap_1.1.1     yaml_2.3.8        lifecycle_1.0.4   compiler_4.3.1   
[21] Rcpp_1.0.11       htmlwidgets_1.6.2 pkgconfig_2.0.3   rstudioapi_0.15.0
[25] lattice_0.21-8    digest_0.6.33     nloptr_2.0.3      utf8_1.2.4       
[29] pillar_1.9.0      splines_4.3.1     magrittr_2.0.3    tools_4.3.1      
[33] boot_1.3-28.1

Footnotes

¹ Rasmus recently did some sleuthing to discover the source of this dataset! A great read.
² Note that package startup messages can also be controlled en masse by wrapping library calls in suppressPackageStartupMessages(), which I’ve talked about before. I’ve also written about the sheer length of this function name.
³ Recall that `[` is actually a function, so you can write `[`(mtcars, 1:3, c("cyl", "hp")) to achieve the same thing as mtcars[1:3, c("cyl", "hp")].
⁴ Of course, three arguments to `[` is bread and butter for {data.table} users!

Reuse

CC BY-NC-SA 4.0