Start an argument with R

Published

February 3, 2024

tl;dr

Four useful (lesser-known?) arguments to four common R functions.

Getting argumentative

There’s been a recent glut of posts about useful base-R functions, like the ones by Maëlle, Isabella and Yihui.

I’m not going to show you three useful base-R functions. Instead, a twist. Here are four useful arguments from four everyday functions:

• `max.level` in `str()`
• `n` in `print()`
• `include.only` in `library()`
• `drop` in ``[``

Structural integrity

`str()` prints an object’s structure. It’s especially helpful for viewing lists in a compact hierarchical fashion. Consider this nested list:

```r
nested_list <- list(
  x = list(x1 = 1:3, x2 = list(x3 = 4:6, x4 = 7:9)),
  y = list(y1 = list(y2 = list(y3 = mtcars))),
  z = list(z1 = CO2, z2 = list(z4 = 100, z5 = chickwts), z3 = list(z5 = 1))
)
```

Here’s the output we get from a simple `str()` call:

``str(nested_list)``
```
List of 3
 $ x:List of 2
  ..$ x1: int [1:3] 1 2 3
  ..$ x2:List of 2
  .. ..$ x3: int [1:3] 4 5 6
  .. ..$ x4: int [1:3] 7 8 9
 $ y:List of 1
  ..$ y1:List of 1
  .. ..$ y2:List of 1
  .. .. ..$ y3:'data.frame':    32 obs. of  11 variables:
  .. .. .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
  .. .. .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
  .. .. .. ..$ disp: num [1:32] 160 160 108 258 360 ...
  .. .. .. ..$ hp  : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
  .. .. .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
  .. .. .. ..$ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
  .. .. .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
  .. .. .. ..$ vs  : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
  .. .. .. ..$ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
  .. .. .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
  .. .. .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
 $ z:List of 3
  ..$ z1:Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 84 obs. of  5 variables:
  .. ..$ Plant    : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
  .. ..$ Type     : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ conc     : num [1:84] 95 175 250 350 500 675 1000 95 175 250 ...
  .. ..$ uptake   : num [1:84] 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
  .. ..- attr(*, "formula")=Class 'formula'  language uptake ~ conc | Plant
  .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
  .. ..- attr(*, "outer")=Class 'formula'  language ~Treatment * Type
  .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
  .. ..- attr(*, "labels")=List of 2
  .. .. ..$ x: chr "Ambient carbon dioxide concentration"
  .. .. ..$ y: chr "CO2 uptake rate"
  .. ..- attr(*, "units")=List of 2
  .. .. ..$ x: chr "(uL/L)"
  .. .. ..$ y: chr "(umol/m^2 s)"
  ..$ z2:List of 2
  .. ..$ z4: num 100
  .. ..$ z5:'data.frame':   71 obs. of  2 variables:
  .. .. ..$ weight: num [1:71] 179 160 136 227 217 168 108 124 143 140 ...
  .. .. ..$ feed  : Factor w/ 6 levels "casein","horsebean",..: 2 2 2 2 2 2 2 2 2 2 ...
  ..$ z3:List of 1
  .. ..$ z5: num 1
```

Oof, that’s a little bit too much information to flood my console with.

Luckily, you can use the `max.level` argument to restrict the depth to which the list is printed. Here’s the top layer only, which has a depth of 1:

``str(nested_list, max.level = 1)``
```
List of 3
 $ x:List of 2
 $ y:List of 1
 $ z:List of 3
```

Now I have a very high-level overview: this is a list of three lists, with lengths 2, 1 and 3.

Let’s go deeper.

``str(nested_list, max.level = 2)``
```
List of 3
 $ x:List of 2
  ..$ x1: int [1:3] 1 2 3
  ..$ x2:List of 2
 $ y:List of 1
  ..$ y1:List of 1
 $ z:List of 3
  ..$ z1:Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 84 obs. of  5 variables:
  .. ..- attr(*, "formula")=Class 'formula'  language uptake ~ conc | Plant
  .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
  .. ..- attr(*, "outer")=Class 'formula'  language ~Treatment * Type
  .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
  .. ..- attr(*, "labels")=List of 2
  .. ..- attr(*, "units")=List of 2
  ..$ z2:List of 2
  ..$ z3:List of 1
```

Now we’ve unpacked the next layer of the onion and can see that the contained objects are made up of vectors, data.frames and yet more lists.

For me, this is a nice way to get a sense of the structure without seeing the entire contents. I think it also beats RStudio’s interactive `View()` for lists, which can’t be expanded to an arbitrary depth in one go.
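If the `attr` lines in the level-2 output above are more clutter than you’d like, `str()` also has a `give.attr` argument that switches them off. Here’s a minimal sketch combining it with `max.level`; it rebuilds `nested_list` so it runs on its own, and relies only on datasets that ship with R (`mtcars`, `CO2`, `chickwts`).

```r
# Rebuild the nested list from earlier so this block is self-contained
nested_list <- list(
  x = list(x1 = 1:3, x2 = list(x3 = 4:6, x4 = 7:9)),
  y = list(y1 = list(y2 = list(y3 = mtcars))),
  z = list(z1 = CO2, z2 = list(z4 = 100, z5 = chickwts), z3 = list(z5 = 1))
)

# Two levels deep, but without listing the formula/labels/units
# attributes that CO2 carries around
str(nested_list, max.level = 2, give.attr = FALSE)
```

Both arguments are passed down as `str()` recurses, so you can dial depth and attribute noise independently.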

Carriage feed

`print()` is a ubiquitous function across most programming languages. In R, you can also just type an object’s name to print it. Here’s a tibble with 21 rows to demonstrate.

```r
chick_tbl <- tibble::as_tibble(ChickWeight[1:21, ])
chick_tbl
```
```
# A tibble: 21 × 4
   weight  Time Chick Diet
    <dbl> <dbl> <ord> <fct>
 1     42     0 1     1
 2     51     2 1     1
 3     59     4 1     1
 4     64     6 1     1
 5     76     8 1     1
 6     93    10 1     1
 7    106    12 1     1
 8    125    14 1     1
 9    149    16 1     1
10    171    18 1     1
# ℹ 11 more rows
```

You might use `head()` to avoid printing an entire data.frame; by default it shows 6 rows. Tibbles are truncated to 10 rows, but a nice feature is that a tibble with only slightly more rows than that (up to 20, by default) is printed in full. But what if you want more control?

Well, `head()` has an `n` argument, and so does `print()` for tibbles. No surprise: it lets you choose how many rows of your data.frame or tibble get shown in the console.

I particularly like this when I want to inspect a tibble in its entirety but it gets truncated by default. I’ll often find myself doing this:

``print(chick_tbl, n = Inf)``
```
# A tibble: 21 × 4
   weight  Time Chick Diet
    <dbl> <dbl> <ord> <fct>
 1     42     0 1     1
 2     51     2 1     1
 3     59     4 1     1
 4     64     6 1     1
 5     76     8 1     1
 6     93    10 1     1
 7    106    12 1     1
 8    125    14 1     1
 9    149    16 1     1
10    171    18 1     1
11    199    20 1     1
12    205    21 1     1
13     40     0 2     1
14     49     2 2     1
15     58     4 2     1
16     72     6 2     1
17     84     8 2     1
18    103    10 2     1
19    122    12 2     1
20    138    14 2     1
21    162    16 2     1
```

You can set an option with `options()` to see more tibble rows by default, but I’m usually okay with its normal truncating behaviour.
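For a plain data.frame, `head()` takes the same `n` argument, and so does `tail()`. Here’s a quick sketch using the built-in `ChickWeight` data; `chick_df` is just an illustrative name. One extra that’s easy to miss: a negative `n` in `head()` drops that many rows from the end rather than keeping them.

```r
# The same 21 rows as above, kept as a plain data.frame
chick_df <- ChickWeight[1:21, ]

head(chick_df)          # 6 rows, the default
head(chick_df, n = 12)  # the first 12 rows
head(chick_df, n = -3)  # everything except the last 3 rows (18 here)
tail(chick_df, n = 3)   # the last 3 rows
```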

Library check out

`library()` calls are a staple of R scripts. Let’s say I’m attaching the {lme4} package because I want to use the famous `cake` dataset¹.

``library(lme4, quietly = TRUE)``

Aha, no, it’s not the `quietly` argument I want to talk about², though it is handy for stopping messages from being printed.

Of course, what `library()` does is let you access objects, like functions and datasets, from a named package. How many objects did we attach from {lme4}?

``length(ls("package:lme4"))``
``[1] 102``

Blimey, all we wanted was `cake`. But actually, we can be more selective with `library()` using the `include.only` argument.

```r
detach("package:lme4")
library(lme4, include.only = "cake")
ls("package:lme4")
```
``[1] "cake"``

Conversely, the `exclude` argument attaches everything except the objects you name.
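A minimal sketch of both arguments, swapping {lme4} for {tools} because it ships with R but isn’t attached by default, so nothing needs installing. Picking `file_ext()` is arbitrary; any exported function would do.

```r
# Start from a clean slate in case {tools} is already attached
if ("package:tools" %in% search()) detach("package:tools")

# Attach only file_ext() from {tools}
library(tools, include.only = "file_ext")
ls("package:tools")        # "file_ext"
file_ext("report.qmd")     # "qmd"

# Detach, then attach everything except file_ext()
detach("package:tools")
library(tools, exclude = "file_ext")
"file_ext" %in% ls("package:tools")  # FALSE
```

Note that excluded objects are still reachable with `::` (e.g. `tools::file_ext()`); `exclude` only controls what lands on the search path.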

Why would you want to do this? It keeps your search path tidy, if that’s something you care about, and it also helps prevent clashes with objects of the same name that might already exist in your environment.

It also means you’ve been explicit about the origin of any objects used in your script. If I see `cake` referenced in your script but can’t see how it was derived, I can take a look at that library call to see that you imported it from {lme4}.

At worst, this might be a nice thing for Python users, who love to `from x import y`.

Score a drop goal

The square bracket, ``[``, is a function³ for extracting elements out of objects, like rows and columns of a data.frame.

So the following will give you the first three rows of the `cake` data.frame for the columns `temp` and `angle`.

``cake[1:3, c("temp", "angle")]``
```
  temp angle
1  175    42
2  185    46
3  195    47
```

So what happens when you select a single column? You get one column back, right?

``cake[1:3, "temp"]``
``[1] 175 185 195``

Ha, lol, no. You get a vector. This might be a problem if you’re programmatically passing column names into ``[`` and you’re always expecting a data.frame as output.

Luckily, you can guard against this by setting `drop = FALSE`, so the result isn’t simplified down to its lowest possible dimension.

``cake[1:3, "angle", drop = FALSE]``
```
  angle
1    42
2    46
3    47
```

A third argument inside the square brackets may look spooky to people who expect only to pass indices for `i` (rows) and `j` (columns), but it’s allowed!⁴
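Here’s a sketch of the programmatic case mentioned earlier, using the built-in `mtcars` so it doesn’t need {lme4}. `select_cols()` is a hypothetical helper, not something from a package. The last call writes ``[`` in prefix form, which makes it plain that `drop` is just another named argument to a function.

```r
# Hypothetical helper: always returns a data.frame,
# however many column names it is given
select_cols <- function(df, cols) {
  df[, cols, drop = FALSE]
}

class(mtcars[, "mpg"])                      # "numeric": simplified to a vector
class(select_cols(mtcars, "mpg"))           # "data.frame"
class(select_cols(mtcars, c("mpg", "hp")))  # "data.frame"

# The same subset written as a function call, with drop as a named argument
identical(
  `[`(mtcars, 1:3, "mpg", drop = FALSE),
  mtcars[1:3, "mpg", drop = FALSE]
)  # TRUE
```

Wrapping the subsetting like this means callers get a data.frame back whether they pass one column name or several.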

Coming to an agreement

These were unlikely to have blown your mind, especially if you’re a seasoned user. But I’ve live-coded with some folks who hadn’t seen them before.

At worst, I hope you come away with a distinct sense of ‘yeah, sure, I guess’.

Let me know if you want to argue your case for some other underappreciated arguments.

Environment

Session info
``Last rendered: 2024-02-03 16:53:13 GMT``
```
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lme4_1.1-35.1 Matrix_1.6-0

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       nlme_3.1-162      cli_3.6.2         knitr_1.45
 [5] rlang_1.1.3       xfun_0.41         minqa_1.2.6       jsonlite_1.8.7
 [9] glue_1.7.0        htmltools_0.5.6.1 fansi_1.0.6       rmarkdown_2.25
[13] grid_4.3.1        evaluate_0.23     tibble_3.2.1      MASS_7.3-60
[17] fastmap_1.1.1     yaml_2.3.8        lifecycle_1.0.4   compiler_4.3.1
[21] Rcpp_1.0.11       htmlwidgets_1.6.2 pkgconfig_2.0.3   rstudioapi_0.15.0
[25] lattice_0.21-8    digest_0.6.33     nloptr_2.0.3      utf8_1.2.4
[29] pillar_1.9.0      splines_4.3.1     magrittr_2.0.3    tools_4.3.1
[33] boot_1.3-28.1
```

Footnotes

1. Rasmus recently did some sleuthing to discover the source of this dataset! A great read.

2. Note that package startup messages can also be controlled en masse by wrapping library calls in `suppressPackageStartupMessages()`, which I’ve talked about before. I’ve also written about the sheer length of this function name.

3. Recall that ``[`` is actually a function, so you can write `` `[`(mtcars, 1:3, c("cyl", "hp")) `` to achieve the same thing as `mtcars[1:3, c("cyl", "hp")]`.

4. Of course, three arguments to ``[`` is bread and butter for {data.table} users!

CC BY-NC-SA 4.0