Exploring R package startup messages

Tags: cran, gh, httr, r, tidyverse

Published: August 27, 2021

A screenshot of the R console showing the output from when the multicolor package is attached. It's cursive ASCII-art text of the package name, coloured with random colours.

tl;dr

I got curious about R package startup messages, so I grabbed all the special zzz.R files from R packages that are on CRAN and sourced on GitHub. You can jump to the table of results.

Start me up

I learnt recently from Hernando Cortina that his and Amanda Dobbyn’s {multicolor} package prints to the console some multicoloured ASCII-art text of the package’s name when you call it with library(multicolor).

It gave me an itch to scratch: how often are these sorts of startup messages used by R packages? What do people put in them? Is there anything funny in them? Anything nefarious?

A strong attachment

A package may need to run additional code before its functions can work, for example to set some options().

There are two times this kind of code can be run: when the package is loaded, including namespace calls like dplyr::select(), or more specifically when the package is attached with library().

To prepare code for running on-load or on-attach, you create the special functions .onLoad() and .onAttach(). These go in a zzz.R file in the R/ directory of your package, because… convention?

The on-attach option is useful for printing messages for the user to see in the console, like the {multicolor} example above. You want this to happen on-attach and not on-load, since you wouldn’t want to print a message every single time your script uses the :: namespace qualifier.

To specify a message in the body of your .onAttach() function, you use packageStartupMessage(). Why not just cat() or message()? Because it allows the user to quell startup messages using suppressPackageStartupMessages().
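Here's a minimal sketch of what such a zzz.R file might look like. The package name {mypkg} and the option mypkg.verbose are made up for illustration; only .onLoad(), .onAttach() and packageStartupMessage() come from R itself.

```r
# R/zzz.R for a hypothetical package called {mypkg}

.onLoad <- function(libname, pkgname) {
  # Runs whenever the namespace is loaded, including via mypkg::fn().
  # Set a default for a made-up option, unless the user already set one.
  if (is.null(getOption("mypkg.verbose"))) {
    options(mypkg.verbose = FALSE)
  }
  invisible(NULL)
}

.onAttach <- function(libname, pkgname) {
  # Runs only when the package is attached with library(mypkg),
  # so this is the place for anything printed to the console.
  packageStartupMessage("Welcome to ", pkgname, "!")
}
```

R calls both functions for you with the library path and the package name, so neither needs to be exported or called by hand.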

You can learn more in Hadley Wickham’s R Packages book.

As an example, consider the {tidyverse} package, which has some verbose output on attach:

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

But you can shush it with the suppressPackageStartupMessages() function:1

detach("package:tidyverse")  # first detach it
suppressPackageStartupMessages(library(tidyverse))

Peace.
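To see why package authors are nudged towards packageStartupMessage() rather than message(), here's a small sketch. The functions noisy(), polite() and heard() are hypothetical helpers I've made up to stand in for the body of an .onAttach() and to record which messages get through.

```r
# Hypothetical stand-ins for what a package might run on attach
noisy  <- function() message("hello from message()")
polite <- function() packageStartupMessage("hello from packageStartupMessage()")

# Helper: collect the text of every message that reaches the top level
heard <- function(expr) {
  out <- character()
  withCallingHandlers(
    expr,
    message = function(m) {
      out <<- c(out, conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  )
  out
}

heard(suppressPackageStartupMessages(noisy()))   # the message() call still gets through
heard(suppressPackageStartupMessages(polite()))  # nothing gets through
```

The suppressor only muffles conditions with the packageStartupMessage class, which is why a plain message() slips past it.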

So the startup messages of {multicolor} and {tidyverse} do two completely different things: one is fun and frivolous and the other is informative. Isn’t it possible that someone could put ads in the startup message or use it in evil ways? Well, perhaps.

Let’s find out what R package developers put in their startup messages. How many packages even have a zzz.R file and how many of those even contain a packageStartupMessage() call?

Catching some Zs

I understand if all this talk of zzz.R causes you to… zzz. In short, if you want to get all the zzz.R files, you can:

  1. Get a list of R packages on CRAN
  2. Identify which ones have an associated GitHub repo
  3. Get the default branch name of each one and construct the possible URL to their zzz.R file
  4. Contact the possible zzz.R file to see if it exists
  5. If it exists, download it
  6. Filter for zzz.R files that contain packageStartupMessage()

We’ve already attached the tidyverse packages, but we’ll also need two more packages:

library(gh)    # interact with GitHub API
library(httr)  # requests via the internet 

Note

If you’re thinking this approach is a bit long-winded, you’re right. As Tim pointed out, we could just extract the info via METACRAN, an unofficial CRAN mirror hosted on GitHub. It even has its own API. I’ll leave that as an exercise for the reader.

Packages

Luckily you can grab info for all current CRAN packages with the very handy CRAN_package_db() function.2

cran_pkgs <- as_tibble(tools::CRAN_package_db())

This returns a dataframe containing 19865 rows, where each one is a package, along with 67 variables. We get information like the stuff that’s found in package DESCRIPTION files, but it doesn’t tell us whether a package has a zzz.R file.

One way to do this is to visit the GitHub repo associated with the package, if it has one, and see if a zzz.R exists. Of course, many packages are not on GitHub, but we’re going to ignore those for simplicity.

Note

I re-rendered this post in July 2023, so the output no longer reflects CRAN as it was when this post was published (August 2021).

GitHub repos

A quick way of discovering if a package has a GitHub repo is to check for ‘github.com’ in the BugReports field of its DESCRIPTION file.3 Again, this doesn’t capture all the possible repos, but is fine for now.

has_repo <- cran_pkgs %>% 
  select(Package, BugReports) %>% 
  filter(str_detect(BugReports, "github")) %>% 
  transmute(
    Package,
    owner_repo = str_extract(
      str_replace_all(paste0(BugReports, "/x"), "//", "/"),
      "(?<=github.com/).*(?=/[a-zA-Z])"
    )
  ) %>% 
  separate(owner_repo, c("owner", "repo"), "/") %>% 
  filter(!is.na(Package), !is.na(owner), !is.na(repo)) %>% 
  distinct(Package, owner, repo) %>% 
  arrange(Package) 

sample_n(has_repo, 5)
# A tibble: 5 × 3
  Package     owner          repo       
  <chr>       <chr>          <chr>      
1 MatrixEQTL  andreyshabalin MatrixEQTL 
2 mitre       motherhack3r   mitre      
3 path.chain  krzjoa         path.chain 
4 shinyXYpad  stla           shinyXYpad 
5 netdiffuseR USCCANA        netdiffuseR

There were 19865 CRAN packages total and now we have 8152 (41%) that appear to have a GitHub repo.

If you’re wondering why we didn’t just use the package name as the repo name, it’s because they sometimes don’t match, e.g. {baseballDBR} is in a repo called ‘moneyball’.
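If the string handling in that pipeline is hard to follow, here's the same idea in miniature using base R. This is a sketch with a differently written regular expression, not the one used above, and the URLs are invented examples of what a BugReports field might contain.

```r
# Invented examples of BugReports values
urls <- c(
  "https://github.com/some-owner/some-repo/issues",
  "https://github.com/another/pkg/issues/new",
  "https://example.org/bugs"
)

# Capture the two path components that follow 'github.com/'
m <- regmatches(urls, regexec("github\\.com/([^/]+)/([^/]+)", urls))

# Each match is c(full match, owner, repo); non-matches are empty
parse_one <- function(x) {
  if (length(x) == 3) x[2:3] else c(NA_character_, NA_character_)
}
parsed <- t(vapply(m, parse_one, character(2)))

repos <- data.frame(url = urls, owner = parsed[, 1], repo = parsed[, 2])
repos
```

Rows that don't point to GitHub end up with NA for owner and repo, so they can be filtered out afterwards.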

Now we can use the repo details to build a URL to a potential zzz.R file. This comes in the form https://raw.githubusercontent.com/<owner>/<repo>/<defaultbranch>/R/zzz.R.
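As a small sketch, the URL template could be wrapped in a helper like this. The function name zzz_url() and the owner, repo and branch values are placeholders I've made up.

```r
# Build the raw-content URL for a repo's potential R/zzz.R file
zzz_url <- function(owner, repo, branch) {
  paste0(
    "https://raw.githubusercontent.com/",
    owner, "/", repo, "/", branch, "/R/zzz.R"
  )
}

# Placeholder values, not a real repo
zzz_url("some-owner", "some-repo", "main")
```

Because paste0() is vectorised, the same helper works on whole columns of owners, repos and branches at once.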

Default branch

You’ll notice we don’t yet know the default branch of the package’s GitHub repo. Historically, we could probably have just hard-coded ‘master’, but the automatic default is now ‘main’. And of course, the default branch could be something else entirely.

We can grab the default branch for each repo from the GitHub API using the excellent {gh} package by Gábor Csárdi, Jenny Bryan and Hadley Wickham. You’ll need to do some setup to use it yourself.

The key function is gh(), to which you can pass a GET request for the information we want: GET /repos/{owner}/{repo}. We can iterate for each repo by passing each owner and repo name in turn. It returns a list object with lots of information about the repo.

I’ve created ‘possibly’ function variants with {purrr} so that any errors in the process are handled by returning NA, rather than breaking the loop, which would kill the process.
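For anyone who hasn't met possibly() before, here's a tiny self-contained sketch of the idea. The flaky() function is a made-up stand-in for a request that sometimes fails.

```r
library(purrr)

# Made-up stand-in for a request that sometimes fails
flaky <- function(x) {
  if (x == "bad") stop("request failed")
  toupper(x)
}

# Same function, but errors become NA instead of stopping everything
flaky_possibly <- possibly(flaky, otherwise = NA_character_)

results <- map_chr(c("a", "bad", "c"), flaky_possibly)
results
```

Note that the wrapper here goes around the function applied to each element. Wrapping the mapping function itself would instead replace the whole result with the fallback value if any single element failed.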

# Create 'try' function versions
map2_possibly <- possibly(map2, NA_real_)
gh_possibly <- possibly(gh, NA_real_)

# Function: fetch repo details, print message on action
get_repo <- function(owner, repo) {
  cat(paste0("[", Sys.time(), "]"), paste0(owner, "/", repo), "\n")
  gh_possibly("GET /repos/{owner}/{repo}", owner = owner, repo = repo) 
}

maybe_zzz <- has_repo %>%
  mutate(
    repo_deets =  map2_possibly(
      has_repo$owner, has_repo$repo, get_repo
    )
  ) %>% 
  mutate(
    # one branch name per repo; NA when the API call failed
    default_branch = map_chr(
      repo_deets, ~pluck(.x, "default_branch", .default = NA_character_)
    ),
    zzz_url = paste0(
      "https://raw.githubusercontent.com/",
      owner, "/", repo, "/", default_branch, "/R/zzz.R"
    )
  )

So now we have a column with the returned repo information, the extracted default branch name and a URL that points to a potential zzz.R file in that repo.

head(maybe_zzz)
# A tibble: 6 × 6
  Package      owner           repo         repo_deets default_branch zzz_url   
  <chr>        <chr>           <chr>        <list>     <chr>          <chr>     
1 AATtools     Spiritspeak     AATtools     <gh_rspns> master         https://r…
2 ABHgenotypeR StefanReuscher  ABHgenotypeR <gh_rspns> master         https://r…
3 ABM          junlingm        ABM          <gh_rspns> master         https://r…
4 ACEP         agusnieto77     ACEP         <gh_rspns> master         https://r…
5 ACNE         HenrikBengtsson ACNE         <gh_rspns> master         https://r…
6 ACWR         JorgeDelro      ACWR         <gh_rspns> master         https://r…

Status codes

Now we can check the status code for each of the URLs we’ve built. A return of 200 tells us that the file exists and 404 means it doesn’t.4 Again, we can prevent the loop breaking on error by creating a ‘possibly’ version of map().

library(httr)  # for status_code()

map_possibly <- possibly(map, NA_character_)

maybe_zzz_status <- maybe_zzz %>% 
  mutate(
    status = map_possibly(
      zzz_url, ~status_code(GET(.x))
    )
  ) %>% 
  unnest(status)

count(maybe_zzz_status, status)
# A tibble: 2 × 2
  status     n
   <int> <int>
1    200  1519
2    404  6631

Okay, great, we’ve got over a thousand zzz.R files.

Read content

Now we know which packages have a zzz.R file, we can use readLines() to grab their content from their URL, which again we can protect from errors with purrr::possibly().

Note that I’ve created a special version of readLines() that reports to the user the path being checked, but also has a random delay. This is to dampen the impact on GitHub’s servers.

# Function: readLines() but with a pause and message
readLines_delay <- function(path) {
  Sys.sleep(sample(1:3, 1))  # pause for 1 to 3 seconds, chosen at random
  cat(paste0("[", Sys.time(), "]"), path, "\n")
  readLines(path, warn = FALSE)
}

readLines_delay_possibly <- possibly(readLines_delay, NA_character_)

fosho_zzz <- maybe_zzz_status %>% 
  select(-repo_deets) %>% 
  filter(status == 200) %>%  # keep only the URLs where a file exists
  mutate(lines = map_possibly(zzz_url, readLines_delay_possibly))

dim(fosho_zzz)

So now we have a dataframe with a row per package and a list-column containing the R code in the zzz.R file.

Startup messages

Finally, we can find out which packages have a packageStartupMessage() call inside their zzz.R.

has_psm <- fosho_zzz %>% 
  select(Package, lines) %>%
  unnest(lines) %>%
  filter(str_detect(lines, "packageStartupMessage")) %>% 
  mutate(lines = str_remove_all(lines, " ")) %>%
  distinct(Package) %>% 
  pull()

fosho_psm <- filter(fosho_zzz, Package %in% has_psm)

So we started with 19865 CRAN packages and have winnowed it down to 579 (3%) that have a call to packageStartupMessage() in their zzz.R.
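One caveat with plain string matching is that it also catches commented-out calls. Here's a minimal sketch of a slightly stricter check in base R, run on some invented zzz.R content rather than a real package's file.

```r
# Invented zzz.R content, one element per line of the file
zzz_lines <- c(
  ".onAttach <- function(libname, pkgname) {",
  "  # packageStartupMessage(\"old message\")",
  "  packageStartupMessage(\"Hello!\")",
  "}"
)

# Lines that mention a call, and lines that are only a comment
has_call   <- grepl("packageStartupMessage(", zzz_lines, fixed = TRUE)
is_comment <- grepl("^[[:space:]]*#", zzz_lines)

# TRUE if at least one call is live code rather than a comment
any(has_call & !is_comment)
```

This is still only a text search, so it can't tell whether the call is ever reached when the package is attached.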

Table of results

I could provide a table with all the zzz.R content, but I don’t want to break any licenses by reproducing them all here. Instead, here’s an interactive table that links to the GitHub page for each zzz.R file that appears to have a package startup message.

library(reactable)

reactable(
  data = fosho_psm %>% 
    select(package = Package, owner, url = zzz_url),
  searchable = TRUE,
  paginationType = "jump",
  defaultPageSize = 10,
  columns = list(
    url = colDef(cell = function(value) {
      htmltools::tags$a(href = value, target = "_blank", "zzz.R")
    })
  )
)

Note

I re-rendered this post in July 2023, so the table above may contain different packages to when it was first published. The section below relates to the originally-published post and may no longer reflect the content of the zzz.R files listed in the table above.

Patterns

I had a scan through the scripts and found some frequent uses of packageStartupMessage() to:

  • show a basic salutation (e.g. {afex})
  • show the version number, check whether the user has the latest version, sometimes prompt them to download it (e.g. {vistributions}), or note that the package has been superseded by another (e.g. {drake})
  • link to guidance, examples or documentation (e.g. {bayesplot})
  • provide a citation or author names (e.g. {unvotes})
  • link to issue tracking or bug reporting (e.g. {timeperiodsR})
  • check for required supplementary software (e.g. {DALY})
  • remind the user of the need for credentials or keys, for example in packages that access APIs (e.g. {trainR})
  • provide terms of use, warranties, licenses, etc (e.g. {emmeans})

I was also interested to see:

  • a random tip, so you get something new each time you attach the package (e.g. {shinyjs})
  • appeals for GitHub stars (e.g. {sigminer})
  • links to purchasable course materials (e.g. {anomalize})
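The random-tip pattern is simple to sketch. This is my own minimal version with made-up tips, not the code from any of the packages mentioned.

```r
# Made-up tips for a hypothetical package
tips <- c(
  "Tip: read the vignette to get started.",
  "Tip: report bugs on the issue tracker.",
  "Tip: wrap library() in suppressPackageStartupMessages() to hide this."
)

.onAttach <- function(libname, pkgname) {
  # Pick one tip at random each time the package is attached
  packageStartupMessage(sample(tips, 1))
}
```

Since the tip is chosen inside .onAttach(), a new draw happens on every library() call rather than once when the package is built.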

And perhaps the most self-aware were several packages that reminded the user that they can turn off startup messages with suppressPackageStartupMessages() if the messages get too annoying (e.g. {dendextend}).

A few interesting specifics (possible spoiler alerts!):

  • {bayestestR} and {sjmisc} display a special Star Wars message on a certain day of the year…
  • {SHT} and {symengine} load ASCII art, as does {BetaBit}, which also prompts the user for a game they’d like to play
  • {depigner} says ‘Welcome to depigner: we are here to un-stress you!’
  • {mde} has a friendly ‘Happy Exploration :)’ salutation and {manymodelr} says ‘Happy Modelling! :)’
  • {sjPlot} says ‘#refugeeswelcome’

You can use the interactive table above to reach each of the zzz.R files for these packages, or have a sift through yourself to see what you can find.

Buy my stuff?

Is there a line somewhere? Is it okay to advertise something? You could argue that someone has gone out of their way to release a package for free, so what harm is there in trying to get something back? Or does this approach undermine the whole ‘open’ process?

I know some people find startup messages a bit annoying, but I think it’s easy enough for users to opt out of seeing them with a call to suppressPackageStartupMessages().

Mostly I’m kind of surprised by the lack of abuse of packageStartupMessage() in this sample. Let me know of any cheeky business you might have come across.

Environment

Session info
Last rendered: 2023-07-17 18:29:10 BST
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reactable_0.4.4 httr_1.4.6      gh_1.4.0        tidyverse_2.0.0
 [5] lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.2    
 [9] purrr_1.0.1     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1   
[13] ggplot2_3.4.2  

loaded via a namespace (and not attached):
 [1] gtable_0.3.3      jsonlite_1.8.7    compiler_4.3.1    tidyselect_1.2.0 
 [5] scales_1.2.1      yaml_2.3.7        fastmap_1.1.1     R6_2.5.1         
 [9] generics_0.1.3    knitr_1.43.1      htmlwidgets_1.6.2 munsell_0.5.0    
[13] pillar_1.9.0      tzdb_0.4.0        rlang_1.1.1       utf8_1.2.3       
[17] stringi_1.7.12    reactR_0.4.4      xfun_0.39         timechange_0.2.0 
[21] cli_3.6.1         withr_2.5.0       magrittr_2.0.3    crosstalk_1.2.0  
[25] digest_0.6.31     grid_4.3.1        fontawesome_0.5.1 rstudioapi_0.15.0
[29] hms_1.1.3         lifecycle_1.0.3   vctrs_0.6.3       evaluate_0.21    
[33] glue_1.6.2        fansi_1.0.4       colorspace_2.1-0  rmarkdown_2.23   
[37] ellipsis_0.3.2    tools_4.3.1       pkgconfig_2.0.3   htmltools_0.5.5  

Footnotes

  1. Which makes me wonder what the longest R function name is.

  2. I made use of this for the {kevinbacran} package and the associated ‘What’s your Hadley Number?’ app.

  3. I chose the BugReports field rather than the URL field because people put all sorts of things in the latter, like links to websites, etc. BugReports (I think) tends to point to the source on GitHub.

  4. I wrote about status codes as part of the post on my {linkrot} package.

Reuse

CC BY-NC-SA 4.0