A GitHub repo template for R analysis

Tags: github, r, template
Published: June 11, 2019

The green button on GitHub that's used to copy a repository template. It says 'use this template' on it.

tl;dr

I made a simple GitHub repo template for analysis projects. You can go to the repo and hit the ‘use this template’ button to try it out.

Sharing is caring

GitHub has introduced repository templates to make it easier to share frequently-used repo structures and boilerplate code. Unlike a fork, a template can be copied as a whole repo as many times as you like.

Following a template can help keep your work organised and encourage consistency within and between projects. I can see this being useful for sharing optimal workflows across boundaries like government departments.

Tada

I’ve made a lightweight repo template to make it easier for me to start very simple analysis projects with R.

It’s opinionated, but others may find it useful.

Screenshot of GitHub showing the contents of the repository template, including folders for data and outputs.

To use it, either:

  1. Go to the repo and click the green ‘use this template’ button
  2. Append ‘/generate’ to the repo URL to go straight to the repo copying page
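
The second option can be sketched in R. The repo URL below is a made-up placeholder, not the address of the actual template, so swap in the repo you want to copy.

```r
# Hypothetical repo URL (an assumption for illustration only)
template_url <- "https://github.com/some-user/some-template"

# Strip any trailing slash, then add '/generate' to reach the copying page
generate_url <- paste0(sub("/+$", "", template_url), "/generate")
generate_url

# In an interactive session, open the page in your default browser
if (interactive()) utils::browseURL(generate_url)
```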

The screen looks like the one you see when you create a new repo from scratch, but it mentions that ‘the new repository will start with the same files and folders’ as the template repo. Add a name and description for the repo copy and hit the ‘create repository from template’ button.

Screenshot of the GitHub page for creating a repository from the template

That’s it.

Basic template structure

Much of the rationale for this sort of folder structure has been described in detail elsewhere, including by Richard Fitzjohn and Joris Muller.

In short, the root has:

  • executable R files, separated into ‘sensible’ units of analysis (‘read’, ‘tidy’, etc)
  • an R Project file (.Rproj)
  • a .gitignore for R, with .DS_Store added
  • a README.md to summarise the project as a whole

There are folders for:

  • R functions (R/)
  • raw, untouched, read-only data sets (data/)
  • report source and output files (doc/)
  • external files (ext/), such as pre-trained models

Of course, these can be added to, removed or renamed as required.
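
If you'd rather build the same layout locally than copy it from GitHub, here's a rough base R sketch. The function name, the script names and the .gitignore entries (other than .DS_Store) are my own stand-ins rather than the template's actual contents, and the .Rproj file is left out because RStudio can create one for you.

```r
# Sketch: create the folders and root files described above.
# 'create_analysis_skeleton' is a hypothetical helper, not part of the template.
create_analysis_skeleton <- function(path) {
  # Folders named in the template
  folders <- c("R", "data", "doc", "ext")
  for (folder in folders) {
    dir.create(file.path(path, folder), recursive = TRUE, showWarnings = FALSE)
  }

  # Script names are assumed stand-ins for the 'read', 'tidy', etc. units
  scripts <- c("01_read.R", "02_tidy.R")
  file.create(file.path(path, c(scripts, "README.md")))

  # A few common R entries (an assumption) plus .DS_Store
  writeLines(
    c(".Rhistory", ".RData", ".Rproj.user", ".DS_Store"),
    file.path(path, ".gitignore")
  )

  invisible(path)
}

# Try it in a temporary folder and list what was created
demo_path <- file.path(tempdir(), "analysis-demo")
create_analysis_skeleton(demo_path)
list.files(demo_path, all.files = TRUE, no.. = TRUE)
```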

Bulk it up

It’s not enough to have structure alone. There are plenty of guides for code and file creation. For example:

  • files should be named appropriately (advice from Jenny Bryan)
  • code should be consistent and follow a style guide (e.g. the tidyverse style guide), which is made easier with a linter, like {lintr}
  • READMEs are provided throughout, but documentation should continue in the analysis files themselves, with sensible comments throughout
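
As a small illustration of the linting point, this sketch writes a deliberately untidy script to a temporary file and checks it with {lintr}, if the package is installed. The script contents are invented for the example.

```r
# A deliberately untidy script written to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("x = 1", "y<-x+1"), script)

if (requireNamespace("lintr", quietly = TRUE)) {
  # With no configuration, lint() applies the package's default linters
  lints <- lintr::lint(script)
  print(lints)
} else {
  message("Install {lintr} first, e.g. with install.packages(\"lintr\")")
}
```

To check every R file in a project at once, lintr::lint_dir() takes a folder path instead of a single file.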

The repo template focuses primarily on folder structure. It leaves out some additional tools, including some that need a bit of further setup, but you should think about including them anyway.

There’s a certain amount of personal preference in the tools and techniques for these things and it’s probably best left to the user.

Let me know what you think is missing or what you would add.

Alternatives

Copying a repo template might not be optimal for you, but you can generate the structure for an analytical project in other ways.

One example is the {starters} package from Locke Data, which can be installed with remotes::install_github("lockedata/starters"). The create_analysis_project() function sets up a project folder for you, with arguments to add certain folders and to set up things like dependency management, git and continuous integration.

Another option is to set up your analysis as a package, as outlined by Thomas J Leeper, for example. Jenny Bryan and Hadley Wickham’s {usethis} package provides lots of functions to help you create a package and add things to it, like create_project() and use_git().
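
For a feel of the {usethis} route, here's a minimal sketch that creates a project in a temporary folder. The project name is made up, and the code only runs if {usethis} is installed.

```r
# Hypothetical project name, created under the session's temporary folder
project_path <- file.path(tempdir(), "my-analysis")

if (requireNamespace("usethis", quietly = TRUE)) {
  # Creates the folder with an R/ subfolder and a project marker file
  # (.Rproj when run from RStudio, otherwise a .here file)
  usethis::create_project(project_path, open = FALSE)
  # usethis::use_git() can then be run from inside the new project
} else {
  message("Install {usethis} first, e.g. with install.packages(\"usethis\")")
}
```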

I like these two options more than the template, to be honest. The template is really for quick, simple analysis; mostly for my own purposes.

Environment

Session info
Last rendered: 2023-07-25 21:44:48 BST
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.2 compiler_4.3.1    fastmap_1.1.1     cli_3.6.1        
 [5] tools_4.3.1       htmltools_0.5.5   rstudioapi_0.15.0 yaml_2.3.7       
 [9] rmarkdown_2.23    knitr_1.43.1      jsonlite_1.8.7    xfun_0.39        
[13] digest_0.6.33     rlang_1.1.1       evaluate_0.21    

Reuse

CC BY-NC-SA 4.0