Reproducibility in R: three things

drake
r
rap
reproducibility
rmarkdown
Published

January 22, 2020

A T rex in sunglasses with the text 'works on my machine'.

tl;dr

Three tips for reproducibility in R: centralise everything; report with code; manage your workflows.

Reproducevangelism

I spoke at the Department for Education’s Data Science Week. I wanted everyone – newer and more experienced users alike – to learn at least one new thing about reproducibility with R and RStudio.

The slides are embedded below and you can also get them fullscreen online (press ‘F’ for fullscreen and ‘P’ for presenter notes) and find the source on GitHub.

Three things

The three things to achieve reproducibility were very broad. I focused on R and some specific packages that could be helpful, but the ideas are transferable and there are lots of ways to achieve the same thing.

The things were:

1. Centralise everything

Get code, functions, data, documentation in one place. Use R Projects in RStudio and write packages. This makes code more shareable and improves the chance that others can recreate things on their machine.
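As a rough illustration of what 'one place' can look like, here is a minimal sketch in base R. The folder name `my-analysis`, the file names and the `summarise_speed()` function are all made up for the example, and the project is built in a temporary directory so that nothing is left behind.

```r
# Hypothetical project skeleton, created in a temporary directory
project <- file.path(tempdir(), "my-analysis")
dir.create(file.path(project, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "data"), showWarnings = FALSE)

# A function kept in R/ rather than copy-pasted between scripts
writeLines(
  "summarise_speed <- function(df) mean(df$speed)",
  file.path(project, "R", "summarise.R")
)

# Data stored alongside the code that uses it
write.csv(cars, file.path(project, "data", "cars.csv"), row.names = FALSE)

# Everything is then reachable with paths built from the project root
source(file.path(project, "R", "summarise.R"))
cars_data <- read.csv(file.path(project, "data", "cars.csv"))
mean_speed <- summarise_speed(cars_data)
```

Inside an RStudio Project the working directory is the project root, so the same files could be reached with short relative paths like `data/cars.csv` instead of anything specific to one person's machine.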

2. Report with code

Put code inside your report so that updates to data and code are reflected whenever it is re-rendered. Use R Markdown and related formats, like Yihui Xie’s {xaringan} for reproducible slides and {bookdown} for reproducible books.
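To give a flavour of the idea, here is a small sketch that uses {knitr}, the package that runs the code inside R Markdown documents. The report text is invented for the example and uses the built-in `cars` dataset. A real report would normally live in an .Rmd file and be rendered in full, but the principle is the same: the numbers in the prose are computed, not typed.

```r
library(knitr)

# Hypothetical report source: prose with numbers computed inline
report_source <- c(
  "# Stopping distances",
  "",
  "The dataset has `r nrow(cars)` rows.",
  "The mean speed is `r round(mean(cars$speed), 1)` mph."
)

# knit() runs the embedded R code and returns the report as Markdown text
report <- knit(text = report_source, quiet = TRUE)
cat(report)
```

If the data changes, re-knitting updates every figure in the text without anyone having to hunt for hard-coded values.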

3. Manage workflows

Don’t use your brain to store information about the dependencies within your analysis. Use {drake} by Will Landau instead. It remembers all the relationships between the files, objects and functions in your analysis and only re-runs what needs to be re-run following changes.
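A minimal sketch of a {drake} workflow might look like the one below. The target names (`raw`, `model`, `slope`) are invented for illustration and the analysis is a toy one based on the built-in `cars` dataset. Note that `make()` stores its results in a hidden `.drake/` folder in the working directory.

```r
library(drake)

# The plan lists each target and the code that produces it.
# drake works out the dependency order from the code itself.
plan <- drake_plan(
  raw   = cars,
  model = lm(dist ~ speed, data = raw),
  slope = coef(model)[["speed"]]
)

# First run: every target is built and cached
make(plan)

# Second run: nothing has changed, so nothing needs to be rebuilt
make(plan)

# Retrieve a finished target from the cache
slope_value <- readd(slope)
```

If the command for `model` were edited, the next `make(plan)` would rebuild `model` and `slope` but leave `raw` alone, because `raw` doesn't depend on the change.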

Acknowledgements

I keep referring to the same resources about reproducibility. Take a look at:

On this blog

Relevant rostrum.blog reproducibility-related writings:

Environment

Session info
Last rendered: 2023-07-22 16:29:14 BST
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.2   compiler_4.3.1      fastmap_1.1.1      
 [4] cli_3.6.1           tools_4.3.1         htmltools_0.5.5    
 [7] xaringanExtra_0.7.0 rstudioapi_0.15.0   yaml_2.3.7         
[10] rmarkdown_2.23      knitr_1.43.1        jsonlite_1.8.7     
[13] xfun_0.39           digest_0.6.33       rlang_1.1.1        
[16] evaluate_0.21      

Reuse

CC BY-NC-SA 4.0