Yihui Xie: RAP god

Categories: public-sector, r, reproducibility, rmarkdown

Published: January 12, 2024

Screenshot of the Slack emoji picker. The user has searched for 'yihui', which returns an emoji profile picture of Yihui Xie.

tl;dr

Taking a moment to thank Yihui, who has unwittingly made possible the rise of Reproducible Analytical Pipelines (RAP).

Hooray for Yihui

Yihui Xie is an R legend. He was, however, recently laid off by his employer, Posit.

I’ve personally benefited a great deal from Yihui’s work, from writing reproducible presentations with {xaringan} to producing the original version of this blog with {blogdown}.

At a grander scale, Yihui’s contributions to the R ecosystem have had a lasting and transformational impact on how we generate Official Statistics in the UK, where R Markdown and {knitr} in particular are essential and ubiquitous tools.

So much so that we have a custom Yihui Slack emoji.

That’s a RAP

Put (far too) simply, a Reproducible Analytical Pipeline (RAP) is any code-driven, version-controlled workflow that reads data, processes it and creates consumable outputs, while ensuring that the process can be re-run in the future and by others.

RAP was birthed from ‘DataOps’ principles with a focus on the production of statistical publications: reports and data files for public consumption, published officially on the UK government’s website. These files are important for transparency and decision making.

These days, RAP is so much more: it’s a way of thinking, a community and a movement¹. Its ethos has spread across the UK public sector and is gaining traction globally through efforts like Bruno Rodrigues’s excellent book.

R is for RAP

RAP is language agnostic², but R has emerged as the preferred option for statistical production in the UK’s government and public sector. Why? Possibly because R is a data- and stats-first language³ and therefore a natural choice for statistics professionals.

Of course, R can easily cover the whole ‘soup-to-nuts’ workflow: not just the ingestion and digestion of data but also, crucially, the creation of reports. R Markdown and {knitr} are the obvious tools for this kind of document generation, and for them we must thank Yihui and his tireless, humble efforts.

But what makes R Markdown particularly conducive to RAP? Well, stats publications are generally periodic (often weekly), and R Markdown is perfect for literate programming at pace: you can create a skeleton document whose contents are updated dynamically by R code, which saves so much time when a new edition of the publication needs to be produced with fresh data.
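To make the ‘skeleton’ idea a little more concrete, here’s a minimal sketch, assuming {knitr} is installed. The heading, the `stats_data` object, the `publish()` helper and all the numbers are hypothetical placeholders rather than anything from a real publication. The prose stays fixed while the inline `r` expressions are recomputed every time the skeleton is knitted against new data.

```r
# A minimal sketch, assuming {knitr} is installed. The heading, the
# `stats_data` object, the `publish()` helper and all numbers are
# hypothetical placeholders, not taken from a real publication.
library(knitr)

# The skeleton: fixed prose plus inline R expressions that knitr
# evaluates and replaces with their results at knit time
skeleton <- c(
  "# Weekly statistics",
  "",
  "This week there were `r nrow(stats_data)` records,",
  "with a mean value of `r round(mean(stats_data$value), 1)`."
)
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(skeleton, rmd_file)

# Hypothetical helper: knit the same skeleton against whatever data is
# supplied and return the resulting Markdown lines
publish <- function(stats_data) {
  md_file <- tempfile(fileext = ".md")
  # envir = environment() makes `stats_data` visible to the inline code
  knit(rmd_file, output = md_file, quiet = TRUE, envir = environment())
  readLines(md_file)
}

# Two "editions" of the publication from the same skeleton
week_1 <- publish(data.frame(value = c(10, 20, 30)))
week_2 <- publish(data.frame(value = c(5, 15)))
cat(week_2, sep = "\n")
```

In a real publication you’d more likely keep the skeleton as its own .Rmd file and render it to HTML or Word with `rmarkdown::render()`, which also accepts a `params` list for parameterised reports. Plain `knitr::knit()` is used here only to keep the sketch free of a Pandoc dependency.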

Crucially, R Markdown is relatively simple to learn and use⁴. You write some plain text and mark it up with simple adornments⁵. This suits the range of skills and abilities in statistical teams across the public sector perfectly, where staff are often ‘numbers-people’ first and ‘coders’ second.

That’s why R Markdown has been a central tenet of RAP ever since Dr Matt Upson⁶, RAP’s ‘Founding Father’, highlighted it in his germinal blog post.

Down, but not out

Of course, I’m not alone: many others have talked about their appreciation for Yihui and his work, including Eric and Mike’s discussion on the R Weekly podcast and Emily’s thread.

You can also take a look at the incredible number of people who have signed up to sponsor Yihui on GitHub: just shy of 300 at the time of writing⁷.

Thank you, Yihui. We look forward to what comes next.

Environment

Session info
Last rendered: 2024-01-22 17:48:46 GMT
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.2 compiler_4.3.1    fastmap_1.1.1     cli_3.6.2        
 [5] tools_4.3.1       htmltools_0.5.6.1 rstudioapi_0.15.0 yaml_2.3.8       
 [9] rmarkdown_2.25    knitr_1.45        jsonlite_1.8.7    xfun_0.41        
[13] digest_0.6.33     rlang_1.1.3       evaluate_0.23    

Footnotes

  1. To the extent that you can use ‘RAP’ as a noun (‘we have many RAPs in our department’) and verb (‘I’m going to RAP this publication’).↩︎

  2. Not strictly true. We’re talking here about open source languages like R and Python. Your proprietary tool of choice is not RAP compliant, sorry.↩︎

  3. R has grown beyond statistical analysis, of course. You can build apps and websites and so much more without ever ‘doing stats’. Just ask David Keyes.↩︎

  4. Compared to what? Most regular R users these days are unlikely to have encountered Sweave, for example, which is actually built into R. Sweave relies on knowledge of document preparation using LaTeX, which most of us barely know how to pronounce, let alone use.↩︎

  5. This becomes even easier with, for example, RStudio’s ‘visual’ mode for R Markdown files, which provides a GUI for marking up your text without needing to remember the syntax, such as wrapping words in `**` to make them bold.↩︎

  6. RAP was pioneered in the UK government by Dr Matt Upson and the team at the Government Digital Service (GDS), including but not limited to Dr Mat Gregory (too many Matts, amirite?) and Duncan Garmonsway, as well as several early adopters like the Department for Culture, Media and Sport.↩︎

  7. Disclaimer: I’m one of these people.↩︎

Reuse

CC BY-NC-SA 4.0