# tl;dr

Two years ago I used base R’s plotting system to win a data-viz recreation competition run by the Royal Statistical Society (RSS). I wrote a short {ggplot2} how-to for RSS’s ‘Significance’ magazine that was never published¹, so here it is now.

# Recreate

This short code walkthrough will get you started on recreating Mary Eleanor Spear’s cotton plot (1952), as used in the Royal Statistical Society’s #CottonViz challenge. We’ll concentrate on the line chart for now.

The {ggplot2} package in R is a good choice because it lets us build up the chart in steps: first we’ll make a basic line chart, then remove unneeded elements, fix the axes and finally add the labels. It won’t be a perfect match for Spear’s original, but we’ll get close.

This isn’t a guide to learning {ggplot2}, so you may want to learn the basics first. Alternatively, I wrote a blog post about building Spear’s entire visualisation using base R only.

# Requirements

First, some preparation. If you haven’t already, install the {ggplot2} package for plotting, {tidyr} for data reshaping and {extrafont} for font handling.

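```r
# Pass the package names as one character vector; a second bare argument
# would be read as the `lib` (library path) argument instead
install.packages(c("ggplot2", "tidyr", "extrafont"))
```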

You can download the Routed Gothic font by Darren Embry for free; it’s a good approximation of the stencil lettering used by Spear. Installation will depend on your system, but in macOS you can simply drag the font files into the Font Book app. {extrafont} keeps its own database of your installed fonts: run `font_import()` once to build it, and from then on attaching {extrafont} registers those fonts for use in R.

```r
library(extrafont)
## Registering fonts with R
```
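If Routed Gothic isn’t available, setting `family = "Routed Gothic"` may produce warnings or errors, depending on the graphics device. As a minimal sketch, assuming you’d rather degrade gracefully, the hypothetical helper `pick_font_family()` below returns "Routed Gothic" only when {extrafont}’s font table lists it, and R’s generic "sans" family otherwise. You could then pass its result to the `family` arguments in the steps below.

```r
# Hypothetical helper: use the preferred family if {extrafont} has registered it,
# otherwise fall back to R's generic "sans" family.
pick_font_family <- function(preferred = "Routed Gothic", fallback = "sans") {
  known <- character(0)
  if (requireNamespace("extrafont", quietly = TRUE)) {
    # fonts() lists the families in extrafont's database (built by font_import());
    # tryCatch guards against a database that hasn't been set up yet
    known <- tryCatch(extrafont::fonts(), error = function(e) character(0))
  }
  if (preferred %in% known) preferred else fallback
}

font_family <- pick_font_family()
font_family
```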

The cotton dataset is quite small, so we can create the data frame ourselves. It describes the supply of cotton in the USA from 1942 to 1948, in thousands of bales.

```r
cotton_raw <- data.frame(
  year           = 1942:1948,
  us_consumption = c(11160, 9993,  9693,  9423,  10072, 9374,  7833),
  exports        = c(1480,  1139,  2007,  3613,  3545,  1968,  4785),
  stocks         = c(10657, 10744, 11164, 7326,  2530,  3080,  5283),
  total_supply   = c(23297, 21876, 22864, 20362, 16147, 14422, 17901)
)
```
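In every year, `total_supply` is exactly the sum of the other three columns (for 1942, 11160 + 1480 + 10657 = 23297). If you transcribe the numbers by hand, a quick base-R check like this sketch can catch typos before they reach the chart.

```r
cotton_raw <- data.frame(
  year           = 1942:1948,
  us_consumption = c(11160, 9993,  9693,  9423,  10072, 9374,  7833),
  exports        = c(1480,  1139,  2007,  3613,  3545,  1968,  4785),
  stocks         = c(10657, 10744, 11164, 7326,  2530,  3080,  5283),
  total_supply   = c(23297, 21876, 22864, 20362, 16147, 14422, 17901)
)

# Sum the three components row by row and compare with the recorded totals
component_sum <- with(cotton_raw, us_consumption + exports + stocks)
mismatched_years <- cotton_raw$year[component_sum != cotton_raw$total_supply]
mismatched_years  # integer(0) means every year adds up
```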

It’s preferable to make the data ‘tidy’, with one row per year and consumption type and one column per variable. The {tidyr} package can pivot the data from this ‘wide’ format to a ‘long’ one.

```r
library(tidyr)  # also re-exports the %>% pipe

cotton <- cotton_raw %>%
  pivot_longer(
    c(us_consumption, exports, stocks),
    names_to = "consumption_type", values_to = "boles"
  )
```

```r
head(cotton, 4)  # preview first few rows
## # A tibble: 4 × 4
##    year total_supply consumption_type boles
##   <int>        <dbl> <chr>            <dbl>
## 1  1942        23297 us_consumption   11160
## 2  1942        23297 exports           1480
## 3  1942        23297 stocks           10657
## 4  1943        21876 us_consumption    9993
```
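To see what `pivot_longer()` is doing, here is a rough base-R equivalent, offered only as a sketch: it stacks the three value columns into a single `boles` column and repeats `year` and `total_supply` alongside. Unlike `pivot_longer()`, it groups the rows by consumption type rather than by year, and it returns a plain data frame rather than a tibble.

```r
cotton_raw <- data.frame(
  year           = 1942:1948,
  us_consumption = c(11160, 9993,  9693,  9423,  10072, 9374,  7833),
  exports        = c(1480,  1139,  2007,  3613,  3545,  1968,  4785),
  stocks         = c(10657, 10744, 11164, 7326,  2530,  3080,  5283),
  total_supply   = c(23297, 21876, 22864, 20362, 16147, 14422, 17901)
)

value_cols <- c("us_consumption", "exports", "stocks")

# One row per (year, consumption type): repeat the id columns once per value column,
# and unlist() the value columns one after another in the same order
cotton_long <- data.frame(
  year             = rep(cotton_raw$year, times = length(value_cols)),
  total_supply     = rep(cotton_raw$total_supply, times = length(value_cols)),
  consumption_type = rep(value_cols, each = nrow(cotton_raw)),
  boles            = unlist(cotton_raw[value_cols], use.names = FALSE)
)

head(cotton_long, 4)
```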

# How-to

## Step 1: line chart

Now we can create a basic line chart of the data with geom_line(), using scale_linetype_manual() to give each group its own line style. labs() sets the title, while theme() sets the typeface used throughout the plot and shifts the title slightly to the left.

```r
library(ggplot2)

p1 <- ggplot() +
  geom_line(
    data = cotton,
    aes(x = year, y = boles / 1000, linetype = consumption_type),  # thousands -> millions
    linewidth = 1.5  # `linewidth` needs ggplot2 >= 3.4.0; older versions used `size`
  ) +
  scale_linetype_manual(values = c("longdash", "dashed", "solid")) +
  labs(title = "Millions of Bales") +
  theme(
    plot.title = element_text(hjust = -0.05),
    text = element_text(family = "Routed Gothic")
  )

p1
```
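Which dash pattern goes to which series? Unnamed `values` in scale_linetype_manual() are matched to the scale’s levels in order, and for a character column like `consumption_type` those levels are its unique values sorted alphabetically. This base-R sketch (it doesn’t call {ggplot2}) reproduces that pairing for the values used above.

```r
consumption_types <- c("us_consumption", "exports", "stocks")

# A character vector's factor levels are its sorted unique values
type_levels <- levels(factor(consumption_types))

# Pair the unnamed linetype values from Step 1 with those levels, in order
implied_linetypes <- setNames(c("longdash", "dashed", "solid"), type_levels)
implied_linetypes
```

To make the mapping explicit rather than order-dependent, you can pass a named vector instead, e.g. `scale_linetype_manual(values = c(us_consumption = "solid", exports = "longdash", stocks = "dashed"))`, which gives the same result here.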

## Step 2: remove features

Let’s clear away the unneeded features: the background panel, the axis titles and the legend. In theme(), you can blank the first two with element_blank() and hide the legend with `legend.position = "none"`.

```r
p2 <- p1 +
  theme(
    panel.background = element_blank(),
    axis.title = element_blank(),
    legend.position = "none"
  )

p2
```

## Step 3: correct the axes

Now we can address the axes. Use the scale_*_continuous() functions to set each axis’s breaks, limits and labels; `expand = c(0, 0)` removes the default padding, so each axis starts and ends exactly at its limits. With sec.axis you can create a secondary y-axis that mirrors the first, then blank its tick labels in theme(). You can also put a box around the chart area with the panel.border argument.

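```r
p3 <- p2 +
  scale_x_continuous(
    breaks = seq(1942, 1948, 1),
    labels = c("1942", paste0("'", 43:48)),  # 1942, '43, '44, ..., '48
    expand = c(0, 0)
  ) +
  scale_y_continuous(
    breaks = seq(0, 12, 2),
    limits = c(0, 12),
    expand = c(0, 0),
    sec.axis = dup_axis()  # mirror the y-axis on the right-hand side
  ) +
  theme(
    # Seven breaks per axis: zero width at both ends hides the corner ticks
    axis.ticks = element_line(linewidth = c(0, rep(0.5, 5), 0)),
    axis.ticks.length = unit(-0.5, "lines"),  # negative length points ticks into the panel
    axis.text.y.right = element_blank(),       # no labels on the mirrored axis
    panel.border = element_rect(fill = NA, linewidth = 1)
  )

p3
```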
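The x-axis labels and the tick-width vector above are both hard-coded for seven breaks (1942 to 1948). As a sketch, assuming you’d like the same style for any run of years, two hypothetical helpers can derive them from the breaks instead: `spear_year_labels()` can be passed to `labels`, since scale_x_continuous() accepts a function there and calls it with the break values, and `inner_tick_widths()` builds the zero-ended linewidth vector.

```r
# Hypothetical labeller: full year for the first break, then 'YY for the rest
spear_year_labels <- function(breaks) {
  short <- paste0("'", sprintf("%02d", as.integer(breaks) %% 100L))
  short[1] <- as.character(breaks[1])
  short
}

# Hypothetical helper for the tick trick: zero width at both ends,
# so the vector length must equal the number of breaks
inner_tick_widths <- function(n_breaks, width = 0.5) {
  c(0, rep(width, n_breaks - 2), 0)
}

spear_year_labels(1942:1948)
inner_tick_widths(7)
```

For example: `scale_x_continuous(breaks = 1942:1948, labels = spear_year_labels, expand = c(0, 0))`.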

## Step 4: labels

The only missing features are the labels and arrows, which can be added with annotate() and geom_segment(), respectively. A bit of trial and error will help you find the right coordinates for these elements.

```r
p4 <- p3 +
  # Series labels, positioned by hand in data coordinates
  annotate(
    geom = "text",
    x = c(1946.1, 1945.9, 1943.75),
    y = c(10.8, 7.1, 3.2),
    label = c("U. S. Consumption", "Carry – over\nStocks", "Exports"),
    family = "Routed Gothic"
  ) +
  # One arrow per label: (x, y) is the tail and (xend, yend) the arrowhead
  geom_segment(
    aes(
      x    = c(1945.2, 1945.3, 1944.2),
      y    = c(10.5, 7.4, 3.1),
      xend = c(1945, 1945.1, 1944.4),
      yend = c(9.7, 7.1, 2.8)
    ),
    arrow = arrow(
      length = unit(2, "mm"),
      type = "closed"
    )
  )

p4
```
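Because these coordinates are tuned by trial and error, it can help to keep them together in one table and check that they sit inside the plotting area: points beyond the fixed y limits (0 to 12) would be dropped, and x values outside 1942 to 1948 would likely stretch the x-axis. A base-R sketch, where `label_layout` and `inside_panel()` are hypothetical names:

```r
label_layout <- data.frame(
  label      = c("U. S. Consumption", "Carry – over\nStocks", "Exports"),
  text_x     = c(1946.1, 1945.9, 1943.75),
  text_y     = c(10.8, 7.1, 3.2),
  arrow_x    = c(1945.2, 1945.3, 1944.2),
  arrow_y    = c(10.5, 7.4, 3.1),
  arrow_xend = c(1945, 1945.1, 1944.4),
  arrow_yend = c(9.7, 7.1, 2.8)
)

# TRUE when every x lies within xlim and every y lies within ylim
inside_panel <- function(x, y, xlim = c(1942, 1948), ylim = c(0, 12)) {
  all(x >= xlim[1] & x <= xlim[2] & y >= ylim[1] & y <= ylim[2])
}

with(label_layout, c(
  text      = inside_panel(text_x, text_y),
  arrow     = inside_panel(arrow_x, arrow_y),
  arrow_end = inside_panel(arrow_xend, arrow_yend)
))
```

With the table in place, the layers could read from it, e.g. `geom_segment(data = label_layout, aes(x = arrow_x, y = arrow_y, xend = arrow_xend, yend = arrow_yend), arrow = arrow(length = unit(2, "mm"), type = "closed"))`.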

# Next steps

Finally, we’ve got a line chart that looks pretty close to Spear’s visualisation. What subtle differences do you notice, though? Try to find ways to close the gap.

Next, try to recreate the stacked bar chart from Spear’s original, then arrange the plots with a main title and surrounding text labels. The {ggpattern} package may help you recreate the hatching on the bars, and {patchwork} could help with arranging the plot and text elements.

# Full base R alternative

For the original challenge I used only base R’s plotting system rather than {ggplot2}. This is what my submitted image looked like:

You can read more about it in the accompanying blog post and you can find the original code on GitHub.

Session info

```
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.0 (2022-04-22)
##  os       macOS Big Sur/Monterey 10.16
##  system   x86_64, darwin17.0
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/London
##  date     2023-06-07
##  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.9     2022-03-28 [1] CRAN (R 4.2.0)
##  bookdown      0.26    2022-04-15 [1] CRAN (R 4.2.0)
##  bslib         0.3.1   2021-10-06 [1] CRAN (R 4.2.0)
##  cli           3.6.1   2023-03-23 [1] CRAN (R 4.2.0)
##  colorspace    2.0-3   2022-02-21 [1] CRAN (R 4.2.0)
##  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.0)
##  dplyr         1.1.0   2023-01-29 [1] CRAN (R 4.2.0)
##  evaluate      0.20    2023-01-17 [1] CRAN (R 4.2.0)
##  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.2.0)
##  farver        2.1.1   2022-07-06 [1] CRAN (R 4.2.0)
##  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
##  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.0)
##  ggplot2     * 3.4.1   2023-02-10 [1] CRAN (R 4.2.0)
##  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
##  gtable        0.3.1   2022-09-01 [1] CRAN (R 4.2.0)
##  highr         0.10    2022-12-22 [1] CRAN (R 4.2.0)
##  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.2.0)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.2.0)
##  jsonlite      1.8.4   2022-12-06 [1] CRAN (R 4.2.0)
##  knitr         1.42    2023-01-25 [1] CRAN (R 4.2.0)
##  labeling      0.4.2   2020-10-20 [1] CRAN (R 4.2.0)
##  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.0)
##  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
##  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
##  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.2.0)
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
##  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.0)
##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
##  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.2.0)
##  rmarkdown     2.14    2022-04-25 [1] CRAN (R 4.2.0)
##  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.0)
##  sass          0.4.1   2022-03-23 [1] CRAN (R 4.2.0)
##  scales        1.2.1   2022-08-20 [1] CRAN (R 4.2.0)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
##  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.2.0)
##  tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.2.0)
##  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.0)
##  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.2.0)
##  vctrs         0.6.2   2023-04-19 [1] CRAN (R 4.2.0)
##  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
##  xfun          0.37    2023-01-31 [1] CRAN (R 4.2.0)
##  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.0)
##
##  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
##
## ──────────────────────────────────────────────────────────────────────────────
```

1. At least I don’t think so. I can’t find it by searching on the website, anyway. Also, enough time has passed that certain bits of the original code have since been deprecated in {ggplot2}, lol.