What does a year of COVID-19 sound like?

Landing page of coronavirus.data.gov.uk showing a dashboard of data elements, including plots for number of tests and deaths.

Landing page of coronavirus.data.gov.uk

tl;dr

I used the {sonify} package in R to represent a year of the UK’s COVID-19 data in audio format. You can jump straight to the audio.

Listen to your data

I watched an excellent talk at the rstudio::global(2021) conference by JooYoung Seo titled ‘Accessible Data Science Beyond Visual Models: Non-Visual Interactions with R and RStudio Packages’. You can access the video or his blog on the subject.

In the talk he mentioned the {sonify} package for R, which lets you represent data with sound rather than with visuals. For example, values of x and y that increase linearly can be represented by a sound that rises in pitch.

I wondered: what would COVID-19 data sound like, given it’s been a year since the UK’s first cases?

COVID-19 data

GOV.UK, the UK government’s website, has a ‘daily dashboard’ of COVID-19 statistics. There are four prominent statistics:

  1. Cases (people tested positive)
  2. Deaths (deaths within 28 days of a positive test)
  3. Healthcare (patients admitted to hospital)
  4. Testing (virus tests conducted)

The downloads page contains these data and more, both UK-wide and at local levels. This post isn’t an analysis, but I implore you to take a look at the data yourself and read the details about how the data were collected.

Helpfully, you can generate a permanent API link from which to fetch data[1]. Here I’m grabbing the UK-wide stats mentioned above:

data <- read.csv(
  paste0(
    "https://api.coronavirus.data.gov.uk/v2/data",
    "?areaType=overview", # UK wide
    "&metric=newCasesBySpecimenDate",  # cases
    "&metric=newDeaths28DaysByDeathDate",  # deaths
    "&metric=newAdmissions",  # healthcare
    "&metric=newVirusTests",  # testing
    "&format=csv"  # CSV output
  ),
  stringsAsFactors = FALSE
)
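The same query string can be assembled from a vector of metric names, which makes it easier to swap metrics in and out. This is only a sketch: build_covid_url() is a hypothetical helper of my own naming, not part of any package, and it only builds the URL without making a request. The limit of five metrics comes from the fair-usage note in the footnote.

```r
# Hypothetical helper: build the API URL from a character vector of metrics
build_covid_url <- function(metrics, area_type = "overview", format = "csv") {
  # The fair-usage note allows at most five metrics per request
  stopifnot(is.character(metrics), length(metrics) >= 1, length(metrics) <= 5)
  paste0(
    "https://api.coronavirus.data.gov.uk/v2/data",
    "?areaType=", area_type,
    paste0("&metric=", metrics, collapse = ""),  # one &metric= per name
    "&format=", format
  )
}

metrics <- c(
  "newCasesBySpecimenDate",      # cases
  "newDeaths28DaysByDeathDate",  # deaths
  "newAdmissions",               # healthcare
  "newVirusTests"                # testing
)

url <- build_covid_url(metrics)
url
```

The resulting string could then be passed to read.csv() in the same way as the hand-written version.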

I’ll apply some minor cleaning to order by date and isolate the first 365 days, which takes us to 28 January 2021.

data <- data[order(data$date), ]  # order by date
data <- data[1:365, ]  # first year
range(data$date)
## [1] "2020-01-30" "2021-01-28"
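Taking the first 365 rows only covers a full year if there is exactly one row per day with no gaps. Here is a small sketch of how that could be checked, using a made-up data frame (toy) in place of the real download so that it runs without a network connection.

```r
# Toy stand-in for the API response: 400 consecutive days, newest first
toy <- data.frame(
  date = as.character(seq(as.Date("2020-01-30"), by = "day", length.out = 400)),
  stringsAsFactors = FALSE
)
toy <- toy[rev(seq_len(nrow(toy))), , drop = FALSE]

# ISO 8601 date strings sort chronologically, so order() works on characters
toy <- toy[order(toy$date), , drop = FALSE]
first_year <- toy[1:365, , drop = FALSE]

# 365 consecutive daily rows should span exactly 364 day-steps
span_days <- as.numeric(diff(range(as.Date(first_year$date))))
stopifnot(span_days == 364)

range(first_year$date)
## [1] "2020-01-30" "2021-01-28"
```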

I read this into R as a data.frame object with one row per day.

tail(data[, c(1, 5:8)])
##          date newCasesBySpecimenDate newDeaths28DaysByDeathDate newAdmissions
## 18 2021-01-23                  21851                       1151          3100
## 17 2021-01-24                  17191                       1134          3109
## 16 2021-01-25                  29976                       1152          2925
## 15 2021-01-26                  27036                       1044          3136
## 14 2021-01-27                  25720                       1093          3050
## 13 2021-01-28                  24092                       1083          3039
##    newVirusTests
## 18        484485
## 17        412204
## 16        542893
## 15        596845
## 14        771710
## 13        753031

How quickly a year goes.

AV functions

You can skip to the next section if you aren’t interested in the code that will be producing the audio and plots.

Audio

I’ve written a small function using sonify::sonify() to generate audio clips that represent each COVID-19 variable over time.

You pass sonify() your x and y points as you would the plot() function. It has a number of audio-related arguments that let you modify things like the waveform and interpolation, but I’m sticking to the defaults here. This produces a five-second clip in stereo, so you’ll hear the sound move from left to right as you listen.
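For anyone who does want to move away from the defaults, here is a rough sketch of what that might look like on some made-up data. The argument names (waveform, interpolation, duration, flim, stereo) are taken from the {sonify} documentation as I understand it, so check ?sonify::sonify against your installed version before relying on them.

```r
# Assumes {sonify} and its dependency {tuneR} are installed
clip <- sonify::sonify(
  x = 1:20,
  y = sin(seq(0, 2 * pi, length.out = 20)),  # made-up data: one sine cycle
  waveform = "square",       # instead of the default waveform
  interpolation = "linear",  # join the points with straight lines
  duration = 2,              # clip length in seconds
  flim = c(220, 880),        # lowest and highest frequency in Hz
  stereo = TRUE,             # pan across the stereo channels
  play = FALSE               # return the audio object without playing it
)

class(clip)  # an audio object that tuneR can write to disk
```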

The {tuneR} package has the function tuneR::writeWave() to write out the audio to a local .wav file (my desktop in this case).

sonify_covid <- function(y, out_dir = "~/Desktop") {

  # Write the audio for column `y` of `data` to <out_dir>/<y>.wav
  tuneR::writeWave(
    sonify::sonify(
      x = as.Date(data$date), y = data[[y]],
      play = FALSE  # suppress audio from playing
    ),
    file.path(out_dir, paste0(y, ".wav"))
  )

}

# Apply the function to each variable
purrr::walk(names(data[5:8]), sonify_covid)
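The function above reaches out to the data object in the global environment. A slightly more self-contained variant might take the data frame as an argument instead. This is a sketch only: sonify_column() and the toy data are hypothetical, the output goes to a temporary folder rather than the desktop, and dates are converted to plain numbers before being passed to sonify().

```r
# Hypothetical variant that takes the data frame as an argument
sonify_column <- function(df, y, out_dir = tempdir()) {
  stopifnot(y %in% names(df))
  out_path <- file.path(out_dir, paste0(y, ".wav"))
  wave <- sonify::sonify(
    x = as.numeric(as.Date(df$date)),  # days since 1970-01-01
    y = df[[y]],
    play = FALSE  # suppress audio from playing
  )
  tuneR::writeWave(wave, out_path)
  invisible(out_path)  # return the file path without printing it
}

# Made-up data: 30 days of steadily accelerating counts
toy <- data.frame(
  date = as.character(seq(as.Date("2020-01-30"), by = "day", length.out = 30)),
  newCasesBySpecimenDate = (1:30)^2,
  stringsAsFactors = FALSE
)

path <- sonify_column(toy, "newCasesBySpecimenDate")
file.exists(path)
```

Passing the data in explicitly makes it possible to reuse the function on other downloads, such as a single local authority.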

These clips are embedded above the plots in the section below. A download link is included on the player. If you have trouble playing or downloading any of the clips, you can also access them in a playlist on SoundCloud.

Visual

I’m including plots so you can follow how the visuals map to the sound. The plots are going to be intentionally sparse because the focus of the post is the sound the data make. The function takes a COVID-19 variable from our dataset and plots it over time with {ggplot2}.

library(ggplot2)  # attach plotting package

plot_covid <- function(y) {

  ggplot() +
    geom_point(
      aes(as.Date(data$date), data[[y]] / 1000),
      shape = 21  # empty-circle character
    ) +
    labs(
      caption = "Data: https://coronavirus.data.gov.uk/",
      x = "Date", y = "Count (thousands)"
    ) +
    theme_minimal()
  
}

You can then pass in the variable name as a string, like plot_covid("newAdmissions"), although I’ve hidden this code in the next section.

COVID-19 sonified

In each clip, a higher pitch indicates a higher value; a more continuous tone indicates that the points are tightly distributed; and the sound moving from the left to right audio channel indicates change over time.
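To make that mapping concrete, here is a toy illustration in base R of how values could be rescaled to a pitch range and how position in time could be turned into a left-to-right pan. It shows the idea only and is not a description of how {sonify} works internally. The function names, the 440 to 880 Hz range and the example values are all my own choices for illustration.

```r
# Linearly rescale values onto a frequency range (Hz): higher value, higher pitch
value_to_freq <- function(values, flim = c(440, 880)) {
  rng <- range(values, na.rm = TRUE)
  if (diff(rng) == 0) return(rep(mean(flim), length(values)))  # flat data
  flim[1] + (values - rng[1]) / diff(rng) * diff(flim)
}

# Pan position for each point in time: 0 is fully left, 1 is fully right
time_to_pan <- function(n) seq(0, 1, length.out = n)

values <- c(0, 500, 1000, 250, 1500)  # made-up counts
freqs <- value_to_freq(values)
pans <- time_to_pan(length(values))

data.frame(value = values, freq_hz = round(freqs, 1), pan = pans)
```

The smallest value lands on the bottom of the frequency range and the largest on the top, while the pan moves steadily from 0 to 1 as time passes.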

All of these datasets start on the same date, 30 January 2020, which is when the first cases were recorded according to the newCasesBySpecimenDate variable. They all end 365 days later on 28 January 2021.

These data are quite well suited to sonification, given the peaks and troughs. In particular, the death and healthcare variables spike quickly, fall back down, rise again, drop slightly and then peak once more. You won’t notice that initial spike for the cases variable, given the relatively low testing rate at the time.

Cases

This audio and plot show the number of people who have tested positive over time.

Scatter chart of COVID-19 cases over time. There's a small bump to about 5000 cases per day in early April 2020 and then a peak around 30,000 cases in November, before a large spike to around 80,000 cases at the start of 2021.

Death

This audio and plot show the number of recorded deaths within 28 days of a positive test over time.

Scatter chart of COVID-19 deaths over time. There's a spike to 1000 deaths at the start of April 2020, a fall to almost zero by September, a rise to 500 before December, then a spike to 1500 at the start of 2021.

Healthcare

This audio and plot show the number of patients admitted to hospital over time.

Scatter chart of COVID-19 hospital admissions over time. There's a spike to 3500 admissions at the start of April 2020, a fall to almost zero by September, a rise to 2000 before December, then a spike to 4500 at the start of 2021.

Testing

This audio and plot show the number of virus tests conducted over time.

Scatter chart of COVID-19 tests over time. The value climbs from zero at the end of April 2020 to nearly 800,000 by the end of January 2021.

Coda

Sonification has been used for a variety of applications during the pandemic as an alternate means of conveying the data.

For example, Jan Willem Tulp has created a page that ‘dings’ each time there’s a new case around the world. For something more complex, Mark D. Temple has published a paper in the journal BMC Bioinformatics about sonifying the COVID-19 genome (!). Meanwhile, Pedro Pereira Sarmento has sonified data to investigate the impacts of COVID-19 on air pollution.

I’m probably not the first to sonify coronavirus data in this way, and probably not even the first to do it with R, but it seemed a good time to take a look (listen?) back on things. I’m interested to hear more about what approaches others have taken.


Session info
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.2 (2020-06-22)
##  os       macOS  10.16                
##  system   x86_64, darwin17.0          
##  ui       X11                         
##  language (EN)                        
##  collate  en_GB.UTF-8                 
##  ctype    en_GB.UTF-8                 
##  tz       Europe/London               
##  date     2021-02-10                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version    date       lib source                            
##  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.0)                    
##  blogdown      0.21       2020-10-11 [1] CRAN (R 4.0.2)                    
##  bookdown      0.21       2020-10-13 [1] CRAN (R 4.0.2)                    
##  cli           2.2.0      2020-11-20 [1] CRAN (R 4.0.2)                    
##  colorspace    2.0-0      2020-11-11 [1] CRAN (R 4.0.2)                    
##  crayon        1.3.4      2017-09-16 [1] CRAN (R 4.0.0)                    
##  digest        0.6.27     2020-10-24 [1] CRAN (R 4.0.2)                    
##  dplyr         1.0.2      2020-08-18 [1] CRAN (R 4.0.2)                    
##  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.0)                    
##  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.0)                    
##  fansi         0.4.1      2020-01-08 [1] CRAN (R 4.0.0)                    
##  farver        2.0.3      2020-01-16 [1] CRAN (R 4.0.0)                    
##  generics      0.1.0      2020-10-31 [1] CRAN (R 4.0.2)                    
##  ggplot2     * 3.3.2      2020-06-19 [1] CRAN (R 4.0.2)                    
##  glue          1.4.2      2020-08-27 [1] CRAN (R 4.0.2)                    
##  gtable        0.3.0      2019-03-25 [1] CRAN (R 4.0.0)                    
##  highr         0.8        2019-03-20 [1] CRAN (R 4.0.0)                    
##  htmltools     0.5.1.9000 2021-01-17 [1] Github (rstudio/htmltools@11cfbf3)
##  knitr         1.31       2021-01-27 [1] CRAN (R 4.0.2)                    
##  labeling      0.4.2      2020-10-20 [1] CRAN (R 4.0.2)                    
##  lifecycle     0.2.0      2020-03-06 [1] CRAN (R 4.0.0)                    
##  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.0.2)                    
##  munsell       0.5.0      2018-06-12 [1] CRAN (R 4.0.0)                    
##  pillar        1.4.7      2020-11-20 [1] CRAN (R 4.0.2)                    
##  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.0.0)                    
##  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.0)                    
##  R6            2.5.0      2020-10-28 [1] CRAN (R 4.0.2)                    
##  rlang         0.4.10     2020-12-30 [1] CRAN (R 4.0.2)                    
##  rmarkdown     2.6        2020-12-14 [1] CRAN (R 4.0.2)                    
##  scales        1.1.1      2020-05-11 [1] CRAN (R 4.0.0)                    
##  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.0)                    
##  stringi       1.5.3      2020-09-09 [1] CRAN (R 4.0.2)                    
##  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.0)                    
##  tibble        3.0.4      2020-10-12 [1] CRAN (R 4.0.2)                    
##  tidyselect    1.1.0      2020-05-11 [1] CRAN (R 4.0.0)                    
##  vctrs         0.3.6      2020-12-17 [1] CRAN (R 4.0.2)                    
##  withr         2.4.0      2021-01-16 [1] CRAN (R 4.0.2)                    
##  xfun          0.20       2021-01-06 [1] CRAN (R 4.0.2)                    
##  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.0)                    
## 
## [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

  1. Fair usage applies. Ten requests in any 100-second period, with a max rate limit of 100 per hour. Five metrics max per request. Identical requests only refreshed every 150 seconds.