
Landing page of coronavirus.data.gov.uk
tl;dr
I used the {sonify} package in R to represent a year of the UK’s COVID-19 data in audio format. You can jump straight to the audio.
Listen to your data
I watched an excellent talk at the rstudio::global(2021) conference by JooYoung Seo titled ‘Accessible Data Science Beyond Visual Models: Non-Visual Interactions with R and RStudio Packages’. You can access the video or his blog on the subject.
In the talk he mentioned the {sonify} package for R, which lets you represent data with sound rather than with visuals. For example, values of x and y that increase linearly can be represented by a sound that rises in pitch.
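To make that idea concrete, here's a minimal base R sketch of mapping values to pitch. The helper `value_to_freq()` is hypothetical (it isn't part of {sonify} and isn't how the package works internally), and the 440 to 880 Hz range is just an assumption for illustration.

```r
# Hypothetical helper (not part of {sonify}): linearly rescale values
# onto a frequency range so that larger values get a higher pitch.
value_to_freq <- function(y, f_min = 440, f_max = 880) {
  y_range <- range(y, na.rm = TRUE)
  if (diff(y_range) == 0) {
    # All values identical: return the midpoint frequency
    return(rep((f_min + f_max) / 2, length(y)))
  }
  f_min + (y - y_range[1]) / diff(y_range) * (f_max - f_min)
}

y <- c(0, 5, 10)  # linearly increasing values
value_to_freq(y)  # frequencies in Hz, rising with y
```

The smallest value lands on the bottom of the range and the largest on the top, so a steadily increasing series becomes a steadily rising tone.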
I wondered: what would COVID-19 data sound like, given it’s been a year since the UK’s first cases?
COVID-19 data
GOV.UK, the UK government’s website, has a ‘daily dashboard’ of COVID-19 statistics. There are four prominent statistics:
- Cases (people tested positive)
- Deaths (deaths within 28 days of a positive test)
- Healthcare (patients admitted to hospital)
- Testing (virus tests conducted)
The downloads page contains these data and more, both UK-wide and at local levels. This post isn’t an analysis, but I implore you to take a look at the data yourself and read the details about how the data were collected.
Helpfully, you can generate a permanent API link from which to fetch data (fair-usage limits apply; see the footnote). Here I’m grabbing the UK-wide stats mentioned above:
data <- read.csv(
  paste0(
    "https://api.coronavirus.data.gov.uk/v2/data",
    "?areaType=overview", # UK wide
    "&metric=newCasesBySpecimenDate", # cases
    "&metric=newDeaths28DaysByDeathDate", # deaths
    "&metric=newAdmissions", # healthcare
    "&metric=newVirusTests", # testing
    "&format=csv" # CSV output
  ),
  stringsAsFactors = FALSE
)
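If you want to swap metrics in and out, the URL can be assembled from a vector instead. This is only a sketch: `build_covid_url()` is a hypothetical helper of my own, not something provided by the API or a package, and it simply reproduces the query string used above.

```r
# Hypothetical helper: assemble the query URL from a vector of metric names.
build_covid_url <- function(metrics, area_type = "overview", format = "csv") {
  # The footnote in this post notes a limit of five metrics per request
  stopifnot(length(metrics) >= 1, length(metrics) <= 5)
  paste0(
    "https://api.coronavirus.data.gov.uk/v2/data",
    "?areaType=", area_type,
    paste0("&metric=", metrics, collapse = ""),
    "&format=", format
  )
}

metrics <- c(
  "newCasesBySpecimenDate",      # cases
  "newDeaths28DaysByDeathDate",  # deaths
  "newAdmissions",               # healthcare
  "newVirusTests"                # testing
)

url <- build_covid_url(metrics)
# data <- read.csv(url, stringsAsFactors = FALSE)  # network call, not run here
```

The check on the number of metrics fails early rather than sending a request the API would be expected to reject.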
I’ll apply some minor cleaning to order by date and isolate the first 365 days, which takes us to 28 January 2021.
data <- data[order(data$date), ] # order by date
data <- data[1:365, ] # first year
range(data$date)
## [1] "2020-01-30" "2021-01-28"
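As a quick sanity check on those two steps, here's a small sketch with made-up values. The `toy` data frame is purely illustrative; the point is that ISO-formatted date strings sort correctly as text, and that 365 consecutive days from 30 January 2020 do end on 28 January 2021 (2020 was a leap year).

```r
# Toy data: ISO date strings sort chronologically even as character vectors
toy <- data.frame(
  date = c("2020-02-01", "2020-01-30", "2020-01-31"),
  value = c(3, 1, 2),
  stringsAsFactors = FALSE
)
toy <- toy[order(toy$date), ]
toy$value  # values now follow date order

# Build a run of 365 consecutive days from the first date in the data
first_day <- as.Date("2020-01-30")
days <- seq(first_day, by = "day", length.out = 365)
range(days)                    # first and last day of the run
as.integer(diff(range(days)))  # days between the end points (rows minus one)
```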
I read this into R as a data.frame object with one row per day.
tail(data[, c(1, 5:8)])
## date newCasesBySpecimenDate newDeaths28DaysByDeathDate newAdmissions
## 18 2021-01-23 21851 1151 3100
## 17 2021-01-24 17191 1134 3109
## 16 2021-01-25 29976 1152 2925
## 15 2021-01-26 27036 1044 3136
## 14 2021-01-27 25720 1093 3050
## 13 2021-01-28 24092 1083 3039
## newVirusTests
## 18 484485
## 17 412204
## 16 542893
## 15 596845
## 14 771710
## 13 753031
How quickly a year goes.
AV functions
You can skip to the next section if you aren’t interested in the code that will be producing the audio and plots.
Audio
I’ve written a small function using sonify::sonify() to generate audio clips that represent each COVID-19 variable over time.
You pass sonify() your x and y points as you would the plot() function. It has a number of audio-related arguments that let you modify things like the waveform and interpolation, but I’m sticking to the defaults here. This produces a five-second clip in stereo, so you’ll hear the sound move from left to right as you listen.
The {tuneR} package has the function tuneR::writeWave() to write out the audio to a local .wav file (my desktop in this case).
sonify_covid <- function(y, out_dir = "~/Desktop") {
  tuneR::writeWave(
    sonify::sonify(
      x = as.Date(data$date), y = data[[y]],
      play = FALSE # suppress audio from playing
    ),
    file.path(out_dir, paste0(y, ".wav"))
  )
}
# Apply the function to each variable
purrr::walk(names(data[5:8]), sonify_covid)
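If you’d rather not stick to the defaults, the sketch below shows roughly how the waveform, interpolation and duration might be changed. It uses toy data instead of the COVID-19 variables, writes to a temporary folder, and only runs if {sonify} and {tuneR} are installed. The argument values are my understanding of the package’s options, so check ?sonify::sonify before relying on them.

```r
# Toy data: a single rise and fall, standing in for a COVID-19 variable
x <- 1:50
y <- sin(seq(0, pi, length.out = 50))

if (requireNamespace("sonify", quietly = TRUE) &&
    requireNamespace("tuneR", quietly = TRUE)) {
  clip <- sonify::sonify(
    x = x, y = y,
    waveform = "square",       # assumed option; default is "sine"
    interpolation = "linear",  # assumed option; default is "spline"
    duration = 2,              # clip length in seconds
    play = FALSE               # don't play the clip on creation
  )
  out_file <- file.path(tempdir(), "toy.wav")
  tuneR::writeWave(clip, out_file)  # write the clip to a temporary .wav
  file.exists(out_file)
}
```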
These clips are embedded above the plots in the section below. A download link is included on the player. If you have trouble playing or downloading any of the clips, you can also access them in a playlist on SoundCloud.
Visual
I’m including plots so you can follow how the visuals map to the sound. The plots are going to be intentionally sparse because the focus of the post is the sound the data make. The function takes a COVID-19 variable from our dataset and plots it over time with {ggplot2}.
library(ggplot2) # attach plotting package
plot_covid <- function(y) {
  ggplot() +
    geom_point(
      aes(as.Date(data$date), data[[y]] / 1000),
      shape = 21 # empty-circle character
    ) +
    labs(
      caption = "Data: https://coronavirus.data.gov.uk/",
      x = "Date", y = "Count (thousands)"
    ) +
    theme_minimal()
}
You can then pass in the variable name as a string, like plot_covid("newAdmissions"), although I’ve hidden this code in the next section.
COVID-19 sonified
In each clip, a higher pitch indicates a higher value; a more continuous tone indicates that the points are tightly distributed; and the sound moving from the left to right audio channel indicates change over time.
All of these datasets start on the same date, 30 January 2020, which is when the first cases were recorded according to the newCasesBySpecimenDate variable. They all end 365 days later on 28 January 2021.
These data are quite well suited to sonification, given the peaks and troughs. In particular, the death and healthcare variables spike quickly, fall back down, rise again, drop slightly and then peak once more. You won’t notice that initial spike for the cases variable, given the relatively lower testing rate at the time.
Death
This audio and plot show the number of recorded deaths within 28 days of a positive test over time.
Coda
Sonification has been used for a variety of applications during the pandemic as an alternate means of conveying the data.
For example, Jan Willem Tulp has created a page that ‘dings’ each time there’s a new case around the world. For something more complex, Mark D. Temple has published a paper in the journal BMC Bioinformatics about sonifying the COVID-19 genome (!). Meanwhile, Pedro Pereira Sarmento has sonified data to investigate the impacts of COVID-19 on air pollution.
I’m probably not the first to sonify coronavirus data in this way, and probably not even the first to do it with R, but it seemed a good time to take a look (listen?) back on things. I’m interested to hear more about what approaches others have taken.
Session info
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.0.2 (2020-06-22)
## os macOS 10.16
## system x86_64, darwin17.0
## ui X11
## language (EN)
## collate en_GB.UTF-8
## ctype en_GB.UTF-8
## tz Europe/London
## date 2021-02-10
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
## blogdown 0.21 2020-10-11 [1] CRAN (R 4.0.2)
## bookdown 0.21 2020-10-13 [1] CRAN (R 4.0.2)
## cli 2.2.0 2020-11-20 [1] CRAN (R 4.0.2)
## colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.2)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
## digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
## dplyr 1.0.2 2020-08-18 [1] CRAN (R 4.0.2)
## ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
## evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
## fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
## farver 2.0.3 2020-01-16 [1] CRAN (R 4.0.0)
## generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.2)
## ggplot2 * 3.3.2 2020-06-19 [1] CRAN (R 4.0.2)
## glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0)
## highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
## htmltools 0.5.1.9000 2021-01-17 [1] Github (rstudio/htmltools@11cfbf3)
## knitr 1.31 2021-01-27 [1] CRAN (R 4.0.2)
## labeling 0.4.2 2020-10-20 [1] CRAN (R 4.0.2)
## lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0)
## magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0)
## pillar 1.4.7 2020-11-20 [1] CRAN (R 4.0.2)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
## purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
## R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
## rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.2)
## rmarkdown 2.6 2020-12-14 [1] CRAN (R 4.0.2)
## scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
## stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
## stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
## tibble 3.0.4 2020-10-12 [1] CRAN (R 4.0.2)
## tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0)
## vctrs 0.3.6 2020-12-17 [1] CRAN (R 4.0.2)
## withr 2.4.0 2021-01-16 [1] CRAN (R 4.0.2)
## xfun 0.20 2021-01-06 [1] CRAN (R 4.0.2)
## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
##
## [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
Fair usage applies. Ten requests per any 100-second period, with a max rate limit of 100 per hour. Five metrics max per request. Identical requests only refreshed every 150 seconds.