Waggle dance with ggbeeswarm and emoGG

Matt Dray (@mattdray)

A bee scene from irreverent 90s Nicktoon ‘Hey Arnold!’ (via Giphy)

How to plot grouped continuous data?

A boxplot lets you show continuous data split by categories, but it hides the data points and doesn’t tell you much about distribution. A violin chart will show the shape of the distribution, but you still can’t see the individual data points or how many of them there are.

Stripcharts show the data for each category as individual points. The points can be layered on top of each other where they take the same Y value and can be stretched arbitrarily along the X axis.

If you don’t have too much data, or if you sample it, you can stop the data points in a stripchart from overlapping and instead line them up side by side where they take the same Y value. This is called a ‘beeswarm’.

Why? Probably because the cloud of data you’re plotting looks a bit like a swarm of bees.
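To see the difference between these chart types, here is a minimal sketch that draws the same made-up data four ways. The `demo` data frame and the `base_plot` name are assumptions for illustration only; `geom_jitter()` stands in for a stripchart and `geom_beeswarm()` comes from ggbeeswarm.

```r
library(ggplot2)
library(ggbeeswarm)

set.seed(1)  # makes the made-up data reproducible

# Illustrative data only: two groups of 50 values each
demo <- data.frame(
  group = rep(c("a", "b"), each = 50),
  value = c(rnorm(50), runif(50, min = -3, max = 3))
)

base_plot <- ggplot(demo, aes(group, value))

base_plot + geom_boxplot()                         # summary only, points hidden
base_plot + geom_violin()                          # distribution shape, points hidden
base_plot + geom_jitter(width = 0.2, height = 0)   # stripchart, points may overlap
base_plot + geom_beeswarm()                        # points lined up side by side
```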

Obvious next step

We can test this theory by plotting the points as actual bees, lol. Well, emoji bees. Duncan (of tidyxl and unpivotr fame) did exactly this and tweeted the plot and code.

To summarise, Duncan did this by hacking emojis via emoGG into ggbeeswarm’s geom_beeswarm() function to create gg_beeswarm_emoji() (patent pending).

Obvious next next step

Wouldn’t it be better if the little emoji bees moved around a little bit? Almost like a waggle dance?

I cheated a little bit and recoded the geom_quasirandom() function from ggbeeswarm instead of geom_beeswarm(). Why? Beeswarm plots have an inherent ‘neatness’ to them. That is not becoming of a beeswarm. Instead, geom_quasirandom() with method = "pseudorandom" gives you some random jitter each time you plot the data.
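As a rough check of that jitter, the sketch below builds the same plot twice and compares the horizontal positions of the points. The `demo` data and object names are made up for illustration, and it uses plain points rather than emoji. It assumes the pseudorandom offsets are drawn afresh each time the plot is built.

```r
library(ggplot2)
library(ggbeeswarm)

set.seed(42)  # makes the made-up data reproducible

# Illustrative data only
demo <- data.frame(
  group = rep(c("a", "b"), each = 50),
  value = rnorm(100)
)

p <- ggplot(demo, aes(group, value)) +
  geom_quasirandom(method = "pseudorandom")

# layer_data() builds the plot and returns the computed data for a layer,
# so calling it twice builds the plot twice
x_first  <- layer_data(p)$x
x_second <- layer_data(p)$x

# Compare the horizontal positions from the two builds
identical(x_first, x_second)
```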

So we can plot the same data several times and stack the images into a gif. One easy way to do this is via the magick package, a re-engineering of the open-source ImageMagick suite of tools from Jeroen Ooms at rOpenSci.


Load the packages.

library(ggplot2)
library(ggbeeswarm)
library(emoGG)  # remotes::install_github("dill/emoGG")
library(magick)

Recode the geom_quasirandom() to display emoji. Idea stolen from Duncan’s tweet.

geom_quasi_emoji <- function (mapping = NULL, data = NULL, width = NULL, varwidth = FALSE, 
          bandwidth = 0.5, nbins = NULL, method = "quasirandom", groupOnX = NULL, 
          dodge.width = 0, stat = "identity", position = "quasirandom", 
          na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, emoji = "1f41d", ...) {
  # Fetch the emoji image for the given code ("1f41d" is the bee)
  img <- emoji_get(emoji)[[1]]
  # Use ggbeeswarm's quasirandom positioning for the points
  position <- position_quasirandom(width = width, varwidth = varwidth, 
                                   bandwidth = bandwidth, nbins = nbins, method = method, 
                                   groupOnX = groupOnX, dodge.width = dodge.width)
  # Draw each point with emoGG's emoji geom instead of the default point
  ggplot2::layer(data = data, mapping = mapping, stat = stat, 
                 geom = emoGG:::GeomEmoji, position = position, show.legend = show.legend, 
                 inherit.aes = inherit.aes, params = list(na.rm = na.rm, img = img, ...))
}

It makes sense to use the data that Duncan generated so we can compare the static plot to the animated one.

swarm <- data.frame(
  "variable" = rep(c("runif", "rnorm"), each = 100),
  "value" = c(runif(100, min = -3, max = 3), rnorm(100))
)

Let’s define what our plot should look like. method = "pseudorandom" is the bit that gives us the jittering.

plot <- ggplot(swarm, aes(variable, value)) +
  geom_quasi_emoji(emoji = "1f41d", method = "pseudorandom") +
  theme(panel.background = element_rect(fill = "skyblue")) +
  ggtitle("WAGGLE DANCE")

Now we can create a few versions of this plot with different jittering. The plots are magick objects made with image_graph() from magick.

We can loop through a few plots, each representing a frame in the final gif.
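A minimal sketch of that loop is below. The helper name `make_frame()`, the frame size and the stand-in `demo_plot` are all assumptions for illustration; `demo_plot` uses `geom_jitter()` from ggplot2 so that the block runs without emoGG installed. The helper opens a magick graphics device with `image_graph()`, prints the plot onto it and closes the device.

```r
library(ggplot2)
library(magick)

# Hypothetical helper: render one ggplot onto a fresh magick graphics device
make_frame <- function(p, size = 480) {
  frame <- image_graph(width = size, height = size, res = 96)
  print(p)               # drawing happens here, so random jitter is re-drawn per call
  invisible(dev.off())   # close the device to finalise the frame
  frame
}

# Stand-in plot with random jitter; not the emoji plot from this post
demo_plot <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_jitter(width = 0.2, height = 0)

# Render four frames, one per pass of the loop
frames <- lapply(1:4, function(i) make_frame(demo_plot))
length(frames)
```

Assuming this helper, the four frames for the gif could be made with `t1 <- make_frame(plot)`, and likewise for `t2`, `t3` and `t4`, where `plot` is the emoji plot defined above.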

And now image_animate() can be used to combine those magick objects into a gif.

waggle_dance <- image_animate(c(t1, t2, t3, t4))

And we can save this with image_write().

image_write(waggle_dance, "waggle_dance.gif")

Well done, we got through this without any bee puns.