Matt Dray (@mattdray)

# TL;DR

I made a Shiny app to demonstrate the Six Degrees of Kevin Bacon. Except it’s for CRAN authors. And Hadley Wickham is Kevin Bacon.

To help do this, I made the little package {kevinbacran}^{1} to find the network separation between any two authors on CRAN.

# Six degrees

People are connected to each other in networks. What is the average separation of any two people in the network? The popular idea of ‘six degrees of separation’ suggests that any two people are linked by a chain of six or fewer connections.

Instead of separation between any two people, we can measure separation from one fixed person. For example, we can calculate a Bacon Number for actors connected to Kevin Bacon, ‘centre of the entertainment universe’.

It works like this: you have a Bacon Number of zero if you are Kevin Bacon (hi Kevin, thanks for reading my blog). You have a Bacon Number of one if you were in a film with him. Your number is two if you were in a film with someone who was in a film with Kevin Bacon.
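To make the counting concrete, here is a minimal base R sketch of the idea: a breadth-first search outwards from the central person, counting edges as it goes. The toy network, the made-up names and the `separation_from()` helper are all my own illustration and are not part of {kevinbacran}.

```r
# Toy co-appearance network as an adjacency list (all names are made up).
films <- list(
  "Kevin Bacon" = c("Actor A", "Actor B"),
  "Actor A"     = c("Kevin Bacon", "Actor C"),
  "Actor B"     = c("Kevin Bacon"),
  "Actor C"     = c("Actor A"),
  "Actor D"     = character(0)  # no shared films with anyone
)

# Hypothetical helper: breadth-first search outwards from `centre`.
# Returns the number of edges to every person, or NA if unreachable.
separation_from <- function(network, centre) {
  distance <- setNames(rep(NA_integer_, length(network)), names(network))
  distance[centre] <- 0L
  queue <- centre
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    for (neighbour in network[[current]]) {
      # Only record the first (and therefore shortest) visit
      if (is.na(distance[neighbour])) {
        distance[neighbour] <- distance[current] + 1L
        queue <- c(queue, neighbour)
      }
    }
  }
  distance
}

bacon_numbers <- separation_from(films, "Kevin Bacon")
print(bacon_numbers)
```

In this toy network, people who share a film with Kevin get a number of one, their co-stars get two, and anyone with no route to Kevin is left as `NA`.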

A more classic example is the Erdős Number, which expresses the separation of mathematicians from the prolific Paul Erdős via published academic papers. And yes, there’s an Erdős-Bacon Number for actor-mathematicians.^{2}

We can extend this approach to any network. One nerdy example is shared authorship on packages published on the Comprehensive R Archive Network (CRAN), the go-to repository for packages for the R programming language. {kevinbacran} can help you do this.

# The {kevinbacran} package

You can learn about the package on the {kevinbacran} site^{3}, see the code on GitHub and feel free to leave an issue.

It is currently incomplete, potentially unstable, inefficient and untested. Its main purpose was to sate curiosity and provide some helper functions for the Shiny app.

The package has only four functions^{4}:

| Function | Description |
|---|---|
| `kb_combos()` | Fetch CRAN data, clean author names, get author combos per package, create network graph object |
| `kb_pair()` | Gets network graph of the shortest distance between two authors from the `kb_combos()` graph |
| `kb_distance()` | Separation (number of edges) between authors in `kb_pair()` |
| `kb_plot()` | Returns a {ggraph} plot from the `kb_pair()` object |

The package relies heavily on others, particularly {cranly} from Ioannis Kosmidis, {tidygraph} and {ggraph} from Thomas Lin Pedersen, and {dplyr} and {purrr} from Hadley Wickham, Lionel Henry, Romain François and Kirill Müller. The code for getting author combinations per package is from Duncan Garmonsway^{5}.
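For a flavour of what ‘author combinations per package’ means, here is a rough base R sketch that turns a list of package authors into an edge list of co-author pairs. It is only an illustration under my own assumptions: the toy data and the `author_pairs()` helper are hypothetical, and this is not the code used in {kevinbacran}.

```r
# Hypothetical toy data: package name -> cleaned author names.
pkg_authors <- list(
  pkgA = c("Author One", "Author Two", "Author Three"),
  pkgB = c("Author Two", "Author Four"),
  pkgC = c("Author Five")  # a solo author contributes no edges
)

# Hypothetical helper: every unordered pair of authors for one package.
author_pairs <- function(authors) {
  if (length(authors) < 2) {
    return(data.frame(
      from = character(0), to = character(0),
      stringsAsFactors = FALSE
    ))
  }
  # Sort first so the same pair is always written in the same order
  pairs <- combn(sort(authors), 2)
  data.frame(from = pairs[1, ], to = pairs[2, ], stringsAsFactors = FALSE)
}

# Stack the pairs from all packages and drop repeats to get an edge list.
edges <- unique(do.call(rbind, unname(lapply(pkg_authors, author_pairs))))
rownames(edges) <- NULL
print(edges)
```

An edge list like this is the sort of input a graph package could turn into a network object, with authors as nodes and shared packages as edges.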

# The app

We could use the functions mentioned to obtain graphs between all authors and a single named author of our choosing. We’re going to use Hadley Wickham as the target, since he is among the most named authors on CRAN. This may be largely explained by his involvement in the tidyverse suite of packages and their use in packages maintained by other authors.

Here is an embedded version of the Shiny app, but you can access it in full from its own page on shinyapps.io. Just type an author name and hit go. You’ll get that author’s Hadley Number and a graph showing a shortest path between them and Hadley.

You may notice:

- that your name is missing (perhaps you’re not on CRAN, or a shortest path could not be reached)
- that the author names look weird or the same people are listed under variant names ({cranly} is excellent at cleaning names, but the author field is very unstructured; just ask Duncan Garmonsway^{6})
- that some of the labels overlap and are hard to read (try hitting the Go button again)

You are very welcome to use, improve or ignore the code for the app on GitHub, where you can leave issues.

# Read next

The purpose of this post and the app was to give a flavour of the possibilities for {kevinbacran}. Of course, graph theory is a whole area of study and I haven’t incorporated any analysis of the characteristics of the CRAN network here (e.g. measures of centrality or detection of communities).

Fortunately, Duncan Garmonsway’s blogpost ‘With added bacran’ covers:

- Who has the highest Hadley number?
- What is the longest ‘shortest path’ between any two CRAN authors?
- What is the largest network disconnected from Hadley?
- Is Hadley the most central author?

Also, if you choose to use your Hadley Number to gain street cred, you may be interested in Robin Edwards’s Hadley Index repo:

> How early did you start following Hadley Wickham? Can be used as a last resort to resolve R arguments.

1. As in Kevin Bacon + CRAN, lol.
2. And yes, there’s an Erdős-Bacon-Sabbath Number for mathematician-actor-musicians.
3. Thanks to the marvellous {pkgdown} package from Hadley Wickham and Jay Hesselberth.
4. Subject to change.
5. Originally used code from William Chase’s blog.
6. A post that uses a similar approach to getting at author names to calculate an h-index for package authors.