12 min read

The NBA Northwest Division with gmapsdistance

Matt Dray (@mattdray)

Classic Jazz: Stockton to Malone for the dunk (via Giphy)

Classic Jazz: Stockton to Malone for the dunk (via Giphy)

TL;DR

The gmapsdistance R package lets you query the Google Maps API. I used it to look at travel distances between basketball teams. Surprise: they can be far apart even if they’re in the same division.

The NBA

The USA is pretty big.

So the National Basketball Association (NBA) is separated geographically into Eastern and Western conferences, each with three divisions of five teams. The majority of match-ups are within the division and conference.

This is fine for the Central division: teams clustered near Lakes Michigan and Erie.

But not for the Northwest division: it stretches from Portland in the Pacific Northwest to Oklahoma City in the centre-south, with Salt Lake City and Denver between. The other team is in Minneapolis; much closer to the Central division.

NBA team locations with states coloured by division (CC BY-SA 2.5, Wikipedia)

NBA team locations with states coloured by division (CC BY-SA 2.5, Wikipedia)

That’s a lot of travel for fans.

Google Maps and gmapsdistance

If you didn’t know, Google Maps is one of many services that calculates journey distances and durations. The R package gmapsdistance was written by Rodrigo Azuero, Demetrio Rodriguez and David Zarruk. It wraps the Google Maps Application Programming Interface (API). You can use it to get travel distance and time for several modes of travel.

You need an API key for Google Maps. This helps prevent misuse of the service and helps keeps track of your credit. The process is outlined in detail in the Google developer documentation pages. Go to the Google Cloud maps platform pages and follow the instructions. You’ll need a Google account to do this.

Setting up Google Maps platform on Google Cloud

Setting up Google Maps platform on Google Cloud

Get data

We can scrape the NBA Wikipedia page with the rvest package to get a table of teams and locations.

Start by loading rvest and some other packages for data manipulation.

library(dplyr)  # data manipulation and %>%
library(tidyr)  # tidying dataframes
library(stringr); library(stringi)  # string manipulation
library(rvest)  # web scraping
library(knitr)  # print tables nicely

We can extract the table with its xpath, which is like an ‘address’ for HTML content. You can use the ‘inspect’ function in your browser or the selectorgadget browser extension to get this information.

# Read the HTML from a specified web URL
html <- read_html("https://en.wikipedia.org/wiki/National_Basketball_Association")

# Use selectorgadget or your browser's 'inspect' mode to get xpath
table_xpath <- '//*[@id="mw-content-text"]/div/table[4]'

Now we can collect the table and get the parts we want: team names and locations. We can get the latter as both latitude-longitude coordinates and as text. Text can be parsed by Google Maps if words are separated by the plus symbol: arena+city+state. Steps to do this are annotated in the code below.

nba_table <-
  # Read HTML, isolate table, focus on NW
  html_nodes(x = html, xpath = table_xpath) %>%  # scrape the HTML
  html_table(fill = TRUE, header = NA) %>%  # parse as table
  as.data.frame() %>%  # convert from list
  filter(Division == "Northwest") %>%  # NW division only
  # Isolate the latitude and longitude
  separate(Coordinates, c("Coords1", "Coords2", "Coords3"), " / ") %>% 
  separate(Coords3, c("Latitude", "Longitude"), sep = "; ") %>% 
  separate(Longitude, c("Longitude", "X"), sep = " \\(") %>% 
  mutate(
    # Make coordinates numeric
    Latitude = as.numeric(Latitude),  # make numeric
    Longitude = stri_escape_unicode(Longitude),  # there's a weird character
    Longitude = str_replace(Longitude, "\\\\ufeff", ""),  # remove it
    Longitude = as.numeric(Longitude),  # make numeric
    # Create text search string
    Search = paste(Arena, City.State),  # combine location details
    Search = str_replace_all(Search, ",", ""),  # replace comma with blank
    Search = str_replace_all(Search, " ", "+"),  # replace space with plus
    # Add three-letter team codes to help simplify outputs
    `Team code` = case_when(
      Team == "Denver Nuggets" ~ "DEN",
      Team == "Minnesota Timberwolves" ~ "MIN",
      Team == "Oklahoma City Thunder" ~ "OKC",
      Team == "Portland Trail Blazers" ~ "POR",
      Team == "Utah Jazz" ~ "UTA"
    )
  ) %>% 
  select(  # Retain columns of interest only
    Team, `Team code`,
    Arena, City = City.State,
    Latitude, Longitude, Search
  )

# Print a simple version of the table
select(nba_table, `Team code`, Latitude, Longitude, Search) %>% kable()
Team code Latitude Longitude Search
DEN 39.74861 -105.00750 Pepsi+Center+Denver+Colorado
MIN 44.97944 -93.27611 Target+Center+Minneapolis+Minnesota
OKC 35.46333 -97.51500 Chesapeake+Energy+Arena+Oklahoma+City+Oklahoma
POR 45.53167 -122.66667 Moda+Center+Portland+Oregon
UTA 40.76833 -111.90111 Vivint+Smart+Home+Arena+Salt+Lake+City+Utah

Map locations

We can map the latitude and longitude for fun.

library(leaflet)  # for interactive maps

leaflet(nba_table) %>%
  fitBounds(lng1 = -125, lng2 = -90, lat1 = 34, lat2 = 46) %>% 
  addProviderTiles(providers$Stamen.TonerLite) %>%
  addAwesomeMarkers(
    lng = ~Longitude, lat = ~Latitude,
    popup = ~paste0(
      "<b>", nba_table$Team, "</b>",
      "<br>", nba_table$Arena,
      "<br>", nba_table$City
    ),
    icon = awesomeIcons(
      library = "ion", icon = "ion-ios-basketball", # add icon
      markerColor = case_when(
        nba_table$`Team code` == "DEN" ~ "darkblue",
        nba_table$`Team code` == "MIN" ~ "darkblue",
        nba_table$`Team code` == "OKC" ~ "blue",
        nba_table$`Team code` == "POR" ~ "black",
        nba_table$`Team code` == "UTA" ~ "darkblue"
      ),
      iconColor = case_when(  # see teamcolorcodes.com
        nba_table$`Team code` == "DEN" ~ "#FEC524",
        nba_table$`Team code` == "MIN" ~ "#9EA2A2",
        nba_table$`Team code` == "OKC" ~ "#EF3B24",
        nba_table$`Team code` == "POR" ~ "#E03A3E",
        nba_table$`Team code` == "UTA" ~ "#00471B"
      )
    )
  ) %>%
  addMeasure()

Get travel distances

We start by making our API key available in the package environment by using the set.api.key() function from gmapsdistance.

library(gmapsdistance)
set.api.key("YOUR-API-KEY")  # replace with your key

The basic arguments to the gmapsdistance() function are the origin and destination (can be an address, postcode or latlong coordinates), the mode of travel (car, public transit, walking) and the return format (shape) of the data (each origin-destination pair per row, or a matrix). We want all possible combinations.

Now we can query Google Maps with the API key. I’ve explained the function arguments in the code comments.

nba_travel <- 
  gmapsdistance(
    key = get.api.key(),  # retrieve the API key we set in the step above
    origin = nba_table$Search,  # vector of locations from our dataframe
    destination = nba_table$Search,  # as above
    combinations = "all",  # all possible pairs of locations
    mode = "driving",  # rather than walking, public transport or cycling
    shape = "long"  # one row per location-pair
  )

I’ve focused here on driving as the mode of transport. Note that you can also do things like set specific start times and dates, but I’ve left these out for simplicity.

Travel distance output

What does the returned data look like?

glimpse(nba_travel)
## List of 3
##  $ Time    :'data.frame':    25 obs. of  3 variables:
##   ..$ or  : Factor w/ 5 levels "Pepsi+Center+Denver+Colorado",..: 1 2 3 4 5 1 2 3 4 5 ...
##   ..$ de  : Factor w/ 5 levels "Pepsi+Center+Denver+Colorado",..: 1 1 1 1 1 2 2 2 2 2 ...
##   ..$ Time: num [1:25] 0 47277 34477 65601 27969 ...
##  $ Distance:'data.frame':    25 obs. of  3 variables:
##   ..$ or      : Factor w/ 5 levels "Pepsi+Center+Denver+Colorado",..: 1 2 3 4 5 1 2 3 4 5 ...
##   ..$ de      : Factor w/ 5 levels "Pepsi+Center+Denver+Colorado",..: 1 1 1 1 1 2 2 2 2 2 ...
##   ..$ Distance: num [1:25] 0 1471634 1091108 1997198 835730 ...
##  $ Status  :'data.frame':    25 obs. of  3 variables:
##   ..$ or    : Factor w/ 5 levels "Pepsi+Center+Denver+Colorado",..: 1 2 3 4 5 1 2 3 4 5 ...
##   ..$ de    : Factor w/ 5 levels "Pepsi+Center+Denver+Colorado",..: 1 1 1 1 1 2 2 2 2 2 ...
##   ..$ status: chr [1:25] "OK" "OK" "OK" "OK" ...

It’s a list object with three elements: Time, Distance and Status. Each list element is a dataframe with 25 observations (each team paired with every other) and 3 columns:

  • or – the origin point
  • de – the destination point
  • one of Time in seconds, Distance in metres or Status (whether the request was successful)

All our requests returned ‘OK’ for Status so there were no problems with fetching the data.

Exploring results

The data is returned in a tidy data frame with one row per origin-destination. We can clean this up by matching three-letter team codes to the data, arranging by distance within origin location and removing rows that provide the distance from each team to themselves (i.e. zero kilometres).

# Create a mini-lookup of search strings to three-letter team codes
nba_lookup <- select(nba_table, `Team code`, Search)

# Make the output table easier to read by joining team codes
nba_distance <- nba_travel$Distance %>%
  left_join(nba_lookup, by = c("or" = "Search")) %>% 
  left_join(
    nba_lookup, by = c("de" = "Search"), suffix = c(" (origin)", " (destination)")
  ) %>% 
  mutate(Distance = round(Distance/1000, 1)) %>% # m to km
  select(`Team code (origin)`, `Team code (destination)`, Distance)

# View table
nba_distance %>%
  arrange(`Team code (origin)`, desc(Distance)) %>% 
  filter(Distance != 0) %>%
  rename(`Distance (km)` = Distance) %>% 
  kable()
Team code (origin) Team code (destination) Distance (km)
DEN POR 1996.7
DEN MIN 1471.4
DEN OKC 1090.5
DEN UTA 836.1
MIN POR 2781.2
MIN UTA 1984.8
MIN DEN 1471.6
MIN OKC 1270.4
OKC POR 3072.2
OKC UTA 1911.6
OKC MIN 1266.9
OKC DEN 1091.1
POR OKC 3072.9
POR MIN 2782.0
POR DEN 1997.2
POR UTA 1230.1
UTA MIN 2002.8
UTA OKC 1911.4
UTA POR 1228.9
UTA DEN 835.7

Travel distance summaries

It’s trivial to summarise the furthest travel distance within the Northwest division for each team in that division.

nba_distance %>% 
  arrange(`Team code (origin)`, desc(Distance)) %>%
  group_by(`Team code (origin)`) %>% 
  slice(1) %>% 
  ungroup() %>% 
  arrange(desc(Distance)) %>% 
  rename(`Distance (km)` = Distance) %>% 
  kable()
Team code (origin) Team code (destination) Distance (km)
POR OKC 3072.9
OKC POR 3072.2
MIN POR 2781.2
UTA MIN 2002.8
DEN POR 1996.7

Similarly, we can get the total distance each team must travel to each of the others in the Northwest Division.

nba_distance %>% 
  group_by(`Team code (origin)`) %>% 
  summarise(`Total distance (km)` = sum(Distance)) %>% 
  arrange(desc(`Total distance (km)`)) %>% 
  kable()
Team code (origin) Total distance (km)
POR 9082.2
MIN 7508.0
OKC 7341.8
UTA 5978.8
DEN 5394.7

So fans of the Portland Trail Blazers have the furthest total distance to travel to each of the other teams in the Northwest: over 9000 km. That’s about the same distance as London to Johannesburg in a straight line. Bear in mind these totals are only the travel in one direction as well.

Travel distance lookup

Another way to view this is to ‘spread’ the data from long to wide format to create a look-up between origin teams (rows) and destination teams (columns).

# Create easy look-up table for travel distances between teams
nba_distance %>% 
  spread(key = `Team code (destination)`, value = Distance) %>% 
  rename(`Travel distance (km)` = `Team code (origin)`) %>% 
  kable()
Travel distance (km) DEN MIN OKC POR UTA
DEN 0.0 1471.4 1090.5 1996.7 836.1
MIN 1471.6 0.0 1270.4 2781.2 1984.8
OKC 1091.1 1266.9 0.0 3072.2 1911.6
POR 1997.2 2782.0 3072.9 0.0 1230.1
UTA 835.7 2002.8 1911.4 1228.9 0.0

So Portland Trail Blazers and Oklahoma City Thunder have a driving travel distance of over 3000 km. That’s a similar distance as from London to Longyearbyen on Svalbard.

Travel time lookup

We can also create a team-to-team lookup for travel time.

nba_travel$Time %>%
  left_join(nba_lookup, by = c("or" = "Search")) %>% 
  left_join(
    nba_lookup, by = c("de" = "Search"), suffix = c(" (origin)", " (destination)")
  ) %>% 
  mutate(Time = round(Time/60/60), 1) %>% # s to hrs
  select(`Team code (origin)`, `Team code (destination)`, Time) %>% 
  spread(key = `Team code (destination)`, value = Time) %>% 
  rename(`Travel time (hours)` = `Team code (origin)`) %>% 
  kable()
Travel time (hours) DEN MIN OKC POR UTA
DEN 0 13 10 18 8
MIN 13 0 11 25 18
OKC 10 11 0 28 17
POR 18 25 27 0 11
UTA 8 18 17 11 0

So more than half a day’s drive for many of the journeys and more than a day’s drive between Portland and Oklahoma City.

Recap

We:

Session info

devtools::session_info()
## Session info -------------------------------------------------------------
##  setting  value                       
##  version  R version 3.5.1 (2018-07-02)
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_GB.UTF-8                 
##  tz       Europe/London               
##  date     2018-12-24
## Packages -----------------------------------------------------------------
##  package       * version   date       source        
##  assertthat      0.2.0     2017-04-11 CRAN (R 3.5.0)
##  backports       1.1.2     2017-12-13 CRAN (R 3.5.0)
##  base          * 3.5.1     2018-07-05 local         
##  bindr           0.1.1     2018-03-13 CRAN (R 3.5.0)
##  bindrcpp      * 0.2.2     2018-03-29 CRAN (R 3.5.0)
##  bitops          1.0-6     2013-08-17 CRAN (R 3.5.0)
##  blogdown        0.7       2018-07-07 CRAN (R 3.5.0)
##  bookdown        0.7       2018-02-18 CRAN (R 3.5.0)
##  compiler        3.5.1     2018-07-05 local         
##  crayon          1.3.4     2017-09-16 CRAN (R 3.5.0)
##  crosstalk       1.0.0     2016-12-21 CRAN (R 3.5.0)
##  curl            3.2       2018-03-28 CRAN (R 3.5.0)
##  datasets      * 3.5.1     2018-07-05 local         
##  devtools        1.13.6    2018-06-27 CRAN (R 3.5.0)
##  digest          0.6.18    2018-10-10 cran (@0.6.18)
##  dplyr         * 0.7.8     2018-11-10 CRAN (R 3.5.0)
##  evaluate        0.10.1    2017-06-24 CRAN (R 3.5.0)
##  glue            1.3.0     2018-07-17 CRAN (R 3.5.0)
##  gmapsdistance * 3.4       2018-08-28 CRAN (R 3.5.0)
##  graphics      * 3.5.1     2018-07-05 local         
##  grDevices     * 3.5.1     2018-07-05 local         
##  highr           0.7       2018-06-09 CRAN (R 3.5.0)
##  htmltools       0.3.6     2017-04-28 CRAN (R 3.5.0)
##  htmlwidgets     1.2       2018-04-19 CRAN (R 3.5.0)
##  httpuv          1.4.4.2   2018-07-02 CRAN (R 3.5.0)
##  httr            1.3.1     2017-08-20 CRAN (R 3.5.0)
##  jsonlite        1.5       2017-06-01 CRAN (R 3.5.0)
##  knitr         * 1.20      2018-02-20 CRAN (R 3.5.0)
##  later           0.7.3     2018-06-08 CRAN (R 3.5.0)
##  leaflet       * 2.0.2     2018-08-27 CRAN (R 3.5.0)
##  magrittr        1.5       2014-11-22 CRAN (R 3.5.0)
##  memoise         1.1.0     2017-04-21 CRAN (R 3.5.0)
##  methods       * 3.5.1     2018-07-05 local         
##  mime            0.6       2018-10-05 cran (@0.6)   
##  pillar          1.3.0     2018-07-14 CRAN (R 3.5.0)
##  pkgconfig       2.0.2     2018-08-16 cran (@2.0.2) 
##  promises        1.0.1     2018-04-13 CRAN (R 3.5.0)
##  purrr           0.2.5     2018-05-29 CRAN (R 3.5.0)
##  R6              2.3.0     2018-10-04 cran (@2.3.0) 
##  Rcpp            1.0.0     2018-11-07 CRAN (R 3.5.0)
##  RCurl           1.95-4.11 2018-07-15 CRAN (R 3.5.0)
##  rlang           0.3.0.1   2018-10-25 CRAN (R 3.5.0)
##  rmarkdown       1.10      2018-06-11 CRAN (R 3.5.0)
##  rprojroot       1.3-2     2018-01-03 CRAN (R 3.5.0)
##  rvest         * 0.3.2     2016-06-17 CRAN (R 3.5.0)
##  selectr         0.4-1     2018-04-06 CRAN (R 3.5.0)
##  shiny           1.1.0     2018-05-17 CRAN (R 3.5.1)
##  stats         * 3.5.1     2018-07-05 local         
##  stringi       * 1.2.4     2018-07-20 CRAN (R 3.5.0)
##  stringr       * 1.3.1     2018-05-10 CRAN (R 3.5.0)
##  tibble          1.4.2     2018-01-22 CRAN (R 3.5.0)
##  tidyr         * 0.8.2     2018-10-28 CRAN (R 3.5.0)
##  tidyselect      0.2.5     2018-10-11 CRAN (R 3.5.0)
##  tools           3.5.1     2018-07-05 local         
##  utils         * 3.5.1     2018-07-05 local         
##  withr           2.1.2     2018-03-15 CRAN (R 3.5.0)
##  xfun            0.3       2018-07-06 CRAN (R 3.5.0)
##  XML             3.98-1.16 2018-08-19 CRAN (R 3.5.0)
##  xml2          * 1.2.0     2018-01-24 CRAN (R 3.5.0)
##  xtable          1.8-2     2016-02-05 CRAN (R 3.5.0)
##  yaml            2.1.19    2018-05-01 CRAN (R 3.5.0)