166 min read

TWO DOGS IN TOILET ELDERLY LADY INVOLVED

Matt Dray

TL;DR

Animals get stuck in weird places. Oh, and the sf package in R can be used for coordinate reprojection prior to interactive mapping with leaflet.

The problem

I often work with data that has coordinates; usually a eastings and northings. I need to convert these to latitude and longitude for interactive mapping using the leaflet package in R. What might be an easy way to do this?

I’m going to demo the sf package, show how to reproject coordinates, work with points and polygon data, make an interactive map with leaflet and do a bonus bit of webscraping.

Mostly I wrote this post for my own reference and to learn a bit more about using the sf package. Feel free to discuss or suggest anything in the comments.

Special features

The sf (‘special features’) package from Edzer Pebesma has a series of classes and methods for spatial data. The package is becoming popular because simple feature access is a widely-used multi-platform ISO standard and the package supports the popular tidy data paradigm and can be integrated with tidyverse operations. This and more was spelled out by Pebesma in a post from UseR! 2017.

What does this mean for sp, the go-to package for spatial analysis in R? Well, its not going anywhere for now. You can get an opinion on whether you should learn sf or sp for spatial R programming from Seascape Models (spoiler: sf, but maybe both if you can).

Some data

Animal rescues

I’ve got an interesting example dataset that contains eastings and northings: animal rescue incidents attended by the London Fire Brigade 🚒 from the excellent London Data Store. In their words:

The London Fire Brigade attends a range of non-fire incidents (which we call ‘special services’). These ‘special services’ include assistance to animals that may be trapped or in distress.

This is pertinent because a pigeon fell down our chimney recently.

The back end of the trapped pigeon visible through a vent hole in our chimeny breast

Clean the data

First download the data from the datastore as a CSV file. Then we can read it and clean the column names.

library(readr)  # for reading data
library(dplyr)  # data manipulation and pipes (%>%)
library(janitor)  # cleaning and misc functions
library(lubridate)  # dealing with dates

# read the csv
rescue <- readr::read_csv("data/lfb-animal-rescue.csv")

# replace rogue '£' with blank
names(rescue) <- iconv(names(rescue), to = "ASCII", sub = "")

# more cleaning
rescue <- rescue %>% 
  rename(IncidentNominalCost = `IncidentNominalCost()`) %>%  # no brackets
  mutate(DateTimeOfCall = lubridate::ymd_hms(DateTimeOfCall))  # convert type

# preview
glimpse(rescue) 
## Observations: 5,352
## Variables: 25
## $ IncidentNumber             <int> 139091, 275091, 2075091, 2872091, 3...
## $ DateTimeOfCall             <dttm> 2001-01-20 09:03:01, 2001-01-20 09...
## $ CalYear                    <int> 2009, 2009, 2009, 2009, 2009, 2009,...
## $ FinYear                    <chr> "2008/09", "2008/09", "2008/09", "2...
## $ TypeOfIncident             <chr> "Special Service", "Special Service...
## $ PumpCount                  <chr> "1", "1", "1", "1", "1", "1", "1", ...
## $ PumpHoursTotal             <chr> "2", "1", "1", "1", "1", "1", "1", ...
## $ IncidentNominalCost        <chr> "652", "326", "326", "326", "326", ...
## $ FinalDescription           <chr> "DOG WITH JAW TRAPPED IN MAGAZINE R...
## $ AnimalGroupParent          <chr> "Dog", "Fox", "Dog", "Horse", "Rabb...
## $ OriginofCall               <chr> "Person (land line)", "Person (land...
## $ PropertyType               <chr> "House - single occupancy", "Railin...
## $ PropertyCategory           <chr> "Dwelling", "Outdoor Structure", "O...
## $ SpecialServiceTypeCategory <chr> "Other animal assistance", "Other a...
## $ SpecialServiceType         <chr> "Animal assistance involving livest...
## $ WardCode                   <chr> "E05000166", "E05000169", "E0500055...
## $ Ward                       <chr> "Upper Norwood", "Woodside", "Carsh...
## $ BoroughCode                <chr> "E09000008", "E09000008", "E0900002...
## $ Borough                    <chr> "Croydon", "Croydon", "Sutton", "Hi...
## $ StnGroundName              <chr> "Norbury", "Woodside", "Wallington"...
## $ PostcodeDistrict           <chr> "SE19", "SE25", "SM5", "UB9", "RM3"...
## $ Easting_m                  <chr> "NULL", "534785", "528041", "504689...
## $ Northing_m                 <chr> "NULL", "167546", "164923", "190685...
## $ Easting_rounded            <int> 532350, 534750, 528050, 504650, 554...
## $ Northing_rounded           <int> 170050, 167550, 164950, 190650, 192...

So each row is an ‘incident’ to which the brigade were called and there are various columns for when, what, and where.

Explore the data

This post isn’t about the dataset, but it’s worth having a closer look at it. I’ve added an interactive table below so you can filter and search for incidents, creatures, locations and the nominal cost of the callout.

Obviously there are plenty of cats up trees, but there’s so much more. Some of the more unique entries are:

DOG WITH JAW TRAPPED IN MAGAZINE RACK 🐶 📰

SWAN IN DISTRESS 🦆 😖

FERRET STUCK IN LIFT SHAFT [ferret emoji needed]

And of course:

TWO DOGS IN TOILET ELDERLY LADY INVOLVED 🐕 🐩 🚽 👵

Hmmm. 🤔

Get spatial with sf

Points data

Each incident in aour data is a point in space with an X and Y coordinate. It’s currently eastings and northings, but we want to transform it to latitude and longitude.

‘sf’ class and reprojection

A Coordinate Reference System (CRS) is required to project geographic entities – points, polygons, etc – onto a surface. There are many systems for doing this, from local to global, each with its own CRS code. This isn’t a post about geography and projections, but you can read more in the CRS chapter of rspatial.org.

First we convert the our dataframe-class object an ‘sf’ class object using the st_as_sf function, which readies it for spatial analysis. We simply provide arguments that point to the columns containing the coordinates and the CRS code for that projection system.

We can then use the st_transform() function to convert our projection system from eastings and northings to to latitude and longitude.

library(sf)

rescue_sf <- rescue %>% 
  sf::st_as_sf(
    coords = c("Easting_rounded", "Northing_rounded"),  # columns with coordinates
    crs = 27700  # coordinate reference system code for eastings/northings
  ) %>% 
  sf::st_transform(crs = 4326)  # the coord ref system code for latlong

print(rescue_sf)
## Simple feature collection with 5352 features and 23 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -0.5106219 ymin: 51.29753 xmax: 0.3017003 ymax: 51.68849
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 5,352 x 24
##    IncidentNumber DateTimeOfCall      CalYear FinYear TypeOfIncident 
##             <int> <dttm>                <int> <chr>   <chr>          
##  1         139091 2001-01-20 09:03:01    2009 2008/09 Special Service
##  2         275091 2001-01-20 09:08:51    2009 2008/09 Special Service
##  3        2075091 2004-01-20 09:10:07    2009 2008/09 Special Service
##  4        2872091 2005-01-20 09:12:27    2009 2008/09 Special Service
##  5        3553091 2006-01-20 09:15:23    2009 2008/09 Special Service
##  6        3742091 2006-01-20 09:19:30    2009 2008/09 Special Service
##  7        4011091 2007-01-20 09:06:29    2009 2008/09 Special Service
##  8        4211091 2007-01-20 09:11:55    2009 2008/09 Special Service
##  9        4306091 2007-01-20 09:13:48    2009 2008/09 Special Service
## 10        4715091 2007-01-20 09:21:24    2009 2008/09 Special Service
## # ... with 5,342 more rows, and 19 more variables: PumpCount <chr>,
## #   PumpHoursTotal <chr>, IncidentNominalCost <chr>,
## #   FinalDescription <chr>, AnimalGroupParent <chr>, OriginofCall <chr>,
## #   PropertyType <chr>, PropertyCategory <chr>,
## #   SpecialServiceTypeCategory <chr>, SpecialServiceType <chr>,
## #   WardCode <chr>, Ward <chr>, BoroughCode <chr>, Borough <chr>,
## #   StnGroundName <chr>, PostcodeDistrict <chr>, Easting_m <chr>,
## #   Northing_m <chr>, geometry <POINT [°]>

So what happened more specifically? Our eastings and northings were combined with st_as_sf() into a two element list-column called geometry with data type sfc_POINT. Our new sf-class object also contains some metadata detailing the geometry type – POINTS in our case – and the projection system of the coordinate data, which we converted to latlong with the st_transform() function.

Tidyverse manipulation

As mentioned, one advantage of using sf is that sf-class objects use the tidy data paradigm that allows for use of the tidyverse. Som eusers may prefer this relative to the ‘Spatial Points Data Frame’ (SPDF) class that is produced by the sp package for points data. SPDFs are an S4 class, which means they have ‘slots’ of data, coordinates, etc.

In the code chunk above I used pipes to pass the rescue dataframe to st_as_sf() and then to st_transform(). You can also use dplyr functions like filter() on your sf-class object.

filtered_sf <- rescue_sf %>%
  dplyr::filter(
    AnimalGroupParent %in% c("Dog", "Cat", "Bird"),  # just these animals
    DateTimeOfCall > lubridate::ymd("2017-01-01") &
      DateTimeOfCall < lubridate::ymd("2017-12-31")  # 2017 only
  )

We can also use the st_coordinates() function to extract the XY (latitude and longitude) coordinates from the column containing our sf point geometry. This means we can have separate columns for the latitude and longitude, as well as our list-column.

# extract coordinates into two-column dataframe
rescue_coords <- as.data.frame(st_coordinates(filtered_sf))

# bind these to rescue dataset
filtered_sf <- dplyr::bind_cols(filtered_sf, rescue_coords)

# preview
head(filtered_sf)
## Simple feature collection with 6 features and 25 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -0.3653023 ymin: 51.47668 xmax: 0.1769739 ymax: 51.58195
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 6 x 26
##   IncidentNumber DateTimeOfCall      CalYear FinYear TypeOfIncident 
##            <int> <dttm>                <int> <chr>   <chr>          
## 1       10299091 2017-01-20 09:11:40    2009 2008/09 Special Service
## 2       10470091 2017-01-20 09:18:08    2009 2008/09 Special Service
## 3       27805091 2017-02-20 09:13:23    2009 2008/09 Special Service
## 4       27898091 2017-02-20 09:16:13    2009 2008/09 Special Service
## 5       63575091 2017-04-20 09:16:12    2009 2009/10 Special Service
## 6       63759091 2017-04-20 09:22:04    2009 2009/10 Special Service
## # ... with 21 more variables: PumpCount <chr>, PumpHoursTotal <chr>,
## #   IncidentNominalCost <chr>, FinalDescription <chr>,
## #   AnimalGroupParent <chr>, OriginofCall <chr>, PropertyType <chr>,
## #   PropertyCategory <chr>, SpecialServiceTypeCategory <chr>,
## #   SpecialServiceType <chr>, WardCode <chr>, Ward <chr>,
## #   BoroughCode <chr>, Borough <chr>, StnGroundName <chr>,
## #   PostcodeDistrict <chr>, Easting_m <chr>, Northing_m <chr>, X <dbl>,
## #   Y <dbl>, geometry <POINT [°]>

Polygons data

Read and convert GeoJSON

While we’re messing around with the sf package we can investigate polygons by adding in the borders for each of the London boroughs from a GeoJSON file.

First, read Local Authority District (LADs) data directly from the Open Geography Portal and then use the st_as_sf() function to make the conversion from the sp class to the sf class.

This means our polygon dataset is a tidy dataframe with the polygon information stored as MULTIPOLYGON type in a list-column with as many elements as required to draw each polygon (e.g. a square requires four sets of XY points that can be joined together). The outcome is very similar to what we had for our points data.

library(geojsonio)  # for working with geojson files

# read geojson from the web (local authorities)
lad_json <- geojsonio::geojson_read(
  x = "https://opendata.arcgis.com/datasets/fab4feab211c4899b602ecfbfbc420a3_4.geojson",
  what = "sp"  # spatial class
)

# convert to sf
lad_sf <- sf::st_as_sf(lad_json)

# preview
head(lad_sf)
## Simple feature collection with 6 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -2.83382 ymin: 53.30631 xmax: -0.7913148 ymax: 54.72728
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##   objectid   lad17cd              lad17nm lad17nmw  bng_e  bng_n     long
## 0        1 E06000001           Hartlepool          447157 531476 -1.27023
## 1        2 E06000002        Middlesbrough          451141 516887 -1.21099
## 2        3 E06000003 Redcar and Cleveland          464359 519597 -1.00611
## 3        4 E06000004     Stockton-on-Tees          444937 518183 -1.30669
## 4        5 E06000005           Darlington          428029 515649 -1.56835
## 5        6 E06000006               Halton          354246 382146 -2.68853
##        lat st_areashape st_lengthshape                       geometry
## 0 54.67616     96586817       50245.93 MULTIPOLYGON (((-1.243848 5...
## 1 54.54467     54741668       35458.51 MULTIPOLYGON (((-1.200218 5...
## 2 54.56752    247140467       78666.80 MULTIPOLYGON (((-1.200218 5...
## 3 54.55691    206473805       86947.34 MULTIPOLYGON (((-1.193937 5...
## 4 54.53535    198298966       91341.12 MULTIPOLYGON (((-1.43994 54...
## 5 53.33424     80971982       59054.62 MULTIPOLYGON (((-2.695155 5...

Scrape London borough list

But which of these LADs are London boroughs? We can extract a vector of the boroughs from Wikipedia using the rvest package and use it to filter our data. The CSS selector used in the html_nodes() function below can be extracted using the very handy SelectorGadget tool.

library(rvest)  # scraping webpages
library(xml2)  # parsing XML

# read the webpage
wiki <- xml2::read_html("https://en.wikipedia.org/wiki/List_of_London_boroughs")

# get borough names
boroughs <- wiki %>% 
  rvest::html_nodes(css = "td:nth-child(1) > a") %>%
  rvest::html_text() %>% 
  tolower()

# filter the sf for london boroughs
boroughs_sf <- lad_sf %>% 
  mutate(lad17nm = tolower(lad17nm)) %>% 
  dplyr::filter(lad17nm %in% boroughs)

# preview
head(boroughs_sf)
## Simple feature collection with 6 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -0.3371469 ymin: 51.29222 xmax: 0.2197077 ymax: 51.67066
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##   objectid   lad17cd              lad17nm lad17nmw  bng_e  bng_n      long
## 1      297 E09000004               bexley          549202 175433  0.146212
## 2      294 E09000001       city of london          532382 181358 -0.093510
## 3      295 E09000002 barking and dagenham          547759 185107  0.129506
## 4      296 E09000003               barnet          523472 191753 -0.218210
## 5      298 E09000005                brent          519615 186468 -0.275680
## 6      299 E09000006              bromley          542036 165708  0.039246
##        lat st_areashape st_lengthshape                       geometry
## 1 51.45821     62394091      37331.781 MULTIPOLYGON (((0.2197077 5...
## 2 51.51564      3081566       7498.317 MULTIPOLYGON (((-0.08177556...
## 3 51.54552     36017227      30423.228 MULTIPOLYGON (((0.1577779 5...
## 4 51.61108     86530595      41634.761 MULTIPOLYGON (((-0.1836922 ...
## 5 51.56441     43881295      32897.293 MULTIPOLYGON (((-0.2150751 ...
## 6 51.37267    148885903      63088.815 MULTIPOLYGON (((0.07374317 ...

Map it

So now we can plot the data. The addPolygons() function accepts each MULTIPOLYGON in the list-column of our London boroughs dataset. The addMarkers() function accepts the POINTS-type list-column to plot each point at the latitude and longitude specified.

This is a very simple map for now. You can:

  • zoom with the + and - buttons or scroll with your mouse wheel
  • click and drag to move the map
  • click the points to get a pop-up showing more information
  • hover over a borough to highlight it and show a label
library(leaflet)  # for interactive mapping

# put the map together
leaflet::leaflet(boroughs_sf) %>% 
  leaflet::addTiles() %>%
  leaflet::addPolygons(  # add London borough boundaries
    label = ~tools::toTitleCase(lad17nm),  # label on hover
    color = "black",  # boundary colour
    weight = 2,  # boundary thickness
    opacity = 1,  # fully opaque lines
    fillOpacity = 0.2,  # mostly transparent
    highlightOptions = highlightOptions(
      color = "white",  # turn boundary white on hover
      weight = 2,  # same as polygon boundary
      bringToFront = TRUE  # overlay the highglight
    )
  ) %>% 
  leaflet::addAwesomeMarkers( # add incident points
    lng = filtered_sf$X, lat = filtered_sf$Y,
    icon = leaflet::awesomeIcons(
      library = "ion",  # from this icon library
      icon = "ion-android-alert",  # use this icon
      iconColor = "white",  # colour it white
      # colour by animal
      markerColor = case_when(  # different colours for each
        filtered_sf$AnimalGroupParent == "Dog" ~ "red",
        filtered_sf$AnimalGroupParent == "Cat" ~ "blue",
        filtered_sf$AnimalGroupParent == "Bird" ~ "black"
      )
    ),
    popup = ~paste0(  # display this information on click
      "<b>Animal</b>: ", filtered_sf$AnimalGroupParent, "<br>",
      "<b>Incident</b>: ", filtered_sf$FinalDescription, "<br>",
      "<b>Date</b>: ", filtered_sf$DateTimeOfCall, "<br>",
      "<b>Borough</b>: ", filtered_sf$Borough, "<br>",
      "<b>Nominal cost</b>: £", filtered_sf$IncidentNominalCost
    )
  )

Okay cool, we get a simple map of London with borough boundaries and markers showing incidents in 2017, with a different colour for each of three selected animal groups (red = dog, blue = cat, black = bird).

And remember

My main advice? Keep an eye on your pets. And consider covering your chimney.

R session information

sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.4
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
## 
## other attached packages:
##  [1] leaflet_1.1.0   rvest_0.3.2     xml2_1.2.0      geojsonio_0.5.0
##  [5] sf_0.6-1        bindrcpp_0.2    lubridate_1.7.2 janitor_0.3.0  
##  [9] dplyr_0.7.4     readr_1.1.1    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.17     lattice_0.20-35  class_7.3-14     assertthat_0.2.0
##  [5] rprojroot_1.2    digest_0.6.15    utf8_1.1.3       V8_1.5          
##  [9] mime_0.5         R6_2.2.2         backports_1.1.1  evaluate_0.10.1 
## [13] e1071_1.6-8      httr_1.3.1       geojson_0.2.0    blogdown_0.1    
## [17] pillar_1.2.1     rlang_0.2.1      lazyeval_0.2.1   curl_3.0        
## [21] rstudioapi_0.7   DT_0.4.5         rmarkdown_1.6    rgdal_1.2-15    
## [25] jqr_1.0.0        udunits2_0.13    stringr_1.3.0    selectr_0.3-1   
## [29] foreign_0.8-69   htmlwidgets_1.0  shiny_1.1.0      compiler_3.4.3  
## [33] httpuv_1.4.3     pkgconfig_2.0.1  rgeos_0.3-26     htmltools_0.3.6 
## [37] tibble_1.4.2     bookdown_0.5     emo_0.0.0.9000   XML_3.98-1.9    
## [41] crayon_1.3.4     later_0.7.2      grid_3.4.3       spData_0.2.8.3  
## [45] jsonlite_1.5     xtable_1.8-2     DBI_0.8          magrittr_1.5    
## [49] units_0.5-1      cli_1.0.0        stringi_1.1.7    promises_1.0.1  
## [53] sp_1.2-5         tools_3.4.3      glue_1.2.0       purrr_0.2.4     
## [57] hms_0.3          crosstalk_1.0.1  yaml_2.1.18      maptools_0.9-2  
## [61] classInt_0.2-3   knitr_1.18       bindr_0.1