R's names and values as anchovy pizza

Bird's-eye view of five large margherita pizzas on a table before and after consumption

Queued two hours for this. R’s names and values system is faster to learn, but not as delicious.

tl;dr

I bought Hadley Wickham’s Advanced R book[1] to help me better understand R’s quirks. Can names and values (chapter 2) be explained with a contrived pizzeria analogy?[2]

A pizza by any other name

Welcome to the pizzeria. It’s called ‘La PizzRia’ because our owner likes to code and is really lazy at puns.

Toppings as vectors

Our specialty (and only!) pizza is pizza alla napoletana, which is topped with mozzarella, tomatoes and anchovies.

# Create a character-vector object
napoletana <- c("mozzarella", "tomato", "anchovy")

The English version of the menu calls it ‘Neapolitan’ pizza, but it’s the same thing.

neapolitan <- napoletana       # copy the object
all(neapolitan == napoletana)  # they're equal
## [1] TRUE

We store our unique sets of pizza toppings in a special recipe book. If you look up ‘napoletana’ and ‘Neapolitan’ in the book’s index, you’ll see they point to the same recipe.

# The {lobstr} package helps understand object structure
library(lobstr)  # after install.packages("lobstr")

# Get the specific object 'address' in your computer's memory
# Both names point to the same object
obj_addr(napoletana)  # original object
## [1] "0x7fd9967f00f8"
obj_addr(neapolitan)  # the copy
## [1] "0x7fd9967f00f8"

Basically, the pizzaiolos don’t care: different names, same pizza. The recipe codes are the same.

Advanced R, p19
“The object, or value, doesn’t have a name; it’s actually the name that has a value.”
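One way to see that the name, not the value, does the pointing: remove one name and the value survives under the other. A minimal base-R sketch (no {lobstr} needed), reusing the pizza names from above:

```r
# Bind two names to one character vector
napoletana <- c("mozzarella", "tomato", "anchovy")
neapolitan <- napoletana

# Remove the Italian name; the value is still reachable via the English one
rm(napoletana)
exists("napoletana")  # FALSE: the name is gone
neapolitan            # the value is still there
```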

Copying a recipe, modifying it

We recently added pizza pugliese to the menu. We copied our napoletana in the recipe book and then modified it to have onions instead of anchovies.

pugliese <- napoletana       # copy the object
all(pugliese == napoletana)  # the objects are the same
## [1] TRUE
pugliese[[3]] <- "onion"  # modify the third element
pugliese == napoletana    # they're no longer the same
## [1]  TRUE  TRUE FALSE

When we look up these names in the index of our recipe book, we see that they point to different places, despite having copied the napoletana to get the pugliese.

# Now the names point to different objects
# We modified the copy, so it becomes a new object in memory
obj_addr(napoletana)  # original object
## [1] "0x7fd9967f00f8"
obj_addr(pugliese)    # the modified copy
## [1] "0x7fd9973f7c78"

Advanced R, p22
“This behaviour is called copy-on-modify.”
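Advanced R also uses base R’s tracemem() to catch the exact moment the copy happens. A sketch under one assumption: tracemem() only works when R was built with memory profiling, which capabilities("profmem") reports, so the tracing is wrapped in that check and the example still runs without it.

```r
napoletana <- c("mozzarella", "tomato", "anchovy")
pugliese <- napoletana  # a second name for the same object: nothing copied yet

if (capabilities("profmem")) {
  # Print the object's address and start reporting any copies of it
  cat(tracemem(napoletana), "\n")
  pugliese[[3]] <- "onion"  # the copy happens here, so tracemem prints a message
  untracemem(napoletana)
  untracemem(pugliese)
} else {
  pugliese[[3]] <- "onion"  # same modification, just without the tracing
}

napoletana  # original unchanged: "mozzarella" "tomato" "anchovy"
pugliese    # modified copy:      "mozzarella" "tomato" "onion"
```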

So, here’s our full pizza lineup in Italian and English.

apulian <- pugliese  # specify English name for the pugliese

# A comparison of the pizza object structures
knitr::kable(
  tibble::tribble(
    ~Language, ~Name, ~`Toppings`, ~`Recipe code`, 
    "ITA", "Pizza alla napoletana", napoletana, obj_addr(napoletana),
    "ENG", "Neapolitan pizza", neapolitan, obj_addr(neapolitan),
    "ITA", "Pizza pugliese", pugliese, obj_addr(pugliese),
    "ENG", "Apulian pizza", apulian, obj_addr(apulian)
  )
)
| Language | Name                  | Toppings                    | Recipe code    |
|:---------|:----------------------|:----------------------------|:---------------|
| ITA      | Pizza alla napoletana | mozzarella, tomato, anchovy | 0x7fd9967f00f8 |
| ENG      | Neapolitan pizza      | mozzarella, tomato, anchovy | 0x7fd9967f00f8 |
| ITA      | Pizza pugliese        | mozzarella, tomato, onion   | 0x7fd9973f7c78 |
| ENG      | Apulian pizza         | mozzarella, tomato, onion   | 0x7fd9973f7c78 |

Pizza alla napoletana and its copy, Neapolitan pizza, point to the same recipe code.

Pizza pugliese was a copy of pizza alla napoletana, but it now points to a different recipe code. Why? An element was changed, anchovies to onions, so a new recipe code was required.

Finally, Apulian pizza is a copy of the pizza pugliese recipe, so they both point to the same unique topping set.

Toppings as lists

Our knowledge management system was, however, a bit inefficient: the mozzarella and tomato toppings appeared twice in our recipe book, once for each pizza.

So we decided to update our recipe system to store each topping separately, each with its own special reference code too.

Again, we wrote down the pizza napoletana toppings, copied them, then switched the anchovies for onions. Like in our old system, the two pizzas differ in their third element.

# Toppings now as list elements
napoletana <- list("mozzarella", "tomato", "anchovy")
pugliese <- napoletana          # make a copy
identical(pugliese, napoletana) # they're the same
## [1] TRUE
pugliese[[3]] <- "onion"        # make a change
identical(pugliese, napoletana) # now they're different
## [1] FALSE

So in the new system, each topping has its own unique ingredient code. This means both pizza recipes point to the same ingredient codes for tomato and mozzarella.

# Compare addresses in memory for the lists
# Each 'block' below is a list object (pizza)
# Each element is a character vector (topping)
ref(napoletana, pugliese)
## █ [1:0x7fd999b5ff28] <list> 
## ├─[2:0x7fd99a186ab8] <chr> 
## ├─[3:0x7fd99a186a80] <chr> 
## └─[4:0x7fd99a186a48] <chr> 
##  
## █ [5:0x7fd99a1d00d8] <list> 
## ├─[2:0x7fd99a186ab8] 
## ├─[3:0x7fd99a186a80] 
## └─[6:0x7fd99a186930] <chr>

Basically, our pizza names point to pizza recipes, which themselves point to toppings.

Advanced R, p25
“This list is more complex [than a vector] because instead of storing the values itself, it stores references to them.”

This means we can be more efficient in storing our pizza recipes: we write down ‘mozzarella’ and ‘tomatoes’ only once. This could become much more efficient when storing more than the two pizzas we have on La PizzRia’s menu.[3]
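To check the sharing yourself, lobstr’s obj_addrs() lists the addresses of a list’s elements, and obj_size() given several objects counts shared components only once. A sketch, assuming {lobstr} is installed (the lobstr part is skipped otherwise):

```r
napoletana <- list("mozzarella", "tomato", "anchovy")
pugliese <- napoletana
pugliese[[3]] <- "onion"  # copy-on-modify: new list, new third element

if (requireNamespace("lobstr", quietly = TRUE)) {
  # Address of each topping (element) in each pizza (list)
  napo_addrs <- lobstr::obj_addrs(napoletana)
  pugl_addrs <- lobstr::obj_addrs(pugliese)
  print(napo_addrs == pugl_addrs)  # TRUE TRUE FALSE: only the third topping differs

  # Size of both pizzas together vs the sum of their separate sizes;
  # 'together' is smaller because shared toppings are counted once
  together <- as.numeric(lobstr::obj_size(napoletana, pugliese))
  separate <- as.numeric(lobstr::obj_size(napoletana)) +
    as.numeric(lobstr::obj_size(pugliese))
  print(c(together = together, separate = separate))
}
```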

Customer orders as data frames

How do we manage orders? Wait staff write down each order in a column, with a row for each topping.

Advanced R, p26
“Data frames are lists of vectors.”

Let’s say a couple orders a pizza napoletana and a pizza pugliese.

# Create a data.frame, which is a list of vectors
# Column behaviour is vector behaviour
order <- data.frame(
  napoletana = c("mozzarella", "tomato", "anchovy"),
  pugliese = c("mozzarella", "tomato", "onion")
)

order
##   napoletana   pugliese
## 1 mozzarella mozzarella
## 2     tomato     tomato
## 3    anchovy      onion

As we know, these pizzas both have mozzarella and tomatoes, but the third topping is different.
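Because a data frame is a list of vectors, base R’s list tools see straight through it. A small sketch; it passes stringsAsFactors = FALSE so the toppings stay character vectors whatever your R version:

```r
order <- data.frame(
  napoletana = c("mozzarella", "tomato", "anchovy"),
  pugliese = c("mozzarella", "tomato", "onion"),
  stringsAsFactors = FALSE  # keep toppings as character vectors
)

typeof(order)                       # "list": underneath, a data frame is a list
length(order)                       # 2: one element per column (pizza)
vapply(order, class, character(1))  # each column is a character vector
order[["pugliese"]]                 # extract one pizza like a list element
```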

But wait: the customer who ordered the napoletana is hungry for more anchovies!

order_update <- order  # copy the data.frame object
order_update[3, 1] <- "anchovy (extra)"  # modify the new object
## Warning in `[<-.factor`(`*tmp*`, iseq, value = "anchovy (extra)"): invalid
## factor level, NA generated
order_update
##   napoletana   pugliese
## 1 mozzarella mozzarella
## 2     tomato     tomato
## 3       <NA>      onion

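(Oops: the napoletana’s third topping came out as NA, not extra anchovies. This post was run on R 3.6.3, where data.frame() turns character columns into factors by default, and assigning a value that isn’t an existing factor level gives NA and a warning. Setting stringsAsFactors = FALSE, the default since R 4.0.0, keeps the columns as character vectors. The copy-on-modify behaviour is the same either way.)

We use a code reference system for our orders too, and it works just like our new list-based recipe system.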

Since one of the pizza orders was changed, our reference code for the entire order was changed too.

The napoletana was modified after it was copied, so the recipe code for that pizza was updated. The pugliese didn’t change, so its code was maintained.

# Compare the data.frame structures
# Modified column gets new code, object gets new code
# Second column unchanged, code stays the same
ref(order, order_update)
## █ [1:0x7fd9950c0cc8] <df[,2]> 
## ├─napoletana = [2:0x7fd99359ad08] <fct> 
## └─pugliese = [3:0x7fd99359bbc8] <fct> 
##  
## █ [4:0x7fd9975b0708] <df[,2]> 
## ├─napoletana = [5:0x7fd9975af148] <fct> 
## └─pugliese = [3:0x7fd99359bbc8]

Advanced R, p26
“If you modify a column, only that column needs to be modified.”
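Here’s the same extra-anchovy order with character columns, so the new value sticks instead of turning into NA. A sketch assuming {lobstr} for the column-address check (skipped if it isn’t installed); compare its output with the ref() result above.

```r
order <- data.frame(
  napoletana = c("mozzarella", "tomato", "anchovy"),
  pugliese = c("mozzarella", "tomato", "onion"),
  stringsAsFactors = FALSE
)

order_update <- order
order_update[3, 1] <- "anchovy (extra)"  # modify one cell of the first column
order_update$napoletana  # "mozzarella" "tomato" "anchovy (extra)"
order$napoletana         # original unchanged: "mozzarella" "tomato" "anchovy"

if (requireNamespace("lobstr", quietly = TRUE)) {
  # Compare column addresses: the modified napoletana column is a new object,
  # while the untouched pugliese column can stay shared
  print(lobstr::obj_addrs(order) == lobstr::obj_addrs(order_update))
}
```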

The mozzarella is especially bountiful this year; the waiter suggests both patrons take advantage.

They strongly agree. The order is copied once more and the waiter modifies the ‘cheese row’ for both pizzas.

order_final <- order_update  # copy the object
order_final[1, 1:2] <- "mozzarella (extra)"  # modify row one of both columns
## Warning in `[<-.factor`(`*tmp*`, iseq, value = "mozzarella (extra)"): invalid
## factor level, NA generated

## Warning in `[<-.factor`(`*tmp*`, iseq, value = "mozzarella (extra)"): invalid
## factor level, NA generated
order_final
##   napoletana pugliese
## 1       <NA>     <NA>
## 2     tomato   tomato
## 3       <NA>    onion

Altering the cheese row means both pizza columns are copied and given new codes. Of course, the order gets a whole new code of its own because the toppings were changed.

# Compare data.frame structures again
# All columns modified, so copies made
# data.frame and column memory locations all differ
ref(order, order_final)
## █ [1:0x7fd9950c0cc8] <df[,2]> 
## ├─napoletana = [2:0x7fd99359ad08] <fct> 
## └─pugliese = [3:0x7fd99359bbc8] <fct> 
##  
## █ [4:0x7fd99917ea08] <df[,2]> 
## ├─napoletana = [5:0x7fd99917e4c8] <fct> 
## └─pugliese = [6:0x7fd99917df08] <fct>

Advanced R, p27
“If you modify a row, every column is modified, which means every column must be copied.”
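The row version, again with character columns so the extra mozzarella actually appears. As before, the address check assumes {lobstr} is installed and is skipped otherwise.

```r
order <- data.frame(
  napoletana = c("mozzarella", "tomato", "anchovy"),
  pugliese = c("mozzarella", "tomato", "onion"),
  stringsAsFactors = FALSE
)

order_final <- order
order_final[1, 1:2] <- "mozzarella (extra)"  # modify row one of both columns
order_final

if (requireNamespace("lobstr", quietly = TRUE)) {
  # Every column was modified, so no column address is shared any more
  print(lobstr::obj_addrs(order) == lobstr::obj_addrs(order_final))  # FALSE FALSE
}
```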

Buon appetito!

Il conto

So can names and values be explained with this analogy?

Kinda? The basic premise is there: pizza names and recipes, R names and values. But it’s definitely contrived. For a start, why are wait staff writing down pizza orders in a data frame?

I’ve also deceived you with some ‘polite fiction’, in Hadley’s words. In a numeric vector, the name points to the values. In a character vector, the name actually points to a vector of pointers, which themselves reference unique character strings.

Advanced R, p27
“R actually uses a global string pool where each element of a character vector is a pointer to a unique string in the pool.”

But I don’t think that’s a big deal for getting the point across.
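If you do want to peek at the string pool, Advanced R uses lobstr’s ref(x, character = TRUE). A shorter check, assuming {lobstr} is installed: obj_addrs() on a character vector returns the address of each string, so identical toppings share one address even across vectors built separately.

```r
napoletana <- c("mozzarella", "tomato", "anchovy")
pugliese <- c("mozzarella", "tomato", "onion")  # built separately, not copied

if (requireNamespace("lobstr", quietly = TRUE)) {
  # Each element points into the global string pool, so the identical
  # strings "mozzarella" and "tomato" have the same address in both vectors
  print(lobstr::obj_addrs(napoletana) == lobstr::obj_addrs(pugliese))  # TRUE TRUE FALSE

  # The two vectors themselves are still separate objects
  print(lobstr::obj_addr(napoletana) == lobstr::obj_addr(pugliese))  # FALSE
}
```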

Anyway, your order’s here.

Mangia! Mangia!


Session info
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.3 (2020-02-29)
##  os       macOS  10.16                
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_GB.UTF-8                 
##  ctype    en_GB.UTF-8                 
##  tz       Europe/London               
##  date     2021-02-08                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date       lib source                            
##  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)                    
##  blogdown      0.12    2019-05-01 [1] CRAN (R 3.6.0)                    
##  bookdown      0.10    2019-05-10 [1] CRAN (R 3.6.0)                    
##  cli           2.3.0   2021-01-31 [1] CRAN (R 3.6.2)                    
##  crayon        1.4.0   2021-01-30 [1] CRAN (R 3.6.2)                    
##  digest        0.6.27  2020-10-24 [1] CRAN (R 3.6.2)                    
##  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 3.6.2)                    
##  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.0)                    
##  glue          1.4.2   2020-08-27 [1] CRAN (R 3.6.2)                    
##  highr         0.8     2019-03-20 [1] CRAN (R 3.6.0)                    
##  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.0)                    
##  icon          0.1.0   2019-10-09 [1] Github (ropenscilabs/icon@a5bc1cc)
##  knitr         1.31    2021-01-27 [1] CRAN (R 3.6.2)                    
##  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 3.6.0)                    
##  lobstr      * 1.1.1   2019-07-02 [1] CRAN (R 3.6.0)                    
##  magrittr      2.0.1   2020-11-17 [1] CRAN (R 3.6.2)                    
##  pillar        1.4.7   2020-11-20 [1] CRAN (R 3.6.2)                    
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.0)                    
##  Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.0)                    
##  rlang         0.4.10  2020-12-30 [1] CRAN (R 3.6.2)                    
##  rmarkdown     2.0     2019-12-12 [1] CRAN (R 3.6.0)                    
##  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)                    
##  stringi       1.5.3   2020-09-09 [1] CRAN (R 3.6.2)                    
##  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.0)                    
##  tibble        3.0.6   2021-01-29 [1] CRAN (R 3.6.2)                    
##  vctrs         0.3.6   2020-12-17 [1] CRAN (R 3.6.2)                    
##  withr         2.4.1   2021-01-26 [1] CRAN (R 3.6.2)                    
##  xfun          0.20    2021-01-06 [1] CRAN (R 3.6.2)                    
##  yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.0)                    
## 
## [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

  1. Second edition. You can buy the book, or view it for free online.

  2. Initially I went with the pop-culture reference about how a Quarter Pounder with Cheese is called a Royale with Cheese in Paris (or indeed, a Krusty Burger with Cheese is called a Quarter Pounder with Cheese at McDonald’s), but the reference was better than the actual utility of the metaphor.

  3. Of course, if you’re really serious about pizza, you only offer two options. L’antica Pizzeria Da Michele, which is where I took the photos at the top of this post, offers only marinara and margherita. Do the simple things well.