R Trek: exploring stardates

Tags: dataviz, plotly, r, rvest, text

Published: April 14, 2018

Captain Picard points and gives the order to 'make it so'.

Captain’s log

Stardate 71750.51. Our mission is to use R statistical software to extract stardates mentioned in the captain’s log from the scripts of Star Trek: The Next Generation and observe their progression over the course of the show’s seven seasons. There appears to be some mismatch in the frequency of digits after the decimal point – could this indicate poor ability to choose random numbers? Or something more sinister? We shall venture deep into uncharted territory for answers…

We’re going to:

  • iterate reading in text files – containing ‘Star Trek: The Next Generation’ (ST:TNG) scripts – to R and then extract stardates using a loop, lapply() and the {stringr} package
  • web scrape episode names using the {rvest} package and join them to the stardates data
  • tabulate and plot these interactively with {ggplot2}, {plotly} and {DT}

Also, very minor spoiler alert for a couple of ST:TNG episodes.

Lieutenant Commander Data

I’m using the Star Trek Minutiae website to access all the ST:TNG scripts as text files. You can also download the scripts as a zipped folder of 176 text files.

Each episode has a dedicated URL that we can read the script from with readLines(). We can loop over the episodes to get one list element per script. This will take a few moments to run.

# Build URL paths to each script
base_url <- "https://www.st-minutiae.com/resources/scripts/"
ep_numbers <- 102:277  # ep 1 & 2 combined, so starts at 102
ep_paths <- paste0(base_url, ep_numbers, ".txt")

# Preallocate a list to fill with each script
scripts <- vector("list", length(ep_numbers))
names(scripts) <- ep_numbers

# For each URL path, read the script and add to the list
for (ep in seq_along(ep_paths)) {
  txt <- readLines(ep_paths[ep], skipNul = TRUE)
  ep_num <- tools::file_path_sans_ext(basename(ep_paths[ep]))
  scripts[[ep_num]] <- txt
}
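The loop above stops at the first URL that fails to load. As a hedged sketch (not part of the original analysis), here is one way to keep going when a file is missing. The helper name read_script_safely() is hypothetical, and the demonstration reads toy files from a temporary folder rather than the real site.

```r
# Hypothetical helper: return the lines of a script, or an empty
# character vector if the file or URL can't be read
read_script_safely <- function(path) {
  tryCatch(
    suppressWarnings(readLines(path, skipNul = TRUE)),
    error = function(e) character(0)
  )
}

# Toy demonstration with local files instead of the real URLs
tmp_dir <- tempfile("scripts_")
dir.create(tmp_dir)
writeLines(
  c("PICARD V.O.", "Captain's log, stardate 42353.7."),
  file.path(tmp_dir, "102.txt")
)

# 103.txt deliberately doesn't exist
toy_paths <- file.path(tmp_dir, c("102.txt", "103.txt"))

# One list element per path, named by episode number
toy_scripts <- lapply(toy_paths, read_script_safely)
names(toy_scripts) <- tools::file_path_sans_ext(basename(toy_paths))

lengths(toy_scripts)  # number of lines read per script
```

Any script with length zero can then be retried or investigated separately.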

We can take a look at some example lines from the title page of the first script.

scripts[["102"]][17:24]
[1] "                STAR TREK: THE NEXT GENERATION "
[2] "                              "                 
[3] "                    \"Encounter at Farpoint\" " 
[4] "                              "                 
[5] "                              by "              
[6] "                         D.C. Fontana "         
[7] "                              and "             
[8] "                       Gene Roddenberry "       

Our first example of a stardate is in the Captain’s log voiceover in line 47 of the first script. (Each \t denotes a tab character.)

scripts[["102"]][46:47]
[1] "\t\t\t\t\tPICARD V.O."                 
[2] "\t\t\tCaptain's log, stardate 42353.7."

Engage!

We want to extract stardate strings from each script in our list. As you can see from Picard’s voiceover above, these are given in the form ‘XXXXX.X’, where each X is a digit.

We can extract these with str_extract_all() from the {stringr} package, using a regular expression (regex).

Our regex is written date[:space:][[:digit:]\\.[:digit:]]{7}. This means:

  • find a string that starts with the word date and is followed by a whitespace character (i.e. ‘date ’)
  • which is followed by characters that are each either a digit ([:digit:]) or a period (\\.)
  • with exactly seven such characters in a row ({7})

This creates a list object with an element for each script that contains all the regex-matched strings.

library(stringr)

# Collapse each script to a single element
scripts_collapsed <- lapply(scripts, paste, collapse = " ")

# Declare the regex
stardate_regex <- "date[:space:][[:digit:]\\.[:digit:]]{7}"

# For each script, extract all the stardates
stardate_extract <- lapply(
  scripts_collapsed, 
  function(script) str_extract_all(script, stardate_regex)[[1]]
)

stardate_extract[1:3]
$`102`
[1] "date 42353.7" "date 42354.1" "date 42354.2" "date 42354.7" "date 42372.5"

$`103`
[1] "date 41209.2" "date 41209.3"

$`104`
[1] "date 41235.2" "date 41235.3"
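To see what the regex does and doesn't accept, here is a small sketch on two made-up lines (the second is invented for illustration, not taken from a script). Because the character class allows a period in any of the seven positions, a string like ‘41209..’ also matches. The stricter pattern is only an assumption about what a ‘well-formed’ stardate looks like: five digits, a period, then a digit.

```r
library(stringr)

# Two toy lines: the first is from Picard's voiceover, the second is invented
toy_lines <- c(
  "Captain's log, stardate 42353.7.",
  "Stardate 41209... no, that can't be right."
)

# The regex used above: 'date', a space, then seven digits-or-periods
permissive_regex <- "date[:space:][[:digit:]\\.[:digit:]]{7}"

# A stricter alternative (an assumption, not the post's method):
# five digits, a period and one digit, preceded by 'date '
strict_regex <- "(?<=date )\\d{5}\\.\\d"

permissive_matches <- str_extract_all(toy_lines, permissive_regex)
strict_matches <- str_extract_all(toy_lines, strict_regex)

permissive_matches
strict_matches
```

The permissive version is why a clean-up step for stray trailing periods is useful afterwards. The strict version skips the malformed line entirely, which may or may not be what you want.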

We’re now going to make the data into a tidy dataframe and clean it up so it’s easier to work with. We can use some tidyverse packages for this.

library(dplyr, warn.conflicts = FALSE)
library(tibble)
library(tidyr)

stardate_tidy <- stardate_extract %>% 
  enframe() %>%  # list to dataframe (one row per episode)
  unnest(cols = value) %>%  # dataframe with one row per stardate
  transmute(  # create columns and retain only these
    episode = as.numeric(name),
    stardate = str_replace(value, "date ", "")
  ) %>%
  mutate(
    stardate = str_replace(stardate, "\\.\\.$", ""),
    stardate = as.numeric(stardate)
  )

head(stardate_tidy)
# A tibble: 6 × 2
  episode stardate
    <dbl>    <dbl>
1     102   42354.
2     102   42354.
3     102   42354.
4     102   42355.
5     102   42372.
6     103   41209.

Now we can add a couple more columns for convenience: each episode’s season number and the digit after the decimal point in each stardate.

stardate_tidy_plus <- stardate_tidy %>% 
  mutate(
    season = case_when(
      episode %in% 102:126 ~ 1,
      episode %in% 127:148 ~ 2,
      episode %in% 149:174 ~ 3,
      episode %in% 175:200 ~ 4,
      episode %in% 201:226 ~ 5,
      episode %in% 227:252 ~ 6,
      episode %in% 253:277 ~ 7
    ),
    stardate_decimal = str_sub(stardate, 7, 7)  # 7th character is the decimal
  )

head(stardate_tidy_plus)
# A tibble: 6 × 4
  episode stardate season stardate_decimal
    <dbl>    <dbl>  <dbl> <chr>           
1     102   42354.      1 7               
2     102   42354.      1 1               
3     102   42354.      1 2               
4     102   42355.      1 7               
5     102   42372.      1 5               
6     103   41209.      1 2               
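The captain’s log at the top wondered about the frequency of digits after the decimal point. Here is a minimal sketch of how that could be tabulated, using a handful of toy values rather than the full dataset. It also formats the number to one decimal place first, because as.character() drops a trailing ‘.0’ (41209.0 becomes "41209"), which would leave the seventh character empty. The column name n_stardates is my own choice.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy stardates, including one that ends in .0
toy_stardates <- tibble::tibble(
  stardate = c(42353.7, 42354.1, 41209.0, 41235.2, 41235.7)
)

toy_decimals <- toy_stardates %>%
  mutate(
    # Format to one decimal place, then keep what follows the period
    stardate_decimal = sub("^.*\\.", "", sprintf("%.1f", stardate))
  )

# Frequency of each decimal digit
decimal_counts <- toy_decimals %>%
  count(stardate_decimal, name = "n_stardates")

decimal_counts
```

Run against the full table, the same count() call would show whether some decimal digits are favoured over others.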

Prepare a scanner probe

We could extract episode names from the scripts, but another option is to scrape them from the ST:TNG episode guide on Wikipedia.

If you visit that link, you’ll notice that the tables of episodes actually give a stardate, but they only provide one per episode – our script-scraping shows that many episodes have multiple instances of stardates.

We can use the {rvest} package by Hadley Wickham to perform the scrape. This works by supplying a website address and the path of the thing we want to extract – the episode name column of tables on the Wikipedia page. I used SelectorGadget – a point-and-click tool for finding the CSS selectors for elements of webpages – for this column in each of the tables on the Wikipedia page (.wikiepisodetable tr > :nth-child(3)). A short how-to vignette is available for {rvest} + SelectorGadget.

library(rvest)

# read and parse the Wikipedia page
tng_ep_wiki <- read_html(
  "https://en.wikipedia.org/wiki/List_of_Star_Trek:_The_Next_Generation_episodes"
)

# extract and tidy
tng_ep_names <- tng_ep_wiki %>%  # parsed page
  html_nodes(".wikiepisodetable tr > :nth-child(3)") %>%  # via SelectorGadget
  html_text() %>%  # extract text
  tibble() %>%  # to dataframe
  rename(episode_title = ".") %>%  # sensible column name
  filter(episode_title != "Title") %>%  # remove table headers
  mutate(episode = row_number() + 101)  # episode number (join key)

head(tng_ep_names)
# A tibble: 6 × 2
  episode_title                      episode
  <chr>                                <dbl>
1 "\"Encounter at Farpoint\""            102
2 "\"The Naked Now\""                    103
3 "\"Code of Honor\""                    104
4 "\"The Last Outpost\""                 105
5 "\"Where No One Has Gone Before\""     106
6 "\"Lonely Among Us\""                  107
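If you want to see how the CSS selector behaves without hitting Wikipedia, here is a small sketch on a made-up HTML table. The table content is a toy stand-in for the real page, and stripping the literal quote marks from the titles is an optional extra step that the code above doesn't do. It uses html_elements() and html_text2(), the current {rvest} equivalents of html_nodes() and html_text().

```r
library(rvest)

# Toy page that mimics the structure of one Wikipedia episode table
toy_page <- minimal_html('
  <table class="wikiepisodetable">
    <tr><th>No. overall</th><th>No. in season</th><th>Title</th></tr>
    <tr><td>1</td><td>1</td><td>"Encounter at Farpoint"</td></tr>
    <tr><td>3</td><td>3</td><td>"The Naked Now"</td></tr>
  </table>
')

# Third child of every row: the header cell plus each title cell
title_cells <- html_elements(toy_page, ".wikiepisodetable tr > :nth-child(3)")
raw_titles <- html_text2(title_cells)

# Drop the header and strip the literal double quotes
toy_titles <- raw_titles[raw_titles != "Title"]
toy_titles <- gsub('"', "", toy_titles, fixed = TRUE)

toy_titles
```

Bear in mind that the real page can change its layout at any time, so the selector may need revisiting.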

So now we can join the episode names to the dataframe generated from the scripts. This gives us a table with a row per stardate extracted, with its associated season, episode number and episode name.

stardate_tidy_names <- stardate_tidy_plus %>%
  left_join(tng_ep_names, by = "episode") %>% 
  select(season, episode, episode_title, stardate, stardate_decimal)
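The join key relies on the scraped rows being in the same order as the script numbers, which is an assumption worth checking. One hedged way to do that is an anti_join(), which returns the rows that found no match. The sketch below uses toy data with a deliberately bogus episode number (999).

```r
library(dplyr, warn.conflicts = FALSE)

# Toy stardates, with one episode number that has no title
toy_stardates <- tibble::tibble(
  episode = c(102, 102, 103, 999),
  stardate = c(42353.7, 42354.1, 41209.2, 40000.0)
)

toy_names <- tibble::tibble(
  episode = c(102, 103),
  episode_title = c("Encounter at Farpoint", "The Naked Now")
)

# Rows of stardates that would get a missing title after a left_join()
unmatched <- anti_join(toy_stardates, toy_names, by = "episode")

unmatched
```

An empty result on the real data would mean every stardate found a title, though it wouldn't prove that each title is the right one.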

We can make these data into an interactive table with the DT::datatable() htmlwidget.

library(DT)

datatable(
  stardate_tidy_names,
  rownames = FALSE,
  options = list(pageLength = 5, autoWidth = TRUE)
)

So that’s a searchable list of all the stardates in each episode.

On screen

Let’s visualise the stardates by episode.

We can make this interactive using the {plotly} package – another htmlwidget for R – that conveniently has the function ggplotly() that can turn a ggplot object into an interactive plot. You can hover over each point to find out more information about it.

Obviously there’s a package ({ggsci}) that contains a discrete colour scale based on the shirts of the Enterprise crew. Obviously we’ll use that here.

library(ggplot2)  # basic plotting
library(plotly, warn.conflicts = FALSE)  # make plot interactive
library(ggsci)  # star trek colour scale
library(ggthemes)  # dark plot theme

# create basic plot
stardate_dotplot <- stardate_tidy_names %>% 
  mutate(season = as.character(season)) %>%
  ggplot() +
  geom_point(  # dotplot
    aes(
      x = episode - 100,
      y = stardate,
      color = season,  # each season gets own colour
      group = episode_title
    )
  ) +
  labs(title = "Stardates are almost (but not quite) chronological") +
  theme_solarized_2(light = FALSE) +  # dark background
  theme(legend.position = "none") +
  scale_color_startrek()  # Star Trek uniform colours

We can make this interactive with {plotly}. Hover over the points to see details in a tooltip, and use the Plotly tools that appear in the top-right on hover to zoom, download, etc.

# make plot interactive
stardate_dotplot %>% 
  ggplotly() %>% 
  layout(margin = list(l = 75))  # adjust margin to fit y-axis label

So there were some non-chronological stardates between episodes of the first and second seasons and at the beginning of the third, but the stardate-episode relationship became more linear after that.

Three points seem to be anomalous with stardates well before the present time period of the episode. Without spoiling them (too much), we can see that each of these episodes takes place in, or references, the past.

‘Identity Crisis’ (season 4, episode 91, stardate 40164.7) takes place partly in the past:

scripts[[91]][127:129]
[1] "\tGEORDI moves into view, holding a Tricorder. (Note:"  
[2] "\tGeordi is younger here, wearing a slightly different,"
[3] "\tearlier version of his VISOR.)"                       

‘Dark Page’ (season 7, episode 158, stardate 30620.1) has a scene involving a diary:

scripts[[158]][c(2221:2224, 2233:2235)]
[1] "\t\t\t\t\tTROI"                         
[2] "\t\t\tThere's a lot to review. My"      
[3] "\t\t\tmother's kept a journal since she"
[4] "\t\t\twas first married..."             
[5] "\t\t\t\t\tPICARD"                       
[6] "\t\t\tThe first entry seems to be"      
[7] "\t\t\tStardate 30620.1."                

‘All Good Things’ (season 7, episode 176, stardate 41153.7) involves some time travel for Captain Picard:

scripts[[176]][1561:1569]
[1] "\t\t\t\t\tPICARD (V.O.)"                 
[2] "\t\t\tPersonal Log: Stardate 41153.7."   
[3] "\t\t\tRecorded under security lockout"   
[4] "\t\t\tOmega three-two-seven. I have"     
[5] "\t\t\tdecided not to inform this crew of"
[6] "\t\t\tmy experiences. If it's true that" 
[7] "\t\t\tI've travelled to the past, I"     
[8] "\t\t\tcannot risk giving them"           
[9] "\t\t\tforeknowledge of what's to come."  
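These three anomalies were spotted by eye on the plot. A rough way to flag them in code is to compare each stardate with the median for its season. This is only a sketch: the gap of 1000 is an arbitrary threshold I've assumed, and apart from the three anomalous stardates quoted above, the toy values are made up.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy data: the three anomalous stardates plus invented 'normal' ones
toy_stardates <- tibble::tibble(
  season = c(4, 4, 4, 4, 7, 7, 7, 7),
  stardate = c(
    44001.4, 44215.2, 40164.7, 44932.3,
    47025.4, 30620.1, 47988.0, 41153.7
  )
)

gap_threshold <- 1000  # assumption: how far below the median counts as odd

flagged <- toy_stardates %>%
  group_by(season) %>%
  mutate(season_median = median(stardate)) %>%
  ungroup() %>%
  filter(stardate < season_median - gap_threshold)

flagged
```

With so few toy rows the medians are themselves dragged down by the anomalies, so on the real data you'd want to sanity-check the threshold against the plot.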

Speculate

So stardates are more or less chronological across the duration of ST:TNG’s seven seasons, implying that the writers had a system in place. A few wobbles in consistency during the first few seasons suggest that it took some time to get this right. None of this is new information (see the links in the ‘Open channel’ section below).

It seems the vast majority of episodes take place in the programme’s present with a few exceptions. We may have missed some forays through time simply because the stardate was unknown or unmentioned.

Open channel

Only too late did I realise that there is an RTrek GitHub organisation with a Star Trek package, TNG datasets and some other functions.

A selection of further reading:

  • Memory Alpha is ‘a collaborative project to create the most definitive, accurate, and accessible encyclopedia and reference for everything related to Star Trek’, including stardates
  • ‘The STArchive is home to the… Ships and Locations lists… [and] a few other technical FAQs’, including a deep-dive into the theories in a Stardates in Star Trek FAQ
  • Trekguide’s take on the messiness of stardates also includes a stardate converter
  • There’s a handy universal stardate converter at Redirected Insanity
  • The scripts were downloaded from Star Trek Minutiae, a site that has ‘obscure references and little-known facts’ and ‘explore[s] and expand[s] the wondrous multiverse of Star Trek’
  • A simpler guide to stardates can be found on Mentalfloss
  • You can find the full list of The Next Generation episodes on Wikipedia

Full stop!

Captain Picard gives the order for Ensign Crusher to 'shut up, Wesley'.

Environment

Session info
Last rendered: 2023-08-09 23:26:14 BST
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggthemes_4.2.4 ggsci_3.0.0    plotly_4.10.2  ggplot2_3.4.2  DT_0.28       
 [6] rvest_1.0.3    tidyr_1.3.0    tibble_3.2.1   dplyr_1.1.2    stringr_1.5.0 

loaded via a namespace (and not attached):
 [1] sass_0.4.7        utf8_1.2.3        generics_0.1.3    xml2_1.3.5       
 [5] stringi_1.7.12    digest_0.6.33     magrittr_2.0.3    evaluate_0.21    
 [9] grid_4.3.1        fastmap_1.1.1     jsonlite_1.8.7    httr_1.4.6       
[13] purrr_1.0.1       fansi_1.0.4       selectr_0.4-2     viridisLite_0.4.2
[17] crosstalk_1.2.0   scales_1.2.1      lazyeval_0.2.2    jquerylib_0.1.4  
[21] cli_3.6.1         rlang_1.1.1       ellipsis_0.3.2    munsell_0.5.0    
[25] withr_2.5.0       cachem_1.0.8      yaml_2.3.7        tools_4.3.1      
[29] colorspace_2.1-0  curl_5.0.1        vctrs_0.6.3       R6_2.5.1         
[33] lifecycle_1.0.3   htmlwidgets_1.6.2 pkgconfig_2.0.3   pillar_1.9.0     
[37] bslib_0.5.0       gtable_0.3.3      glue_1.6.2        data.table_1.14.8
[41] xfun_0.39         tidyselect_1.2.0  rstudioapi_0.15.0 knitr_1.43.1     
[45] htmltools_0.5.5   labeling_0.4.2    rmarkdown_2.23    compiler_4.3.1   

Footnotes

  1. The stardate for today’s date (14 April 2018) as calculated using the trekguide.com method; this ‘would be the stardate of this week’s episode if The Next Generation and its spinoffs were still in production’.

Reuse

CC BY-NC-SA 4.0