Stiliyan Petrov: Jesus?


January 8, 2023

Close up of former footballer Stiliyan Petrov playing for Bulgaria. Text around his head says 'Stan Petrov', 'am jebus?', 'nativity' and 'get rekt opta' in Comic Sans font.


In which I prove wrong a tweeted Opta football statistic, using R and Transfermarkt data. Oh wait, actually Opta were right. Ah, heck.

Petrov Rescue

Basically, for little reason, I dislike the style of the tweets on the Twitter feed for Opta[1] (the company who do all the football stats).

What is so outrageous? Each tweet always ends in a single, summary word that makes me cringe.

Wait, what? Let’s take a look at their most recent tweet at time of writing:

14 - Harry Kane has scored 14 goals in his last 14 appearances in the FA Cup, averaging a goal every 63 minutes in the competition in this period. Guarantee.

‘Guarantee’. Gah.

Or this tweet:

16 - Since his first appearance in the competition in January 2016, Leicester’s Kelechi Iheanacho has scored more FA Cup goals than any other player (16). Specialist.

‘Specialist’. Sigh.

A small and completely pointless thing to be annoyed by, right?

But here’s the scenario. Over the yuletide period (on Christmas day!) they ran this tweet:

1 - Stiliyan Petrov (@StanPetrov19) is the only player to have played in the Premier League whose name contains all the letters in the word ‘Nativity’. Star.

Obviously, I have absolutely nothing against ‘Big Stan’. He’s a legend; a ‘star’, if you will. Captain of Aston Villa! Bulgaria! Battled leukaemia and still made it to nearly 600 games. One of the best Bulgarian/Premier League ‘Petrovs’, along with cult legend Martin.

But could this stat possibly be true? Surely there’s at least one other player. Perhaps a window of opportunity for me to avenge my feelings of cringe?

Oh, and obviously you can ignore the candid dismissals in the tweet’s replies, for example:

What are we supposed to do with this information? [Picture of wryly-smiling duck.]

No, this is more important than any Opta tweet ever: what if it’s… wrong?

Stan in R, but not {rstan}

So I looked into it using R, of course.

Turns out it’s pretty straightforward with the excellent {worldfootballR} package by Jason Zivkovic, which helps fetch player data from Transfermarkt (among other suppliers).

Basically, we can fetch data about the footballers at every team, in every season of a given league since its inception. So, aha, you cannot escape, Opta!

My little {soccercolleagues} package that I wrote about in early 2022 is built heavily (heavily!) around {worldfootballR} and has a convenience function we can use.

The niche[2] primary objective of {soccercolleagues} is to let you find pairs of football players that were colleagues at some point. Like: ‘which current Premier League footballer has been teammates with each of the following: Kevin Phillips, Mark Viduka, Dejan Lovren, Danny Ings and Nicky Butt?’[3]
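To make the 'colleagues' idea concrete, here's a toy sketch in base R. It is not how {soccercolleagues} does it internally; the data frame, its column names and the 'Player A' style names are all made up for illustration. The gist is a self-join on team and season, so each row pairs two players from the same squad.

```r
# Toy squad data: players, teams and seasons are invented for illustration
squads <- data.frame(
  player_name = c("Player A", "Player B", "Player C", "Player A", "Player D"),
  team        = c("Team X", "Team X", "Team Y", "Team Y", "Team Y"),
  season      = c(2001, 2001, 2001, 2002, 2002)
)

# Self-join on team and season, so each row pairs two players from one squad
pairs <- merge(
  squads, squads,
  by = c("team", "season"),
  suffixes = c("", "_colleague")
)

# Drop the rows that pair a player with themselves
pairs <- pairs[pairs$player_name != pairs$player_name_colleague, ]

# Who has shared a squad with Player A?
colleagues_of_a <- unique(
  pairs$player_name_colleague[pairs$player_name == "Player A"]
)
```

Answering the quiz question would then be a matter of intersecting the colleague lists of each named player.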

Follow along. As ever, you can install the {soccercolleagues} package from GitHub:

if (!require(remotes)) install.packages("remotes")
remotes::install_github("matt-dray/soccercolleagues")

We’ll also use the {tidyverse} for wrangling.

library(tidyverse)

So we can ask Transfermarkt for all the years of the English Premier League, which began in 1992:

# This will take quite a long time...
epl_players <- soccercolleagues::get_players(
  seasons = 1992:2022,
  country = "England"
)

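Since that fetch is slow, you might want to stash the result on disk so you only pay for it once. Here's a minimal sketch using base R's `saveRDS()` and `readRDS()`. The `with_cache()` helper and the file name are my own assumptions, not part of {soccercolleagues}.

```r
# Hypothetical helper: run `fetch()` once, then reuse the saved result
with_cache <- function(path, fetch) {
  # If a cached copy exists, read it instead of fetching again
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch()
  saveRDS(result, path)
  result
}

# Usage (commented out because it hits Transfermarkt and takes a while):
# epl_players <- with_cache(
#   "epl_players.rds",  # hypothetical file name
#   function() {
#     soccercolleagues::get_players(seasons = 1992:2022, country = "England")
#   }
# )
```

Delete the file whenever you want to force a fresh fetch.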
And now we can look for the players whose names contain the letters in ‘nativity’:

epl_players |>
  distinct(player_name) |>
  # Lowercase the names, drop spaces and count each letter we care about
  mutate(
    player_name = str_remove_all(tolower(player_name), " "),
    n_count = str_count(player_name, "n"),
    a_count = str_count(player_name, "a"),
    t_count = str_count(player_name, "t"),
    i_count = str_count(player_name, "i"),
    v_count = str_count(player_name, "v"),
    y_count = str_count(player_name, "y")
  ) |>
  # 'nativity' has one each of n, a, v and y, but two each of t and i
  filter(
    n_count >= 1 &
      a_count >= 1 &
      t_count >= 2 &
      i_count >= 2 &
      v_count >= 1 &
      y_count >= 1
  )

# A tibble: 1 × 7
  player_name    n_count a_count t_count i_count v_count y_count
  <chr>            <int>   <int>   <int>   <int>   <int>   <int>
1 stiliyanpetrov       1       1       2       2       1       1

Oof… they were right. He is the only one.

Wow, this humble pie is so delicious, thank you so much Opta for unintentionally spoonfeeding it to me.

To be clear: Opta’s data analysts have a good track record, as far as I know. But I’ve got my eye on you! You’ll slip up one day!

…But wait. Opta were misnaming Stan as ‘Stylian Petrov’ in tweets as late as 2012. Get rekt! You missed the extra ‘i’ you need in ‘nativity’, fools! Put respect on Stiliyan’s name!
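
If you fancy checking that spelling claim (or trying some other festive word) without hard-coding the counts, here's a rough sketch in base R. `has_all_letters()` is a made-up helper, not something from {soccercolleagues} or {worldfootballR}, and it only looks at the unaccented letters a to z.

```r
# Hypothetical helper: does `name` contain every letter of `word`,
# counting repeats (so 'nativity' needs two t's and two i's)?
has_all_letters <- function(name, word) {
  # Split a string into lowercase letters, dropping spaces, punctuation
  # and any accented characters
  to_letters <- function(x) {
    chars <- strsplit(tolower(x), "")[[1]]
    chars[chars %in% letters]
  }
  # How many of each letter the word requires
  needed <- table(to_letters(word))
  # How many of those same letters the name has (zero if absent)
  have <- table(factor(to_letters(name), levels = names(needed)))
  all(as.integer(have) >= as.integer(needed))
}

has_all_letters("Stiliyan Petrov", "nativity")  # TRUE
has_all_letters("Stylian Petrov", "nativity")   # FALSE, only one 'i'
```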



Session info
Last rendered: 2023-07-06 19:27:34 BST
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.2 compiler_4.3.1    fastmap_1.1.1     cli_3.6.1        
 [5] tools_4.3.1       htmltools_0.5.5   rstudioapi_0.14   yaml_2.3.7       
 [9] rmarkdown_2.23    knitr_1.43.1      jsonlite_1.8.7    xfun_0.39        
[13] digest_0.6.31     rlang_1.1.1       evaluate_0.21    


  1. This post is not guerilla marketing for Opta. It would be extremely guerilla if they wanted to advertise on this blog.

  2. There is definitely a burgeoning crossover of football stats and R users, see Ryo, Ben and Tony, for example.

  3. Hint: it’s a very ‘boring’ footballer, lol.

  4. By which I mean I lost 1-0.