2020 La Liga Table
Yesterday was the final matchday of the 2019/20 La Liga season. Real Madrid claimed their 34th La Liga title, Atleti qualify for the UCL for the 8th consecutive season, and Espanyol are back to la Segunda after 27 seasons in Spain’s top-flight.
Let’s sum up this season with some data cleaning.
There are plenty of quality resources online demonstrating how to clean match fixtures to create a league tables for completed soccer seasons:
- James Curleys: Spanish La Liga Dataset
- Jona’s (aka Opisthokonta) blog: R functions for soccer league tables and result matrix
While these are great examples of how to create a generic league table, La Liga has unique tie-break rules for teams that end the season level on points. The examples that I shared above resort to manually editing teams’ final league positions. I wanted to come up with a systematic approach using the specific tie-breaking rules for La Liga to create a final league table for any La Liga season.
For this analysis I accessed FiveThirtyEight’s SPI matches data.
# Load Packages
library(tidyverse)
# Read Data
spi_raw <- read_csv("https://projects.fivethirtyeight.com/soccer-api/club/spi_matches.csv")
glimpse(spi_raw)
## Rows: 34,284
## Columns: 23
## $ season <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 20…
## $ date <date> 2016-08-12, 2016-08-12, 2016-08-13, 2016-08-13, 2016-08…
## $ league_id <dbl> 1843, 1843, 2411, 2411, 2411, 2411, 2411, 2411, 1843, 24…
## $ league <chr> "French Ligue 1", "French Ligue 1", "Barclays Premier Le…
## $ team1 <chr> "Bastia", "AS Monaco", "Hull City", "Crystal Palace", "E…
## $ team2 <chr> "Paris Saint-Germain", "Guingamp", "Leicester City", "We…
## $ spi1 <dbl> 51.16, 68.85, 53.57, 55.19, 68.02, 69.49, 56.32, 58.98, …
## $ spi2 <dbl> 85.68, 56.48, 66.81, 58.66, 73.25, 59.33, 60.35, 59.74, …
## $ prob1 <dbl> 0.0463, 0.5714, 0.3459, 0.4214, 0.3910, 0.5759, 0.4380, …
## $ prob2 <dbl> 0.8380, 0.1669, 0.3621, 0.2939, 0.3401, 0.1874, 0.2692, …
## $ probtie <dbl> 0.1157, 0.2617, 0.2921, 0.2847, 0.2689, 0.2367, 0.2927, …
## $ proj_score1 <dbl> 0.91, 1.82, 1.16, 1.35, 1.47, 1.91, 1.30, 1.37, 1.39, 2.…
## $ proj_score2 <dbl> 2.36, 0.86, 1.24, 1.14, 1.38, 1.05, 1.01, 1.05, 1.14, 0.…
## $ importance1 <dbl> 32.4, 53.7, 38.1, 43.6, 31.9, 34.1, 33.9, 36.5, 37.9, 73…
## $ importance2 <dbl> 67.7, 22.9, 22.2, 34.6, 48.0, 30.7, 32.5, 29.1, 44.2, 27…
## $ score1 <dbl> 0, 2, 2, 0, 1, 1, 1, 0, 3, 2, 1, 3, 0, 3, 1, 0, 3, 1, 0,…
## $ score2 <dbl> 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 0, 2, 1, 2, 3, 3, 4, 0, 0,…
## $ xg1 <dbl> 0.97, 2.45, 0.85, 1.11, 0.73, 1.05, 1.40, 1.24, 1.03, 2.…
## $ xg2 <dbl> 0.63, 0.77, 2.77, 0.68, 1.11, 0.22, 0.55, 1.84, 1.84, 1.…
## $ nsxg1 <dbl> 0.43, 1.75, 0.17, 0.84, 0.88, 1.52, 1.13, 1.71, 1.10, 1.…
## $ nsxg2 <dbl> 0.45, 0.42, 1.25, 1.60, 1.81, 0.41, 1.06, 1.56, 2.26, 0.…
## $ adj_score1 <dbl> 0.00, 2.10, 2.10, 0.00, 1.05, 1.05, 1.05, 0.00, 3.12, 2.…
## $ adj_score2 <dbl> 1.05, 2.10, 1.05, 1.05, 1.05, 1.05, 1.05, 1.05, 2.10, 1.…
This data is fantastic. Each observation includes a match date, league id’s, team names, team SPIs, teams’ likelihood of winning, as well as xG/NSxG for each team. In order to make this data more useful, I’ve used the following cleaning steps to provide a “tidy” table.
matches <- spi_raw %>%
transmute(date,
league,
league_id,
team = team1,
opponent = team2,
teamGoal = score1,
oppGoal = score2,
result = case_when(score1 > score2 ~ "W",
score1 < score2 ~ "L",
score1 == score2 ~ "D"),
ha = "Home") %>%
bind_rows(
spi_raw %>%
transmute(date,
league,
league_id,
team = team2,
opponent = team1,
teamGoal = score2,
oppGoal = score1,
result = case_when(score1 < score2 ~ "W",
score1 > score2 ~ "L",
score1 == score2 ~ "D"),
ha = "Away")) %>%
mutate(game_goal_diff = teamGoal - oppGoal) %>%
mutate(result_points = case_when(result == "W" ~ 3,
result == "D" ~ 1,
TRUE ~ 0))
glimpse(matches)
## Rows: 68,568
## Columns: 11
## $ date <date> 2016-08-12, 2016-08-12, 2016-08-13, 2016-08-13, 2016…
## $ league <chr> "French Ligue 1", "French Ligue 1", "Barclays Premier…
## $ league_id <dbl> 1843, 1843, 2411, 2411, 2411, 2411, 2411, 2411, 1843,…
## $ team <chr> "Bastia", "AS Monaco", "Hull City", "Crystal Palace",…
## $ opponent <chr> "Paris Saint-Germain", "Guingamp", "Leicester City", …
## $ teamGoal <dbl> 0, 2, 2, 0, 1, 1, 1, 0, 3, 2, 1, 3, 0, 3, 1, 0, 3, 1,…
## $ oppGoal <dbl> 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 0, 2, 1, 2, 3, 3, 4, 0,…
## $ result <chr> "L", "D", "W", "L", "D", "D", "D", "L", "W", "W", "W"…
## $ ha <chr> "Home", "Home", "Home", "Home", "Home", "Home", "Home…
## $ game_goal_diff <dbl> -1, 0, 1, -1, 0, 0, 0, -1, 1, 1, 1, 1, -1, 1, -2, -3,…
## $ result_points <dbl> 0, 1, 3, 0, 1, 1, 1, 0, 3, 3, 3, 3, 0, 3, 0, 0, 0, 3,…
We now have a two observations per team, per match. We now can think observation from the perspective of a team, now with details about whether the match was played home or away (ha
), the team’s opponent, goals for and against, and the result of the match. I also included a field for team’s goal differential for the game (game_goal_diff
).
One thing that 538 doesn’t include is a variable for season, so we’ll have to filter to include only matches played for this season and for La Liga (which has the league id 1869).
## Filter to only include this La Liga season
liga <- matches %>%
filter(league_id == 1869, between(date, as.Date("2019-08-15"), as.Date("2020-07-20")))
Let’s now create a traditional league table for the end of the season:
table_1 <- liga %>%
group_by(team) %>%
summarise(MP = n(),
W = sum(result == "W"),
D = sum(result == "D"),
L = sum(result == "L"),
GF = sum(teamGoal),
GA = sum(oppGoal),
GD = GF - GA,
Pts = sum(result_points)) %>%
arrange(desc(Pts))
table_1
## # A tibble: 20 x 9
## team MP W D L GF GA GD Pts
## <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Real Madrid 38 26 9 3 70 25 45 87
## 2 Barcelona 38 25 7 6 86 38 48 82
## 3 Atletico Madrid 38 18 16 4 51 27 24 70
## 4 Sevilla FC 38 19 13 6 54 34 20 70
## 5 Villarreal 38 18 6 14 63 49 14 60
## 6 Granada 38 16 8 14 52 45 7 56
## 7 Real Sociedad 38 16 8 14 56 48 8 56
## 8 Getafe 38 14 12 12 43 37 6 54
## 9 Valencia 38 14 11 13 46 53 -7 53
## 10 Osasuna 38 13 13 12 46 54 -8 52
## 11 Athletic Bilbao 38 13 12 13 41 38 3 51
## 12 Levante 38 14 7 17 47 53 -6 49
## 13 Eibar 38 11 9 18 39 56 -17 42
## 14 Real Valladolid 38 9 15 14 32 43 -11 42
## 15 Real Betis 38 10 11 17 48 60 -12 41
## 16 Alavés 38 10 9 19 34 59 -25 39
## 17 Celta Vigo 38 7 16 15 37 49 -12 37
## 18 Leganes 38 8 12 18 30 51 -21 36
## 19 Mallorca 38 9 6 23 40 65 -25 33
## 20 Espanyol 38 5 10 23 27 58 -31 25
Looking good! But as I noted before, the tie-breaking rules for La Liga are unique. In other leagues around the world (e.g. the English Premier League), goal difference is the primary tie-breaker, however in Spain the first tie breaker is head-to-head results for teams with the same number of points. For example, if Valencia and Levante were to level on points but Valencia managed to beat Levante in both fixtures, Valencia would beat Levante head-to-head. Here’s the full breakdown for league classification:
Rules for classification: 1) Points; 2) Head-to-head points; 3) Head-to-head goal difference; 4) Goal difference; 5) Goals scored; 6) Fair-play points (Note: Head-to-head record is used only after all the matches between the teams in question have been played)
This season there are a few teams that ended with the same number of points:
- 60pts: Atleti and Sevilla
- 56pts: Granda and La Real
- 42pts: Eibar and Valladolid
I’ll deal with these tie-breaks by considering teams with the same number of points as a group, and creating a mini league table for each group, providing teams with a rank within their group, and joining the mini league table back to the main league table.
# Table with all teams level on points, with an id unique to teams
# with the same final point total
team_ties <- table_1 %>%
group_by(Pts) %>%
filter(n() > 1) %>%
mutate(id = cur_group_id()) %>%
ungroup() %>%
select(team, id)
team_ties
## # A tibble: 6 x 2
## team id
## <chr> <int>
## 1 Atletico Madrid 3
## 2 Sevilla FC 3
## 3 Granada 2
## 4 Real Sociedad 2
## 5 Eibar 1
## 6 Real Valladolid 1
In these steps, we’ll filter for only matches where teams within the same points-group are playing each other. In other words we’re looking for the two matches Atleti and Sevilla played each other, the two matches between Granda and La Real, and the two matches between Eibar and Valladolid. After we have matches, we’ll summarise points and goal differential by the points-group.
tie_break_table <- team_ties %>%
rename(team_id = id) %>%
inner_join(liga, by = "team") %>%
inner_join(team_ties %>% rename(opponent = team, opponent_id = id), by = "opponent") %>%
filter(team_id == opponent_id) %>%
mutate(game_goal_diff = teamGoal - oppGoal) %>%
group_by(team_id, team) %>%
summarise(tie_break_pts = sum(result_points),
tie_break_gd = sum(game_goal_diff),
.groups = "drop") %>%
arrange(team_id, tie_break_pts, tie_break_gd) %>%
select(team, tie_break_pts, tie_break_gd)
Finally, we join the tie-break table to the final table, arrange the teams by the sort criteria, and remove unnecessary columns.
final_table <- table_1 %>%
left_join(tie_break_table, by = "team") %>%
arrange(desc(Pts), desc(tie_break_pts), desc(tie_break_gd), desc(GD), desc(GF)) %>%
mutate(Rank = row_number()) %>%
select(Rank, everything(), -tie_break_pts, -tie_break_gd)
final_table
## # A tibble: 20 x 10
## Rank team MP W D L GF GA GD Pts
## <int> <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 Real Madrid 38 26 9 3 70 25 45 87
## 2 2 Barcelona 38 25 7 6 86 38 48 82
## 3 3 Atletico Madrid 38 18 16 4 51 27 24 70
## 4 4 Sevilla FC 38 19 13 6 54 34 20 70
## 5 5 Villarreal 38 18 6 14 63 49 14 60
## 6 6 Real Sociedad 38 16 8 14 56 48 8 56
## 7 7 Granada 38 16 8 14 52 45 7 56
## 8 8 Getafe 38 14 12 12 43 37 6 54
## 9 9 Valencia 38 14 11 13 46 53 -7 53
## 10 10 Osasuna 38 13 13 12 46 54 -8 52
## 11 11 Athletic Bilbao 38 13 12 13 41 38 3 51
## 12 12 Levante 38 14 7 17 47 53 -6 49
## 13 13 Real Valladolid 38 9 15 14 32 43 -11 42
## 14 14 Eibar 38 11 9 18 39 56 -17 42
## 15 15 Real Betis 38 10 11 17 48 60 -12 41
## 16 16 Alavés 38 10 9 19 34 59 -25 39
## 17 17 Celta Vigo 38 7 16 15 37 49 -12 37
## 18 18 Leganes 38 8 12 18 30 51 -21 36
## 19 19 Mallorca 38 9 6 23 40 65 -25 33
## 20 20 Espanyol 38 5 10 23 27 58 -31 25
Venga vamos! Tenemos la final tabla de la temporada! We have a final league table for the season. Now what’s the use doing this once? Let’s create a function that matches for a La Liga season and produces the end of season league table.
la_liga_table <- function(matches){
table_1 = matches %>%
group_by(team) %>%
summarise(MP = n(),
W = sum(result == "W"),
D = sum(result == "D"),
L = sum(result == "L"),
GF = sum(teamGoal),
GA = sum(oppGoal),
GD = GF - GA,
Pts = sum(result_points)) %>%
arrange(desc(Pts))
team_ties = table_1 %>%
group_by(Pts) %>%
filter(n() > 1) %>%
mutate(id = cur_group_id()) %>%
ungroup() %>%
select(team, id)
tie_break_table = team_ties %>%
rename(team_id = id) %>%
inner_join(matches, by = "team") %>%
inner_join(team_ties %>% rename(opponent = team, opponent_id = id), by = "opponent") %>%
filter(team_id == opponent_id) %>%
mutate(game_goal_diff = teamGoal - oppGoal) %>%
group_by(team_id, team) %>%
summarise(tie_break_pts = sum(result_points),
tie_break_gd = sum(game_goal_diff),
.groups = "drop") %>%
arrange(team_id, tie_break_pts, tie_break_gd) %>%
select(team, tie_break_pts, tie_break_gd)
final_table = table_1 %>%
left_join(tie_break_table, by = "team") %>%
arrange(desc(Pts),
desc(tie_break_pts),
desc(tie_break_gd),
desc(GD),
desc(GF)) %>%
mutate(Rank = row_number()) %>%
select(Rank, everything(), -tie_break_pts, -tie_break_gd)
return(final_table)
}
Let’s test it for the 2017/18 La Liga season:
## Filter to only include the 2017-18 La Liga Season
liga_2017 <- matches %>%
filter(league_id == "1869", date >= as.Date("2017-07-31") & date < as.Date("2018-07-31"))
cat("2017/18 La Liga Table")
## 2017/18 La Liga Table
la_liga_table(liga_2017)
## # A tibble: 20 x 10
## Rank team MP W D L GF GA GD Pts
## <int> <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 Barcelona 38 28 9 1 99 29 70 93
## 2 2 Atletico Madrid 38 23 10 5 58 22 36 79
## 3 3 Real Madrid 38 22 10 6 94 44 50 76
## 4 4 Valencia 38 22 7 9 65 38 27 73
## 5 5 Villarreal 38 18 7 13 57 50 7 61
## 6 6 Real Betis 38 18 6 14 60 61 -1 60
## 7 7 Sevilla FC 38 17 7 14 49 58 -9 58
## 8 8 Getafe 38 15 10 13 42 33 9 55
## 9 9 Eibar 38 14 9 15 44 50 -6 51
## 10 10 Girona FC 38 14 9 15 50 59 -9 51
## 11 11 Espanyol 38 12 13 13 36 42 -6 49
## 12 12 Real Sociedad 38 14 7 17 66 59 7 49
## 13 13 Celta Vigo 38 13 10 15 59 60 -1 49
## 14 14 Alavés 38 15 2 21 40 50 -10 47
## 15 15 Levante 38 11 13 14 44 58 -14 46
## 16 16 Athletic Bilbao 38 10 13 15 41 49 -8 43
## 17 17 Leganes 38 12 7 19 34 51 -17 43
## 18 18 Deportivo La Coruña 38 6 11 21 38 76 -38 29
## 19 19 Las Palmas 38 5 7 26 24 74 -50 22
## 20 20 Málaga 38 5 5 28 24 61 -37 20
Ahh, there was a unique case this year where three teams finished with 49 points; and by checking the final league table our function handled these ties correctly. I’m already looking forward to the next La Liga season.
I didn’t address the final tie-breaker: fair-play points. This would take a little more time to incorporate seeing as 538 doesn’t include data on discipline (yellow/red cards, fouls, etc).