Yesterday was the final matchday of the 2019/20 La Liga season. Real Madrid claimed their 34th La Liga title, Atleti qualified for the UCL for the 8th consecutive season, and Espanyol are back in la Segunda after 27 seasons in Spain’s top flight.
Let’s sum up this season with some data cleaning.
There are plenty of quality resources online demonstrating how to clean match fixtures to create league tables for completed soccer seasons:
While these are great examples of how to create a generic league table, La Liga has unique tie-break rules for teams that end the season level on points. The examples that I shared above resort to manually editing teams’ final league positions. I wanted to come up with a systematic approach using the specific tie-breaking rules for La Liga to create a final league table for any La Liga season.
For this analysis I accessed FiveThirtyEight’s SPI matches data.
This data is fantastic. Each observation includes a match date, league IDs, team names, team SPIs, each team’s likelihood of winning, as well as xG/NSxG for each team. In order to make this data more useful, I’ve used the following cleaning steps to provide a “tidy” table.
We now have two observations per match, one for each team. Each observation can be read from the perspective of a single team, with details about whether the match was played home or away (ha), the team’s opponent, goals for and against, and the result of the match. I also included a field for the team’s goal differential for the game (game_goal_diff).
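To make the reshaping concrete, here is a minimal sketch in plain Python, not necessarily the code behind this post. It assumes each match is a dict with the keys team1, team2, score1 and score2 (home side first), and the output field names team, opponent, gf, ga and result are illustrative choices; only ha and game_goal_diff come from the description above. The match itself is toy data.

```python
def tidy_matches(matches):
    """Turn one row per match into two rows, one from each team's perspective."""
    rows = []
    for m in matches:
        # (home/away flag, team, opponent, goals for, goals against)
        sides = [
            ("home", m["team1"], m["team2"], m["score1"], m["score2"]),
            ("away", m["team2"], m["team1"], m["score2"], m["score1"]),
        ]
        for ha, team, opponent, gf, ga in sides:
            result = "W" if gf > ga else "L" if gf < ga else "D"
            rows.append({
                "date": m["date"], "team": team, "opponent": opponent, "ha": ha,
                "gf": gf, "ga": ga, "result": result, "game_goal_diff": gf - ga,
            })
    return rows


# Toy match, not real data: Team A beat Team B 2-1 at home.
toy_matches = [{"date": "2020-01-01", "team1": "Team A", "team2": "Team B",
                "score1": 2, "score2": 1}]
tidy_rows = tidy_matches(toy_matches)
for r in tidy_rows:
    print(r)
```

Every match contributes exactly two rows, so the tidy table is always twice as long as the fixture list.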
One thing that 538 doesn’t include is a variable for season, so we’ll have to filter to include only matches played for this season and for La Liga (which has the league id 1869).
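As a rough sketch of that filter in plain Python: keep a match only if it has the La Liga league id and its date falls inside a window that brackets the season. The key names league_id and date, the ISO date format, and the exact window bounds below are assumptions for illustration; the rows are toy data.

```python
from datetime import date

LA_LIGA_ID = 1869  # league id mentioned in the text


def filter_season(matches, league_id, start, end):
    """Keep matches from one league whose date falls within [start, end]."""
    return [
        m for m in matches
        if m["league_id"] == league_id
        and start <= date.fromisoformat(m["date"]) <= end
    ]


# Toy rows: only the first one is a La Liga match inside the window.
toy_matches = [
    {"date": "2019-09-01", "league_id": 1869, "team1": "Team A", "team2": "Team B"},
    {"date": "2019-09-01", "league_id": 9999, "team1": "Team X", "team2": "Team Y"},
    {"date": "2018-09-01", "league_id": 1869, "team1": "Team A", "team2": "Team C"},
]

# Assumed window: wide enough to bracket the 2019/20 season.
season_matches = filter_season(toy_matches, LA_LIGA_ID,
                               date(2019, 8, 1), date(2020, 7, 31))
print(season_matches)
```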
Let’s now create a traditional league table for the end of the season:
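One way this aggregation could look in plain Python, as a sketch under assumptions: each input row is one team’s view of one match with the hypothetical fields team, gf, ga and result, and the table is sorted the "traditional" way (points, then goal difference, then goals scored). The three-team data is made up.

```python
from collections import defaultdict

POINTS = {"W": 3, "D": 1, "L": 0}


def league_table(rows):
    """Aggregate per-team match rows into a table sorted by points, GD, goals scored."""
    totals = defaultdict(lambda: {"mp": 0, "W": 0, "D": 0, "L": 0,
                                  "gf": 0, "ga": 0, "pts": 0})
    for r in rows:
        t = totals[r["team"]]
        t["mp"] += 1
        t[r["result"]] += 1
        t["gf"] += r["gf"]
        t["ga"] += r["ga"]
        t["pts"] += POINTS[r["result"]]
    table = [{"team": team, **t, "gd": t["gf"] - t["ga"]} for team, t in totals.items()]
    table.sort(key=lambda row: (-row["pts"], -row["gd"], -row["gf"]))
    return table


def make_row(team, gf, ga):
    """Helper for the toy data: derive the result from the score."""
    result = "W" if gf > ga else "L" if gf < ga else "D"
    return {"team": team, "gf": gf, "ga": ga, "result": result}


# Toy results: A 2-0 B, B 1-1 C, C 1-0 A (two rows per match).
toy_rows = [make_row("A", 2, 0), make_row("B", 0, 2),
            make_row("B", 1, 1), make_row("C", 1, 1),
            make_row("C", 1, 0), make_row("A", 0, 1)]
table = league_table(toy_rows)
for line in table:
    print(line["team"], line["pts"], line["gd"])
```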
Looking good! But as I noted before, the tie-breaking rules for La Liga are unique. In other leagues around the world (e.g. the English Premier League), goal difference is the primary tie-breaker. In Spain, however, the first tie-breaker is the head-to-head record between the teams that are level on points. For example, if Valencia and Levante were to finish level on points but Valencia had beaten Levante in both fixtures, Valencia would finish above Levante. Here’s the full breakdown for league classification:
Rules for classification: 1) Points; 2) Head-to-head points; 3) Head-to-head goal difference; 4) Goal difference; 5) Goals scored; 6) Fair-play points (Note: Head-to-head record is used only after all the matches between the teams in question have been played).
This season there are a few teams that ended with the same number of points:
60pts: Atleti and Sevilla
56pts: Granada and La Real
42pts: Eibar and Valladolid
I’ll deal with these tie-breaks by considering teams with the same number of points as a group, and creating a mini league table for each group, providing teams with a rank within their group, and joining the mini league table back to the main league table.
In these steps, we’ll filter for only the matches where teams within the same points-group play each other. In other words, we’re looking for the two matches between Atleti and Sevilla, the two matches between Granada and La Real, and the two matches between Eibar and Valladolid. Once we have those matches, we’ll summarise points and goal differential within each points-group.
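Here is a minimal sketch of that mini-league idea in plain Python, offered as an illustration rather than the exact steps used here. It assumes the tidy rows carry the hypothetical fields team, opponent and result alongside game_goal_diff, and that season points are available as a dict; the teams and numbers are toy data.

```python
from collections import defaultdict

POINTS = {"W": 3, "D": 1, "L": 0}


def head_to_head(rows, season_points):
    """Return {team: (h2h_points, h2h_goal_diff)} for teams level on season points."""
    # Group teams by their season points total.
    groups = defaultdict(set)
    for team, pts in season_points.items():
        groups[pts].add(team)

    h2h = {}
    for group in groups.values():
        if len(group) < 2:
            continue  # no tie to break
        tally = {team: [0, 0] for team in group}
        for r in rows:
            # Keep only matches played between two teams of the same group.
            if r["team"] in group and r["opponent"] in group:
                tally[r["team"]][0] += POINTS[r["result"]]
                tally[r["team"]][1] += r["game_goal_diff"]
        h2h.update({team: tuple(v) for team, v in tally.items()})
    return h2h


# Toy data: A and B finish level on 60 points; C is clear on 70.
toy_points = {"A": 60, "B": 60, "C": 70}
toy_rows = [
    {"team": "A", "opponent": "B", "result": "W", "game_goal_diff": 1},
    {"team": "B", "opponent": "A", "result": "L", "game_goal_diff": -1},
    {"team": "B", "opponent": "A", "result": "D", "game_goal_diff": 0},
    {"team": "A", "opponent": "B", "result": "D", "game_goal_diff": 0},
    {"team": "A", "opponent": "C", "result": "L", "game_goal_diff": -3},
    {"team": "C", "opponent": "A", "result": "W", "game_goal_diff": 3},
]
h2h = head_to_head(toy_rows, toy_points)
print(h2h)
```

Because the grouping is by points total rather than by pairs, the same code covers three or more teams level on points: the mini table then uses every match played among them.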
Finally, we join the tie-break table to the final table, arrange the teams by the sort criteria, and remove unnecessary columns.
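The join-and-sort step can be sketched as a single sort key that follows the classification rules in order. This assumes the main table is a list of dicts with the hypothetical fields team, pts, gd and gf, and that the tie-break results live in a dict of (head-to-head points, head-to-head goal difference) per tied team. Teams that are not tied get (0, 0), which is harmless because they share their points total with nobody. The numbers are toy data.

```python
def final_table(table, h2h):
    """Sort by points, h2h points, h2h goal diff, goal diff, goals scored; add positions."""
    def sort_key(row):
        h2h_pts, h2h_gd = h2h.get(row["team"], (0, 0))
        # Negate every value so that larger is better under an ascending sort.
        return (-row["pts"], -h2h_pts, -h2h_gd, -row["gd"], -row["gf"])

    ranked = sorted(table, key=sort_key)
    return [{"pos": pos, **row} for pos, row in enumerate(ranked, start=1)]


# Toy data: B has the better overall goal difference, A the better head-to-head.
toy_table = [
    {"team": "B", "pts": 60, "gd": 20, "gf": 50},
    {"team": "A", "pts": 60, "gd": 15, "gf": 48},
    {"team": "C", "pts": 70, "gd": 30, "gf": 70},
]
toy_h2h = {"A": (4, 1), "B": (1, -1)}

ranked = final_table(toy_table, toy_h2h)
for row in ranked:
    print(row["pos"], row["team"], row["pts"])
```

Keeping the tie-break values in a separate lookup means there are no helper columns to drop from the final table afterwards.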
¡Venga, vamos! ¡Tenemos la tabla final de la temporada! We have a final league table for the season. But what’s the use of doing this just once? Let’s create a function that takes the matches for a La Liga season and produces the end-of-season league table.
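Putting the pieces together, such a function might look roughly like the sketch below. It is written in plain Python under assumptions: the function name laliga_table is hypothetical, matches are dicts with the keys date, league_id, team1, team2, score1 and score2, the season is selected with a date window, and unplayed fixtures have a score of None. Fair-play points are not handled. The fixtures are toy data.

```python
from collections import defaultdict
from datetime import date

POINTS = {"W": 3, "D": 1, "L": 0}


def laliga_table(matches, start, end, league_id=1869):
    """Build a final table using head-to-head tie-breaks (fair-play points excluded)."""
    # 1. Keep played matches from this league inside the season window.
    season = [m for m in matches
              if m["league_id"] == league_id and m["score1"] is not None
              and start <= date.fromisoformat(m["date"]) <= end]
    # 2. One row per team per match.
    rows = []
    for m in season:
        for team, opp, gf, ga in ((m["team1"], m["team2"], m["score1"], m["score2"]),
                                  (m["team2"], m["team1"], m["score2"], m["score1"])):
            result = "W" if gf > ga else "L" if gf < ga else "D"
            rows.append({"team": team, "opponent": opp, "gf": gf, "ga": ga,
                         "pts": POINTS[result]})
    # 3. Season totals.
    totals = defaultdict(lambda: {"pts": 0, "gf": 0, "ga": 0})
    for r in rows:
        for k in ("pts", "gf", "ga"):
            totals[r["team"]][k] += r[k]
    # 4. Head-to-head points and goal difference among teams level on points.
    groups = defaultdict(set)
    for team, t in totals.items():
        groups[t["pts"]].add(team)
    h2h = defaultdict(lambda: [0, 0])
    for r in rows:
        group = groups[totals[r["team"]]["pts"]]
        if len(group) > 1 and r["opponent"] in group:
            h2h[r["team"]][0] += r["pts"]
            h2h[r["team"]][1] += r["gf"] - r["ga"]

    # 5. Sort by the classification rules and number the positions.
    def sort_key(team):
        t = totals[team]
        return (-t["pts"], -h2h[team][0], -h2h[team][1], -(t["gf"] - t["ga"]), -t["gf"])

    return [{"pos": pos, "team": team, "pts": totals[team]["pts"],
             "gd": totals[team]["gf"] - totals[team]["ga"], "gf": totals[team]["gf"]}
            for pos, team in enumerate(sorted(totals, key=sort_key), start=1)]


def fixture(day, home, away, hg, ag, league_id=1869):
    """Helper for building toy fixtures."""
    return {"date": day, "league_id": league_id, "team1": home, "team2": away,
            "score1": hg, "score2": ag}


# Toy season: A and B finish level on points, B has the better goal difference,
# but A wins the head-to-head. The last two fixtures must be filtered out.
toy = [fixture("2019-09-01", "A", "B", 1, 0), fixture("2019-09-08", "B", "A", 0, 0),
       fixture("2019-09-15", "A", "C", 0, 1), fixture("2019-09-22", "C", "A", 0, 1),
       fixture("2019-09-29", "B", "C", 5, 0), fixture("2019-10-06", "C", "B", 0, 1),
       fixture("2019-10-13", "X", "Y", 3, 0, league_id=9999),
       fixture("2018-10-13", "A", "B", 0, 4)]
final = laliga_table(toy, date(2019, 8, 1), date(2020, 7, 31))
for row in final:
    print(row)
```

In the toy season a goal-difference sort would put B first, so the output is a quick check that the head-to-head step actually changes the order.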
Let’s test it for the 2017/18 La Liga season:
Ahh, there was a unique case that season where three teams finished with 49 points. Checking against the final league table, our function handled these ties correctly. I’m already looking forward to the next La Liga season.
I didn’t address the final tie-breaker: fair-play points. This would take a little more time to incorporate seeing as 538 doesn’t include data on discipline (yellow/red cards, fouls, etc).