The Math Behind Premier League Score Prediction
The Math Behind Premier League Score Prediction
I recently developed an Excel spread sheet with macros that simulates the Premier League table. This is quite an interesting process. I takes me some time to figure out the math behind Premier League score prediction. The overall predicting steps are well documented by other sources. However, most of the details, especially detailed algorithms, are not widely discussed or documented. I have to come up with my own methods or algorithms. I will document these details here. Hopefully, it may be useful for me or others in the future.
Before you proceed, you may want to download the spread sheet here. Most of the terms and concepts discussed below are used in the spread sheet. The spread sheet is written with Excel VBA macros. It runs in Office 2010 and newer.
I also post a weekly EPL score prediction. This page is a summary of how good my predictions are.
Introduction to SPI (Soccer Power Index)
The first step toward predicting the Premier League is to set up the Soccer Power Index (SPI) for each team in the league. Luckily, the folks at fivethirtyeight.com have already done it. They post updated SPI values for each team in the Premier League on their website.
SPI is a pair of numbers that measures a team’s offense and defense strength. The attacking index is the expected number of goals a team scores over an average team in the league. The defending index is the expected number of goals a team concedes to an average team in the league. Detailed explanation of SPI and high level score estimation procedure is documented here.
Adjust League Average Based SPI to SPI Against Opposing Team of a Match
The next step is to convert SPI to the number of goals a team is expected to score or concede over its opponent which may be stronger or weaker than an average team in the league. I used the following formulas:
offhome_adj = Offhome + (Defaway - Def_ave)
defhome_adj = Defhome + (Offaway - Off_ave)
For example, the first game in the 2018/19 season is Manchester United vs Leicester City. The SPI of Man U is 2.2(attacking), 0.5(defending). The SPI of Leiceter City is 1.9 (attacking), 0.8 (defending). The average SPI of all teams in the Premier League is 1.97 (attacking), 0.75 (defending). To calculate the attacking index of Man U (the home team), we take the difference between the defending index of Leicester City (the away team) and League average, which is the extra goals that Leicester City may give up against the average team in the league, and add it to the Attacking index of Man U. This assumes that Man U can capitalize on the goals that Leicester’s defense is likely to give up. In this case, we get
offhome_adj= 2.2+(0.8-0.75)=2.25.
We calculate the Defending Index for Man U. in the similar way.
defhome_adj= 0.5+(1.9-1.97)=0.43.
This strengthens Man U’s defending index, making them conceding fewer goals, since Leicester’s attacking index is slightly below League average. Adjusted SPI for Leicester City (offway, defaway)=(1.65,1.03)
is calculated in the same way.
Obtain Projected Score of a Match
Once we have the adjusted SPI values, we can go on to find out the projected score of a particular match. For this I simply average the adjusted attacking index of the home team and the adjusted defending index of the away team to get the projected number of goals scored by the home team (spiGF
). The number goals the away team is expected to score (spiGA
) is calculated in the same way.
spiGF = (offhome_adj + defaway_adj) / 2
spiGA = (defhome_adj + offaway_adj) / 2
Again, take Man U vs Leicester City as an example, the projected score is
spiGF = (2.25 + 1.03) / 2 = 1.64
spiGA = (0.43 + 1.65) / 2= 1.04
Scale to The Number of Average League Goals
In order to generate more realistic scores, after we calculate projected GF and projected GA for all teams, we scale the number of projected goals to the number of average league goals. For this simulation, we assumed that number of league average goal is 2.85 per game.
Consider Other Factors: Home Field Advantage. Days of Rest before a Game. SPI Adjustment Factor.
For home field advantage, I assume the home team can score an extra 10% more goals.
I also assume that a team will give up an extra goal if they have zero rest days before a game and a team will not give up any extra goals, if they receive full 7 day rest.
In the case of Man U. vs Leicester City, projected after adjusting for home field advantage is 1.80:1.04
.
The SPI adjustment factor (SAF) allows the user to override the given SPI values. It’s a whole number between -15 to +15. SAF=0
will not alther the SPI values or the projected game score. If you set SAF to a number larger than 10, you literally can boost a team from bottom of the table to top of the table. If you set SAF to a number less than -10, you can sink a top team to the bottom of the table. A moderate value between -5 to +5 is recommended.
Poisson Distribution.
The projected game score is the most likely outcome of the game. But in reality, anything can happen. We must calculate the probability of every possible score. It turns out Poisson Distribution closely describes the goals are scored in a soccer match.
P = (Lamda ^ k * Exp(-Lamda)) / Factorial(k)
Lamda
is the number of projected goals of the match. P
is the probability of having k
number of goals in that match.
To qualify for Poisson Distribution, a soccer game must have the following characteristics:
- k is the number of goals that can be scored. k =0, 1, 2, …
- The occurrence of a goal does not affect the probability that a second goal is scored. That is, goals occur independently.
- The rate at which goals are scored is constant. The rate cannot be higher in some intervals and lower in other intervals.
- Two goals cannot be scored at exactly the same instant.
- The actual probability distribution is given by a binomial distribution and the number of trials on goal is sufficiently bigger than the number of goals scored.
I would not say that the above is a perfect description of a soccer match, but I guess it is close enough.
Below is table of goal probability of the Man U. vs Leicester City match.
As an example, the very top row in red shows that Man U has 26.8% of chance of scoring two goals. The very left column in blue shows that Leicester has 36.8% of chance of scoring one goal. Combined, there’s a 9.8% of chance of having a 2:1 score. This is the most likely score, since it has the highest probability of all possible scores. Incidentally, 2:1 is the actual score of the match.
Also from the table, all the green cells add up to the probability (55.0%) of Man U. winning the game. All the orange cells add up to the probability (11.7%) of Leicester City winning the game. All the yellow cells add up to the probability (23.1%) of a draw.
Time to Throw the Dice
So what’s the final guess of the score? It’s time to throw the dice. At this point, the program generates two random numbers, one for Man U. and one for Leicester. Depending on the value of the random numbers, we guess the final score. Although 2:1 is the most likely score, there’s only 9.9% of the time we can get that score.
Auto Fill Feature
Let’s go back to the Excel program. The “Auto Fill” function executes the above sequences for all fixtures. It runs through the entire season once and generates the “guessed scores” of every game. The League table in the “Table” tab is updated using the “guessed scores”. Keep in mind, this is the result from the throw of a dice, the “guessed score” may not always be the “projected score”.
A few things about Auto Fill:
- It will only update the empty cells in the score columns in “fixture” tab.
- It will show the projected scores, the win, draw, loss probabilities of the home team and the dice values in the neighboring columns.
- It will not erase the scores that are manually entered.
Adjust SPI Based on Game Results
To make the simulations more realistic, as Auto Fill runs through the season, the SPI values of each team is updated based on the guessed results. For instance, if a weak team somehow beat a strong team, the SPI of the weak team is raised. The SPI of the strong team may be lowered.
In the program, we only updates the SPI, if there is a difference of 1.7 goals between the “projected” goal and the “actual” or “guessed” goal. For every 1.7 goal difference, we boost the SPI value by 0.1. The offensive SPI caps at 3.2 and bottoms at 0.5. The defensive SPI bottoms at 0.1 and caps at 2.
Monte Carlo
To predict the team standings in the table, we need to run “Monte Carlo simulation”. We basically run the “Auto Fill” process many times, say 10,000 times. We then calculate the average points and average goal difference of the teams from the 10,000 tables. The teams are then ranked based on the average points and average GD.
Below is my Monte Carlo simulation results. It was run on December 24, 2018.
Rank | Club | SPI Off | SPI Def | GD | Pts | Win League | Top 4 | Rel. | |
---|---|---|---|---|---|---|---|---|---|
1 | Liverpool | 2.8 | 0.3 | 59 | 92.6 | 61% | >99% | <1% | |
2 | Man City | 3.1 | 0.4 | 62 | 89 | 33% | 99% | <1% | |
3 | Spurs | 2.5 | 0.5 | 36 | 82 | 6% | 93% | <1% | |
4 | Chelsea | 2.4 | 0.4 | 35 | 75.6 | <1% | 68% | <1% | |
5 | Arsenal | 2.4 | 0.7 | 25 | 71.6 | <1% | 35% | <1% | |
6 | Man Utd | 2.3 | 0.7 | 13 | 63.2 | <1% | 4% | <1% | |
7 | Watford | 2.0 | 0.8 | -2 | 53.6 | <1% | <1% | <1% | |
8 | Everton | 2.0 | 0.8 | -2 | 51.3 | <1% | <1% | <1% | |
9 | Leicester City | 1.8 | 0.7 | -1 | 50.6 | <1% | <1% | <1% | |
10 | Wolves | 1.7 | 0.7 | -6 | 49.7 | <1% | <1% | <1% | |
11 | West Ham | 1.9 | 0.9 | -6 | 49.3 | <1% | <1% | <1% | |
12 | Bournemouth | 1.9 | 1.0 | -9 | 48.2 | <1% | <1% | <1% | |
13 | Crystal Palace | 1.8 | 0.8 | -11 | 43.8 | <1% | <1% | 3% | |
14 | Brighton | 1.6 | 0.9 | -17 | 41.4 | <1% | <1% | 4% | |
15 | Southampton | 1.9 | 0.9 | -18 | 39.1 | <1% | <1% | 12% | |
16 | Newcastle | 1.6 | 0.9 | -20 | 36.9 | <1% | <1% | 20% | |
17 | Cardiff City | 1.6 | 1.1 | -34 | 32.4 | <1% | <1% | 50% | |
18 | Burnley | 1.6 | 1.1 | -34 | 30.2 | <1% | <1% | 64% | |
19 | Fulham | 1.6 | 1.1 | -39 | 28.9 | <1% | <1% | 72% | |
20 | Huddersfield | 1.5 | 0.9 | -32 | 28.4 | <1% | <1% | 75% | |
Rank | Club | SPI Off | SPI Def | GD | Pts | Win League | Top 4 | Rel. |
As the season progresses, I expect each team will change and grow. The table will get more interesting. From time to time, I will post updated table predictions here.
May your favorite team win!
14 Comments
Mr Soccer
Dear jtang,
You are a true Excel genius for me :-) A huge compliment for your work. I would like to contact you directly by e-mail to ask your question, to which address may I send my message ? Many greetings.
jtang
You can reach me at excel4soccer@gmail.com or via twitter @Excel4Soccer.
Zarko ítalo Rodríguez Peña
Can I buy the Excel to create my own league? I’m from Chile and this interactive table was amazing. I would like to buy for my, please
jtang
Sorry not at this time. At this point, creating a league from scratch is mostly a manual process. I also don’t the bandwidth to provide support.
MH
Hi,
update: You can ignore 3rd question, I understood now how to use the passion distribution equation to get percentages mentioned in the table.
I applied below equation to get first row and first column in table
P = (Lamda ^ k * Exp(-Lamda)) / Factorial(k)
Lamda: is projected goal after adjustment ( 1.8 for home team & 1.04 for away team in our example)
Then, by using below concept of probability, I can get the other values in table
P(A ∩ B) = P(A) x P(B)
Awaiting your reply about 1st & 2nd questions, still i cannot understand how you get the league average goal as 2.85 and for what calculation would you use as input ?
Thanks.
jtang
Hi MH,
1. The average goals per game is based on historical data. This number can changes average season. If you are simulating the 2019/2020 PL season, I suggest you use the 2018/2019 season data.
2. Perhaps I wasn’t clear in the article. 2:1 is the most likely score if Man U wins the game. 1:1 is the most likely score if the game is tied. 0:1 is the most likely score if Man U loses the game. For this game, Man U and 55% chance of winning. So 2:1 is the most likely score.
MH
What a great effort !
I have only some questions here seeking your answer please,
1- Based on what you assumed that number of league average goal is 2.85 per game ?
2- I can see 1:1 score is having highest probability with 10.9% from the table. Why did you select 2:1 as the most likely score ?
3- How can we calculate the probability percentage of each event in the goal probability table ? I tried to calculate it based on the passion distribution equation but I didn’t get same result. if you can mention an example for one field in the table, I will be grateful.
Thanks.
paulp
what a find great work planning on doing any more looking at excel at the moment for predicting draws
Bruno
Very interesting. I created a simulation with similar concept but minute by minute. Check it here: https://github.com/bruno82ricardo/brfoot
jtang
Hi Michael,
This sometimes happens when the team logos are being dynamically copied to the updated league table. I can usually clear this error if I reboot the computer and restart Excel.
If this problem does not go away for you, please shoot me a message in Etsy. I will refund your purchase.
Thanks!
Michael
I have bought version 6.4 and unfortunately the table does not update correctly, only the first eleven teams. Also when I try to update the table I get Visual Basic window pop up with the message Run-time error ‘1004’ Paste method of Worksheet class failed.
Pingback:
jtang
Hi,
Sorry for the inconvenience. I just uploaded a new version (Rev 6.2). It is fixed in the new version.
Graham
Great site and wonderful download. However, after purchasing and downloading the most recent revision, I find that I cannot change the number of runs in the Mote Carlo analysis for the predicted table as you suggest. It keeps asking for a password to “unprotect”.