‘He quoted expected goals!’ says Jeff Stelling in disbelief. It is 2017 and the Soccer Saturday host is responding to post-match comments from Arsène Wenger: ‘He’s the first person I’ve ever heard take any notice of expected goals, which must be the most useless stat in the history of football! What does it tell you? The game’s finished 3-1, why do you show expected goals afterwards?’[1]
Despite the irritation it has caused TV pundits, expected goals (commonly abbreviated as xG) has become a common part of football analysis in recent years. The stat regularly appears in post-match media coverage and is cited in articles by journalists and bloggers.
Whereas post-match stats such as the number of shots or the number of fouls are simple to understand and can be counted by anyone watching the game, the expected goals metric is more complicated and uses a large range of variables. The way these variables are analysed and interpreted to create the expected goals stat is an example of how the complex use of data analysis is becoming a mainstream part of the modern game for everybody, from fans to managers.
Essentially, expected goals is a calculation of how many goals each team or player ought to have scored in a game. It measures the quality of each goalscoring chance created during a match, using historical shot data to determine the exact likelihood of that chance resulting in a goal.
Based on the likelihood of it resulting in a goal, each shot is given an ‘expected goals value’ (EGV). A shot that has been calculated as having a 75 per cent chance of resulting in a goal has an EGV of 0.75. Adding together the EGV for each shot in a game provides the total xG for each team.
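The addition described above can be sketched in a few lines of Python. The shot values here are invented for illustration and are not taken from any real match or data provider; `total_xg` is simply a name chosen for this example.

```python
# Hypothetical EGVs for every shot two teams took in one match
# (illustrative numbers only, not real data).
home_shots = [0.75, 0.10, 0.05, 0.30]
away_shots = [0.40, 0.02, 0.08]


def total_xg(shot_values):
    """Add together the EGV of each shot to get a team's xG."""
    return sum(shot_values)


home_xg = round(total_xg(home_shots), 2)
away_xg = round(total_xg(away_shots), 2)

print(f"Home xG: {home_xg}, Away xG: {away_xg}")
```

In this made-up match the home side took only one more shot than the away side, yet finished with more than double the xG, because one of its chances was of very high quality.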
A team that wins a game with a significantly lower xG than the other side may have been lucky or won because of rare moments of individual genius rather than tactical domination. Adding together the EGV for each shot a player has over the course of the season gives an indication of how well they are performing. A striker whose goal tally is lower than his xG might well need replacing with a better player.
The value of the statistic is that it provides a meaningful way to calculate just how well a team or player has performed. There are lots of times when a fan would argue that their side was unlucky not to win a game, or a pundit would say that a striker has not been scoring due to lack of chances. Expected goals shows us whether the data confirms these observations or not.
The concept of expected goals originated in the work of researchers and scientists attempting to analyse football, rather than from the work of pundits or coaches. The term itself dates back to 1993. It was first used by Vic Barnett and Sarah Hilditch in their scientific paper ‘The Effect of an Artificial Pitch Surface on Home Team Performance in Football (Soccer),’ published in the Journal of the Royal Statistical Society.
The basic idea was used by other researchers, some looking at football, some looking at ice hockey. In 2004, Jake Ensum, Richard Pollard and Samuel Taylor produced two papers looking at data from World Cup matches aiming to quantify the role that factors such as the distance from goal, the angle of the shot, and proximity to the nearest defender had on whether a goal was scored. In the same year, Alan Ryder produced a study of expected goals in America’s National Hockey League. His study begins with a line that memorably encapsulates the idea behind expected goals: ‘Not all shots on goal are created equal.’[2]
In 2012, Sam Green, working for the sports statistics company Opta, produced an influential blog post explaining the idea of expected goals and how it could be used.
He analysed data from the 2011-12 English Premier League season. One of the conclusions he reached was that Luis Suarez, then completing his first full season for Liverpool, had been ‘especially unfortunate in front of goal,’ scoring only 11 goals due to bad luck rather than lack of skill.[3] The truth of this observation was shown in the following seasons. In 2012-13, Suarez scored 23 league goals and the following season he scored 31.
Following Green’s article, the idea began to gain more traction and mentions in newspapers such as The Guardian. Online, football bloggers such as Michael Caley, whose motto is ‘Bringing baseball stat nerdiness to football,’ and Sander IJtsma, better known as 11tegen11, began to take an interest and develop their own xG metrics.
It was in August 2017, at the start of the 2017-18 English Premier League season, that the idea really became mainstream, when BBC’s Match of the Day began using the statistic.
For the Match of the Day editor, Richard Hughes, including expected goals ‘seemed like a natural progression – something new and innovative.’ Talking about its inclusion on the show, he explained that using expected goals was not intended to replace or influence what the pundits were saying, but instead, showing the data ‘backed up the points they are making.’[4]
The expected goals statistics are provided to Match of the Day by Opta, who also provide data to Sky Sports and BT Sport. Since 2013, Opta have been the official data partner of the English Premier League and their metric is dominant. However, just as people such as Michael Caley have worked on their own metrics, other sports data companies have also developed their own models for calculating expected goals.
Driblab is one such company. They started their own metric for expected goals in 2013 and have created maps that visualise the data, using circles to represent where each shot was taken on the pitch and varying the size of the circles so that the larger the circle, the higher the EGV. StatsBomb, FootballXG and Understat also provide xG statistics.
Because there are so many variables that can influence whether or not a shot is likely to result in a goal, models can use contrasting parameters and record or emphasise different data and thus come up with an alternative EGV.
Judging the quality of a chance is the complicated part. There are many variables that can influence how likely a chance is to result in a goal, such as the angle of a shot, the position of the goalkeeper or the distance a player is from goal. Although it is clear to anyone who watches football that an unmarked player shooting from six yards out is more likely to score than someone shooting from forty yards out, providing an exact figure for the likelihood of each shot resulting in a goal requires an in-depth analysis of the data.
To create their metric for expected goals, Opta analysed over 300,000 shots. When analysing these shots, they took into account a number of different factors.
The different variables have an impact on each other. A shot on goal from the penalty spot is more likely to result in a goal if the player is taking a penalty than if it is a header from a corner. In this case the data about where the shot is taken from is the same, but other variables are altering the likelihood of a goal being scored.
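One common way of turning several variables into a single probability is a logistic model. The sketch below is a toy version only: it is not Opta's model or any other provider's, the function name `toy_shot_probability` is hypothetical, and every coefficient is invented purely to show how the same shot location can produce different values once other variables are taken into account.

```python
import math


def toy_shot_probability(distance_m, is_penalty=False, is_header=False):
    """Toy logistic model of a shot's chance of being scored.

    All coefficients below are made up for illustration; a real model
    would estimate them from a large set of historical shots.
    """
    score = 1.0 - 0.15 * distance_m   # further from goal -> lower score
    if is_penalty:
        score += 2.0                  # unopposed shot against a set keeper
    if is_header:
        score -= 1.0                  # headers are harder to direct
    return 1.0 / (1.0 + math.exp(-score))


# Same location (the penalty spot, roughly 11 metres out), different situations.
penalty = toy_shot_probability(11.0, is_penalty=True)
header = toy_shot_probability(11.0, is_header=True)

# Different locations, otherwise identical shots.
close_range = toy_shot_probability(5.5)
long_range = toy_shot_probability(36.5)

print(f"Penalty: {penalty:.2f}, header from the same spot: {header:.2f}")
print(f"Close range: {close_range:.2f}, long range: {long_range:.2f}")
```

Whatever numbers a real model settles on, the structure is the point: location alone does not fix the value of a chance, and each extra variable shifts the final probability up or down.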
As more data becomes available about football, more variables can be added to the calculation. Now that players at all the top clubs wear GPS vests, it is easier to track exactly where players are on the pitch. This means that data about the position of each defender on the pitch when a shot is taken can be analysed for its influence on whether a shot is scored. When StatsBomb began to include this data, they found that it substantially improved their xG calculations.
Because of the competition between different companies, information about the exact data each one analyses and the precise metrics they use is not publicly available. Using different variables means each company produces slightly different expected goals values for each team and player, although they tend not to differ wildly.
There are obvious uses for expected goals. It is a simple way of seeing which team had the most chances to win the game or which strikers are putting away their goal scoring opportunities.
Knowing how well a striker is converting their chances allows a useful comparison between strikers at clubs who create a lot of high quality chances and strikers who might be feeding off scraps at a club near the relegation zone.
For example, in the 2018-19 English Premier League season, Wilfried Zaha scored 10 goals for Crystal Palace, a long way behind the league’s top goalscorer, Pierre-Emerick Aubameyang, who scored 22 goals for Arsenal. However, when we look at expected goals (as calculated by StatsBomb), Zaha’s xG was only 6.5, whereas Aubameyang’s xG was 20.7. Therefore, although he scored less than half as many goals as Aubameyang, the stats suggest that Zaha was better at taking his chances.
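The comparison can be made explicit with a little arithmetic. The snippet below uses only the goal and xG figures quoted above; the two measures it calculates (goals minus xG, and goals per expected goal) are one simple way of expressing finishing, not an official metric.

```python
# Goals and xG for 2018-19, as quoted in the text (xG as calculated by StatsBomb).
players = {
    "Zaha": {"goals": 10, "xg": 6.5},
    "Aubameyang": {"goals": 22, "xg": 20.7},
}

finishing = {}
for name, stats in players.items():
    finishing[name] = {
        # Positive means the player scored more than his chances suggested.
        "goals_minus_xg": round(stats["goals"] - stats["xg"], 1),
        # Above 1.0 means more than one goal per expected goal.
        "goals_per_xg": round(stats["goals"] / stats["xg"], 2),
    }

for name, result in finishing.items():
    print(name, result)
```

On these figures Zaha outperformed his xG by 3.5 goals and Aubameyang by 1.3, which is the sense in which Zaha was the more efficient finisher that season.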
Knowing that a team is creating more chances than their opponents even if they are not winning games can suggest that a team is perhaps going through a patch of bad luck and could soon pick up form.
At the start of the 2015-16 Italian Serie A season, reigning champions Juventus only won three of their first ten matches. But the expected goal data showed that the team’s goal difference ought to have been much better.
‘To the naked eye they were struggling, but xG was identifying a team that would improve soon,’ observed Duncan Alexander, who works for Opta as their Chief Data Editor.[5] ‘The Turin side had scored 11 goals in those 10 games, when their xG was 19. At the other end they had leaked nine, when expected goals suggested it would usually have been five. Looking at those numbers, we expected things to regress to normal and, lo and behold, the Old Lady’s luck changed. In fact, they won their next 15 Serie A matches on the way to winning another title.’[6]
Expected goals does not function as a prediction of exactly how many goals a team will score or concede, but it does make it possible to see when a team’s results do not match their level of performance. In such cases, the team in question often soon starts achieving the expected results.
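The Juventus figures quoted by Duncan Alexander can be put side by side to show the size of the gap between results and performance. The variable names are chosen for this example; the numbers are those given in the quotation.

```python
# Juventus, first ten Serie A games of 2015-16, as quoted above.
goals_for, xg_for = 11, 19
goals_against, xg_against = 9, 5

actual_goal_difference = goals_for - goals_against    # what actually happened
expected_goal_difference = xg_for - xg_against        # what the chances suggested
shortfall = expected_goal_difference - actual_goal_difference

print(f"Actual goal difference:   {actual_goal_difference:+d}")
print(f"Expected goal difference: {expected_goal_difference:+d}")
print(f"Shortfall:                {shortfall} goals")
```

A goal difference of +2 against an expected +14 is a shortfall of 12 goals in ten games, which is the kind of gap that suggests results are lagging behind performances.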
Although the focus of the stat might seem to be on attacking play and the performances of strikers, expected goals is also useful for evaluating how well a goalkeeper is playing. If, over the course of a season, a goalkeeper concedes fewer goals than the expected goals value of the shots he has faced, it would suggest that he is making a lot of important saves.
Using statistics in this way allowed BBC sportswriter John Stanton to predict in November 2017 that Burnley goalkeeper Nick Pope could be a serious contender for a place in the England national team. Comparing him to other keepers, Stanton found that he had prevented 5.9 expected goals, whereas in the same period other English keepers such as Jordan Pickford at Everton and Ben Foster at West Brom had conceded more than xG suggested they would.[7]
Nick Pope went on to be called up to the England squad for the first time in March 2018, making his debut later that year. He has gone on to play seven times so far for the national team and has fully established himself as Burnley’s number one. The data was not a fluke.
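The goalkeeper comparison described above comes down to a subtraction: the expected goals of the shots a keeper has faced, minus the goals he has actually conceded. The sketch below uses two hypothetical keepers with invented numbers, since the underlying figures behind the BBC analysis are not given here; `goals_prevented` is a name chosen for this example.

```python
def goals_prevented(xg_faced, goals_conceded):
    """Positive: the keeper conceded fewer goals than the shots suggested.
    Negative: he conceded more than expected."""
    return xg_faced - goals_conceded


# Hypothetical keepers (invented numbers): (xG of shots faced, goals conceded).
keepers = {
    "Keeper A": (15.0, 10),
    "Keeper B": (12.0, 14),
}

prevented = {
    name: goals_prevented(xg_faced, conceded)
    for name, (xg_faced, conceded) in keepers.items()
}

for name, value in prevented.items():
    print(f"{name}: {value:+.1f} goals prevented")
```

Here Keeper A has conceded more goals in total than a keeper at a better-protected club might, yet the calculation still credits him with saving five goals more than expected, which is why the measure is useful for comparing keepers at very different teams.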
Knowing exactly what kind of shot is most likely to lead to a goal is very useful information for a manager. Since the earliest days of competitive football, managers have tried to create tactics that allow their team to be effective in converting their chances. Expected goals allows a far more precise understanding about what works and what does not compared to traditional observations and hunches.
In attack, the team can focus on creating more of the kind of chances that are likely to result in a goal and shoot less often from other positions. This is important not only for scoring goals, but because a missed shot often gives possession of the ball to the other team. Likewise, in defence the team can limit the chances it gives to the opposition by better defending key areas of the pitch.
Managers rarely talk in detail about their tactical innovations, so working out the impact that xG is having on tactics involves analysing the data and seeing changes that could be linked. For example, a shot made closer to the goal tends to have a better EGV than one taken further away.
This may help to explain why Manchester City’s average shot distance fell from 18.8 metres from goal in 2015-16 to 16.6 metres by the 2019-20 season.
Shooting closer to goal might also be part of the reason for West Ham United’s unexpectedly high position in the 2020-21 season of the English Premier League. The data analyst Toni Bilandzic has shown that compared to every other Premier League team, West Ham are ‘the only team whose average shot distance is less than 16 metres.’ For Bilandzic, this is ‘one of the reasons for their highly efficient direct football, as they are taking shots from more dangerous positions.’[8]
With managers keeping their cards close to their chests, we cannot know how much xG has directly influenced the tactics of Man City and West Ham, but at the very least xG provides a framework in which people can scrutinise how a team is improving the quality of the chances they create.
The expected goals stat is never going to be a perfect tool to predict what will happen. There will always be times when a player misses an open goal or a goalkeeper lets a long-distance shot slip through his legs. As Jeff Stelling pointed out in his rant about the statistic, the most important stat at the end of the game will always be the actual score. A team that loses 2-0 in a cup final cannot walk away with the trophy because they have a better xG.
But when looking at longer term trends, the stat is proving useful. It can predict when a team or a player is under or overachieving. Given time, the performance levels tend to revert to the level which matches the expected goals.
Because of the number of variables that can influence whether a shot results in a goal, the way in which xG is calculated is likely to undergo revision and refinement, especially because of the competition between companies offering their own versions. The stat is going to continue to become more sophisticated and more accurate.
The concept behind expected goals is also starting to be used for other parts of the game. Data companies are now producing statistics on expected assists as a way of measuring how well players and teams create chances. Opta have also been working on a way to model defensive coverage to balance out the focus on analysing attacking play.
Expected goals has shown that to best understand the game, data needs to be combined and not treated in isolation. Merely counting the number of shots or the number of corners is no longer enough. Metrics are here to stay.
[2] https://hockeyanalytics.com/Research_files/Shot_Quality.pdf
[3] https://www.statsperform.com/resource/assessing-the-performance-of-premier-league-goalscorers/
[4] https://www.fourfourtwo.com/features/no-seriously-what-heck-expected-goals-xg
[5] https://www.bbc.co.uk/sport/football/40699431
[6] https://www.fourfourtwo.com/features/no-seriously-what-heck-expected-goals-xg
[7] https://www.bbc.co.uk/sport/football/41822455
[8] https://totalfootballanalysis.com/data-analysis/west-ham-20-21-data-analysis-statistics