For several decades, and especially in the last five years, statistical analysis has become a valuable support tool for those who judge and observe sporting events, even in Europe’s most popular and widely followed sports, such as football.

Although there is an effort to avoid the utopian pursuit of capturing football reality in a single number (a KPI) that would be easy to consult and inherently transparent, it is worth noting that the search for an optimal model capable of effectively predicting the performance of a team or an individual player is increasingly common.

At the same time, football lags a step behind sports such as basketball, where accurate models of individual players are common in globally popular leagues like the NBA.

The appeal of a single statistic that could reveal a team’s offensive potential is undeniable, because it promises to see past the imponderable: past runs of good or bad luck, or the difficulties posed by the fixture list.

One of the most widely used models for predicting match results, the xG model, uses historical information from thousands of similar shots to estimate the probability, on a scale from 0 to 1, that a given shot will result in a goal.
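To make the idea concrete, here is a minimal sketch of how a basic xG model might score a single shot, assuming the two inputs most commonly used in public models: the distance from goal and the angle that the goal mouth subtends from the shot location. The goal width (7.32 m) is the real one, but the coordinate convention and the coefficients are hypothetical placeholders; a real model would estimate the coefficients, for instance by logistic regression, from thousands of historical shots.

```python
import math

GOAL_WIDTH = 7.32  # metres between the posts


def shot_features(x: float, y: float) -> tuple[float, float]:
    """Return (distance, angle) for a shot.

    x: distance from the goal line in metres (x > 0).
    y: lateral offset from the centre of the goal in metres.
    The angle (radians) is the one subtended by the two posts.
    """
    distance = math.hypot(x, y)
    half = GOAL_WIDTH / 2
    # Angle between the lines to each post:
    # tan(angle) = w * x / (x^2 + y^2 - (w/2)^2).
    # atan2 keeps the result correct when the angle exceeds 90 degrees.
    angle = math.atan2(GOAL_WIDTH * x, x * x + y * y - half * half)
    return distance, angle


# HYPOTHETICAL coefficients (intercept, distance, angle), not fitted values.
COEFS = (-1.0, -0.1, 1.5)


def expected_goal(x: float, y: float, coefs=COEFS) -> float:
    """Probability (0..1) that a shot from (x, y) is scored, via a logistic link."""
    b0, b_dist, b_angle = coefs
    distance, angle = shot_features(x, y)
    z = b0 + b_dist * distance + b_angle * angle
    return 1.0 / (1.0 + math.exp(-z))


# A shot from the penalty spot versus one from 25 m out and 10 m wide.
print(round(expected_goal(11.0, 0.0), 3))
print(round(expected_goal(25.0, 10.0), 3))
```

Whatever the coefficients, the logistic link guarantees an output between 0 and 1; the modelling choices discussed below decide which further inputs to add.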

Despite their accuracy, Expected Goals (xG) have limitations that can frustrate anyone trying to analyze an event as complex as a football match, with twenty-two actors on stage and a contested object, the ball, that is hard to capture with statistics, however advanced they may be.

The first and main problem is that xG takes into account the actions that lead up to the shot but completely ignores the relative positions of the 22 players on the field at the moment it is taken. This is simply because the players’ positions in space and time are not predictable.

Photo by Mitch Rosen on Unsplash

Why Are xG Important?

They are important because they are the most accurate predictor of future performance for teams and players. At the team level, Expected Goals models predict future results better than other indicators, such as actual goal difference or simple shot-counting metrics like the Total Shots Ratio (TSR). xG models let us look beyond current results to get a more precise idea of the underlying quality of teams and players.

How Were xG Developed?

Goals are the most important events in football, but also the least frequent: in most leagues there are only 2.5–3 goals per match. This is why more attention has been paid to shots, which are about ten times more frequent than goals (roughly 25–30 per match).

From this emerged metrics like the Total Shots Ratio (TSR), which measures a team’s dominance by its share of the shots taken in its matches. Not all shots are equal, however. This created the need for a method to measure the quality of an individual shot or series of shots, and so xG models were born.
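The TSR definition fits in a few lines. The sketch below also computes the same ratio on summed xG values, to show why shot counts alone can mislead. The shot values are invented for illustration, and the “xG share” is simply the TSR formula applied to xG, used here as an assumed comparison rather than a metric defined in this article.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """TSR = shots for / (shots for + shots against); 0.5 means parity."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total


def xg_share(xg_for: list[float], xg_against: list[float]) -> float:
    """Same ratio, but weighting each shot by its (assumed) xG value."""
    total = sum(xg_for) + sum(xg_against)
    if total == 0:
        raise ValueError("no shots recorded")
    return sum(xg_for) / total


# Hypothetical match: the home side fires 8 speculative long-range shots,
# the away side creates 4 clear chances.
home = [0.03] * 8
away = [0.25] * 4

tsr = total_shots_ratio(len(home), len(away))
share = xg_share(home, away)
print(f"TSR: {tsr:.2f}, xG share: {share:.2f}")
```

Here the home side takes two thirds of the shots (8 of 12) yet produces less than a fifth of the match’s expected goals (0.24 out of 1.24), which is exactly the gap that shot-quality models were meant to expose.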

Expected Goals models, despite their limitations, have proven to be an important tool, especially for highlighting trends in the early stages of a season.

To optimize this tool, various strategies have been developed.

On one hand, there have been attempts to add ingredients to the model, giving the algorithm that calculates a shot’s probability of success more information. For example, data on the whole chain of passes that led to the shot, not just the final assist, has been added to the algorithm.

On the other hand, there have been attempts to build a shooter profile, including the player’s shooting accuracy over the current or previous season.

Turning to history, Charles Reep is known as the first to develop a statistical model applied to football. In his view, playing more long balls increased a team’s attacking threat proportionally.

Photo by Jason Charters on Unsplash

The meaning of models

But why build a model at all? Because football is an extremely complex sport whose outcome depends on a rare event: a goal.

Football is a sport with a peculiar structure: it is not a stop-start sport and, apart from set pieces, actions follow one another while the clock runs continuously. We must also consider that the development of the game is strongly shaped by the interactions among more players than in most other sports.

In short, models of reality are built precisely to reduce its complexity. Reducing reality to a finite number of variables, however, involves choices: each component must be included or discarded and, if included, weighted. It should be emphasized that behind every choice lies a personal idea of what matters more, or less, in the game.

The Expected Goals model is therefore subjective by construction.

This raises questions: how much does it matter whether the shooter used their stronger or weaker foot? What difference does it make whether the assist came from a cross or a through ball?

Including one factor and ignoring another is a choice that model builders make based on their knowledge and perceptions of the game.

This is why it is important to choose carefully which statistics to combine. It is an act that carries responsibility toward the reader: simply by picking the data most convenient for our purpose, we could exalt a striker or drag him through the mud.

On the other hand, statistics are here to stay, and we must learn to live with an analytical approach to sport, which will be increasingly present in media language.

Certainly, new tools will emerge and old ones will be refined, allowing us to compare past performances and make predictions about future ones. But we must also be critically aware that behind every system of Performance Indicators lies the interpretation of those who present them.

Numbers don’t lie, but people sometimes do

There are high expectations for “tracking data”, which could add new factors to the various models and so increase their precision; however, such data is still not publicly available, and any improvement may turn out to be only marginal.

Meanwhile, statistics are increasingly used in the media, also thanks to interest in games built around a predictive aspect (such as Fantasy Football or Football Manager).

Especially among younger generations, a numerical description of players and teams now seems deeply rooted and widely accepted.

Statistics are becoming more numerous and more accepted, but it will still take time before a more reasoned kind of analysis, one capable of capturing nuance, becomes widespread. Reading the authentic meaning of football statistics takes time and a certain level of expertise, both of which are often lacking in the way the media use them.

The same problem exists within professional clubs, where performance analysts are introducing statistics into their work, often without really understanding what matters and what does not.

The use of graphics in the media has also increased, but average-position maps and heat maps (among the most widespread) rarely tell us what really happened on the field, and their popularity has also led to their misuse.

Despite this, we must not forget that shot maps (with or without xG values) or even graphs highlighting specific passes or created chances can reveal significant truths about a game, a team, or a player.

In all of this, one principle must be clear: a statistic should be presented only if it adds an interpretive key to understanding an event, that is, if it reveals hidden truths in a way that is accessible and quick to grasp.

There is certainly work to be done on every front. Only collaboration between visualization experts, storytellers, and analysts can raise the overall level and allow data to be used rationally and usefully.

In a 1961 experiment, Bugelski and Alampay showed a group of people an ambiguous drawing, open to different interpretations, that could represent either a man’s face or a rat. Observers saw different things depending on their perceptual set, that is, on the images they had been shown beforehand.

We must necessarily rely on our powers of observation, but sometimes it is wise not to trust them too much.

The next step will probably be for spectators to become aware of their own limitations and responsibilities. An interpretation remains an interpretation, after all, even when illustrated with a graph or numbers. But a greater availability of “material to interpret” can only enrich our knowledge of a game that is at once simple and rich in facets.

A Look Outward

The idea that football analytics stopped evolving once the various Expected Goals models spread is, at least in part, justified. xG was an enormous step forward compared with the public shot models available until then, but it is equally true that it is not perfect.

Since xG became established, analysts have kept circling around it without developing anything equally sensitive. In this sense, I believe that tracking data, public or private, can give fresh impetus to the models our small community develops.

In the NBA, the American basketball league, for example, tracking data has even become publicly available, and the mere spread of this type of data has enormously increased NBA analysts’ knowledge.

This has led to the widespread diffusion of models such as “quantified Shot Quality” (qSQ), which is similar in concept to xG but also takes into account the distance from the nearest defender.

Developed relatively quickly, these models have allowed all of us to understand the game better and to analyze players, coaching, strategy, and team needs in greater depth.
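With tracking data, an input like qSQ’s nearest-defender distance is straightforward to derive once player coordinates are available for the moment of the shot. The sketch below assumes a single tracking frame given as (x, y) positions in metres; the frame layout, the function names, and the coordinates are hypothetical, and real tracking feeds each have their own formats.

```python
import math

Point = tuple[float, float]


def nearest_defender_distance(shooter: Point, defenders: list[Point]) -> float:
    """Euclidean distance (metres) from the shooter to the closest defender."""
    if not defenders:
        raise ValueError("frame contains no defenders")
    return min(math.dist(shooter, d) for d in defenders)


def defenders_within(shooter: Point, defenders: list[Point], radius: float) -> int:
    """Count defenders inside a given radius: a crude proxy for pressure."""
    return sum(1 for d in defenders if math.dist(shooter, d) <= radius)


# Hypothetical frame at the moment of a shot.
shooter = (14.0, 3.0)
defenders = [(16.0, 3.0), (13.0, 7.0), (30.0, -5.0), (8.0, 0.0)]

print(nearest_defender_distance(shooter, defenders))
print(defenders_within(shooter, defenders, radius=5.0))
```

Either value could then be fed to a shot model as an extra input alongside distance and angle, which is the kind of information that event data alone does not provide.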

Adding tracking data would help not only to create better and more accurate xG models but also to better quantify possession, pressure, pass quality, and so on.

Other implementations and ongoing studies use neural networks to model the choices available to a footballer at each moment of play.

The proposed approach is mathematically very complex, but the wealth of spatiotemporal information it yields about the game is priceless: each player in possession of the ball can be evaluated on their ability to generate value, weighed against the risk they take on.

Another notable solution that has gained traction, known as Expected Threat (xT), was proposed by Karun Singh, a data scientist who works for Arsenal Football Club. He divided the field into 192 discrete zones and, in each one, evaluated four components: the probability that a pass will be made; the probability that a shot will be taken; the danger potentially created by a pass; and the danger created by a possible shot.

Singh’s model therefore incorporates an Expected Goals model and evaluates the other components with the same technique, suggesting that by modeling the behavior of a professional footballer zone by zone, based on past events, a further important step forward can and will be made in this field.
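The zone-by-zone idea can be sketched as a fixed-point iteration in the spirit of the Expected Threat formulation: a zone’s value is the danger of shooting from it (shot probability times the probability that the shot is scored) plus the danger of moving the ball on (move probability times the expected value of the zones the move reaches). In this minimal sketch the input probabilities are assumed to have been estimated beforehand from event data; the 16 × 12 grid on a 105 × 68 m pitch, the function names, and the three-zone toy example are hypothetical illustrations, not Singh’s code.

```python
def zone_index(x: float, y: float, length: float = 105.0, width: float = 68.0,
               nx: int = 16, ny: int = 12) -> int:
    """Map pitch coordinates (metres) to one of nx * ny zones (16 * 12 = 192)."""
    col = min(int(x / length * nx), nx - 1)
    row = min(int(y / width * ny), ny - 1)
    return row * nx + col


def expected_threat(shot_prob: list[float], move_prob: list[float],
                    goal_prob: list[float], transition: list[list[float]],
                    iterations: int = 50) -> list[float]:
    """Iterate xT[z] = s[z] * g[z] + m[z] * sum_w T[z][w] * xT[w].

    shot_prob[z]: probability that possession in zone z ends in a shot.
    move_prob[z]: probability that it continues with a pass (or carry).
    goal_prob[z]: probability that a shot from zone z is scored (an xG value).
    transition[z][w]: probability that a move from z successfully reaches w;
    rows may sum to less than 1 because failed moves lose the ball (assumption).
    """
    n = len(shot_prob)
    xt = [0.0] * n
    for _ in range(iterations):
        xt = [
            shot_prob[z] * goal_prob[z]
            + move_prob[z] * sum(transition[z][w] * xt[w] for w in range(n))
            for z in range(n)
        ]
    return xt


# Toy three-zone pitch: 0 = own half, 1 = midfield, 2 = penalty area.
shot_prob = [0.0, 0.05, 0.4]
move_prob = [1.0, 0.95, 0.6]
goal_prob = [0.0, 0.05, 0.2]
transition = [
    [0.0, 0.8, 0.0],
    [0.2, 0.0, 0.5],
    [0.0, 0.3, 0.3],
]

xt = expected_threat(shot_prob, move_prob, goal_prob, transition)
for zone, value in enumerate(xt):
    print(f"zone {zone}: xT = {value:.3f}")
```

In this toy example each zone’s move probability times the sum of its transition row is below 1, so the iteration settles well within 50 rounds; the penalty area ends up most valuable and the own half least, since its only route to goal runs through midfield.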

In light of what has been said, we can affirm that statistics can be applied effectively and fruitfully to sport, and especially to football, both to improve the management and growth of clubs and to spread tools that make individual matches easier to understand.

Given the belief that the football system has many “inefficiencies” to exploit, these reflections show that the traditional management methods adopted by most clubs are often obsolete and can certainly be improved.

The use of statistical data can therefore bring a clear and desirable benefit, allowing even small clubs to try to compete with, and overcome, opponents with far greater resources.

Author: Andrea Di Giulio | DMBI Data Analyst

Photo by Lesly Juarez on Unsplash

