On Friday, 11 June, Europe’s men’s football teams started the European Championship – a year later than planned. The favourite this time is France with a probability of winning of 14.8 per cent.
This is what an international team of researchers consisting of Andreas Groll and Franziska Popp (both TU Dortmund, Germany), Gunther Schauberger (TU Munich, Germany), Christophe Ley and Hans Van Eetvelde (both Ghent University, Belgium), Achim Zeileis (University of Innsbruck, Austria) and Lars Hvattum (Molde University College, Norway) has shown with the help of machine learning.
Their forecast combines several statistical models for the teams’ strengths with information about the team structure (such as market value, number of Champions League players, club match performance of individual players) as well as socio-economic factors of the country of origin (population and gross domestic product).
With the predicted values from the researchers’ model, the entire European Championship was simulated 100,000 times: match by match, following the tournament draw and all UEFA rules.
This yields the probability of each team advancing to each tournament round and ultimately winning the European Championship. France is the favourite with a winning probability of 14.8 per cent, followed by England (13.5 per cent) and Spain (12.3 per cent).
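To illustrate the idea of simulating a tournament many times, here is a minimal sketch in Python. It is not the researchers’ model: the team names, the strength values, the logistic win probability and the four-team knockout bracket are all assumptions made for this example, and the real simulation also covers the group stage and the UEFA rules.

```python
import math
import random
from collections import Counter

# Hypothetical strength ratings (illustrative only, not the researchers' values).
STRENGTHS = {"A": 1.0, "B": 0.8, "C": 0.5, "D": 0.2}


def win_probability(strength_a, strength_b):
    """Probability that team a beats team b under a simple logistic model (an assumption)."""
    return 1.0 / (1.0 + math.exp(-(strength_a - strength_b)))


def play_match(team_a, team_b, rng):
    """Draw the winner of one match at random according to the win probability."""
    p = win_probability(STRENGTHS[team_a], STRENGTHS[team_b])
    return team_a if rng.random() < p else team_b


def simulate_tournament(bracket, rng):
    """Play a single-elimination bracket (length must be a power of two) down to one winner."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [play_match(teams[i], teams[i + 1], rng) for i in range(0, len(teams), 2)]
    return teams[0]


def estimate_title_probabilities(bracket, n_runs=100_000, seed=0):
    """Repeat the tournament n_runs times and return each team's share of titles."""
    rng = random.Random(seed)
    wins = Counter(simulate_tournament(bracket, rng) for _ in range(n_runs))
    return {team: wins[team] / n_runs for team in bracket}


if __name__ == "__main__":
    for team, prob in estimate_title_probabilities(["A", "D", "B", "C"]).items():
        print(f"{team}: {prob:.3f}")
```

The share of simulated tournaments a team wins serves as its estimated title probability; more runs make that estimate less noisy, which is one reason to use a large number such as 100,000.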
Of course, this does not mean the tournament is already decided: the gaps between the win probabilities at the top are relatively narrow, and even the top nations have a low probability of winning.
“It is in the nature of forecasts that they can also be wrong – otherwise football tournaments would be very boring. We provide probabilities, not certainties, and a probability of winning of 15 per cent also means a probability of 85 per cent of the team not winning the tournament,” says Achim Zeileis.
So far, however, the predictions have been quite successful: Achim Zeileis’ Innsbruck model, which is based on adjusted odds from betting providers, correctly predicted the Euro 2008 final as well as Spain’s World Cup win in 2010 and European Championship win in 2012, among others.
This year, it will be used as part of a more comprehensive combined model developed by the teams led by Andreas Groll (TU Dortmund), Gunther Schauberger (TU Munich) and Christophe Ley (Ghent University), which surpassed the forecasting quality of the betting providers at the 2018 World Cup.
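The article does not spell out how the bookmakers’ odds are adjusted. As a rough illustration, one common approach is to convert decimal odds into implied probabilities and then remove the bookmaker’s margin (the “overround”) by rescaling so the probabilities sum to one. The sketch below assumes this simple rescaling and uses made-up odds; it is not necessarily the procedure used in the Innsbruck model.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds into probabilities that sum to one.

    Assumption: the bookmaker's margin is removed by simple proportional
    rescaling, which is only one of several possible adjustments.
    """
    raw = {team: 1.0 / odds for team, odds in decimal_odds.items()}
    overround = sum(raw.values())  # exceeds 1 when the odds include a margin
    return {team: p / overround for team, p in raw.items()}


# Hypothetical odds for three teams (not real bookmaker quotes).
example_odds = {"Team A": 2.0, "Team B": 3.0, "Team C": 4.0}
adjusted = implied_probabilities(example_odds)
```

In this example the raw inverse odds add up to more than one; the rescaling removes that surplus so the values can be read as probabilities.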
Germany in the tournament
It is no secret that the German national team has been drawn into a particularly challenging group this year.
“There are three very strong teams in Group F, including the current world champion France and the European Champion Portugal, both also finalists at EURO 2016, plus Germany,” explains Andreas Groll.
“Compared to the top teams in the other groups, the probability of making it to the knockout stage is lower in this group. But those who do make it have a very good chance of advancing further.”
The forecast sees a probability of 85.3 per cent for both Germany and Portugal to make it to the round of 16; for France, the probability is slightly higher at 89.7 per cent. Germany’s probability of becoming European Champion is 10.1 per cent, well below that of the favourites and exactly on a par with Portugal.
The researchers’ calculation is based on four sources of information: a statistical model for the strength of each team based on all international matches of the past eight years (Ghent University); another statistical model for the strength of the teams based on the betting odds of 19 international bookmakers (University of Innsbruck); further information about the teams, such as market value, and about their countries of origin, such as population size (TU Dortmund and TU Munich); and detailed ratings of the individual players and their performances both in their home clubs and in their national teams (Molde University College).
The fifth source or “partner” is a machine learning model that combines the other four sources and optimises them step by step.
The researchers trained the model on historical data, as Andreas Groll explains: “We fed the model with the current data for the past four European Championships, i.e., between 2004 and 2016, and compared it with the actual outcomes of all matches in the respective tournaments – so the weighting of the individual sources of information for the current tournament will ideally be very precise.”
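The idea of learning how much weight each source of information should receive can be sketched with a deliberately simple stand-in. The article does not say which learning algorithm the researchers used, so the example below assumes a plain weighted sum of the sources, fitted by gradient descent on the squared error; the function name and the toy numbers are hypothetical.

```python
def fit_weights(features, targets, learning_rate=0.05, n_steps=2000):
    """Fit one weight per source by gradient descent on the mean squared error.

    features: list of rows, one value per source for each historical match
    targets:  the observed outcome for each historical match
    """
    n_features = len(features[0])
    n_samples = len(features)
    weights = [0.0] * n_features
    for _ in range(n_steps):
        gradient = [0.0] * n_features
        for row, target in zip(features, targets):
            error = sum(w * x for w, x in zip(weights, row)) - target
            for j, x in enumerate(row):
                gradient[j] += 2.0 * error * x / n_samples
        # Step by step, move the weights against the gradient.
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


# Toy data (hypothetical): two sources, outcomes built as 0.3 * source1 + 0.7 * source2.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
toy_targets = [0.3, 0.7, 1.0, 1.3]
learned = fit_weights(toy_features, toy_targets)
```

On this toy data the fitted weights approach the values used to construct the outcomes, which mirrors the principle of comparing predictions with the actual results of past tournaments.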
In any case, we will find out how well the model performed by the evening of 11 July at the latest.