As the weather warmed up this summer, we took time to enjoy the latest iteration of the largest single-sport tournament on earth, DataGenius style. After inviting all of you sports and/or data fans to join us, the DataGenius team analyzed the data contained in our #PredictTheKick starter kit, built from qualification matches and international friendlies. Using SAP Analytics Cloud, we were able to unearth hidden insights and even make some predictions.
Choosing the winners of any tournament isn’t easy (see our basketball tournament results), but it can definitely be fun! Here’s what our team came up with for the results of the group stage:
We started by using one of SAP Analytics Cloud’s embedded machine learning features, Smart Discovery, to help us determine what influences the win potential of the teams the most. Before we even created a single chart, SAP Analytics Cloud showed us that the main influencers of the teams’ win percentage were: goals against average, goals for average, and shots on goal.
Our starter kit included cumulative “total” stats from all matches played as well as averages. Because every region’s qualifying process is different, some teams played more matches than others, which makes totals less useful for comparison. Instead, we used the per-match averages to get a better understanding of each team’s typical performance. When we later experimented with the total measures, they produced outliers that were heavily affected by how many matches each team had played.
All the measures we selected for our analysis were straightforward and easy to understand. For example, teams that achieved more ball possession had a higher chance of scoring, and teams with a larger number of shots and shots on goal also had a better chance of actually scoring a goal.
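To illustrate why totals can mislead, here is a minimal Python sketch. The team names and numbers are hypothetical and are not taken from the starter kit; the `TeamTotals` class and `per_match` helper are names we made up for this example.

```python
from dataclasses import dataclass


@dataclass
class TeamTotals:
    """Cumulative stats for one team (hypothetical values, not starter-kit data)."""
    name: str
    matches: int
    goals_for: int
    goals_against: int


def per_match(total: float, matches: int) -> float:
    """Convert a cumulative total into a per-match average."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    return total / matches


# Team A played a longer qualifying campaign than Team B.
teams = [
    TeamTotals("Team A", matches=18, goals_for=36, goals_against=18),
    TeamTotals("Team B", matches=10, goals_for=25, goals_against=5),
]

# Ranking by raw totals favors the team that simply played more matches...
by_total = max(teams, key=lambda t: t.goals_for).name
# ...while ranking by per-match average reflects typical performance.
by_average = max(teams, key=lambda t: per_match(t.goals_for, t.matches)).name

print(by_total, by_average)
```

In this made-up case the team with the most total goals is not the team with the best scoring rate, which is the kind of distortion we saw when we tried total measures.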
In football/US soccer, draws, where no winner or loser is determined, are fairly common. Therefore, even if a team has a low “Goals For/Against Average,” it isn’t necessarily a poor team.
The heat map visualization in our story shows us the influence of “Goals Against” and “Goals For/Scored” on a team’s potential to win matches. We noted that each decrease in “Goals Against” (moving from bottom to top) led to a larger gain in winning potential than an equivalent increase in “Goals For/Scored” (moving from left to right). This means that more weight should be placed on Goals Against than on Goals For/Scored when evaluating a team’s chance of winning.
Although some teams had metrics that were objectively lower than others, we had to consider the strength of the opponents they faced leading up to the tournament. External factors that were not explicitly captured in our data set, like injuries and player news, also played into our decisions.
In Our Sample Group: Portugal, Spain, Morocco, Iran
Unlike the other groups participating in the tournament, our sample group had two teams that were fairly similar in ranking and overall stats. This was one of the more difficult groups we had to select a winner for. The two teams, Portugal and Spain, both have strong players and very similar Goals Against Average and Goals For Average.
For this global tournament, teams from around the world play against those from other regions. Not all teams come from the same geographic region or the same confederation. As a result, a team’s apparent success may be depressed simply by the football confederation it belongs to: good teams in strong confederations can appear less impressive than bad teams in weak confederations.
Global ranking is something that we took into consideration when determining the winner for each group. The Global Ranking System gave us a good baseline understanding of how well each team did in comparison to the rest of the world, so we looked at it in SAP Analytics Cloud before diving deeper into the data set and other measures. We should note that global ranking is heavily influenced by major competitions and less so by smaller, local tournaments.
When determining goal differential, we compared the Goals For Average (GFA) of the team that we picked to win the group against the Goals Against Average (GAA) of the other teams in the same group. This not only gives us a potential goal differential, but also further supports our winner picks when we see clear scoring advantages. Determining the goal differential allowed us to dive deeper into a team’s ability to keep opponents from scoring while also scoring on them. In the end, we picked Spain as our final group winner, with a goal differential of +7.
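The exact formula isn’t spelled out above, so the following Python sketch shows just one plausible way to turn GFA and GAA into a projected group-stage goal differential. It assumes a simple averaging heuristic (expected goals in a match are the midpoint of the attacker’s GFA and the defender’s GAA), which is our assumption for illustration and not necessarily how the picks were made. The team stats are hypothetical, and `expected_goals` and `projected_differential` are made-up helper names.

```python
def expected_goals(attack_gfa: float, defence_gaa: float) -> float:
    """Assumed heuristic: midpoint of what the attacker usually scores
    and what the defender usually concedes."""
    return (attack_gfa + defence_gaa) / 2


def projected_differential(pick: dict, opponents: list[dict]) -> float:
    """Sum the projected (scored - conceded) margin over all group opponents."""
    total = 0.0
    for opp in opponents:
        scored = expected_goals(pick["gfa"], opp["gaa"])
        conceded = expected_goals(opp["gfa"], pick["gaa"])
        total += scored - conceded
    return total


# Hypothetical per-match averages, not values from the starter kit.
pick = {"name": "Team A", "gfa": 2.5, "gaa": 0.5}
opponents = [
    {"name": "Team B", "gfa": 1.5, "gaa": 1.0},
    {"name": "Team C", "gfa": 1.0, "gaa": 1.5},
    {"name": "Team D", "gfa": 1.0, "gaa": 2.0},
]

print(f"{projected_differential(pick, opponents):+.2f}")
```

A clearly positive result across all three opponents is the kind of scoring advantage that would support picking that team as the group winner.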
Join the Fun
Sport is known to be unpredictable, and that’s what makes it exciting!
- Try predicting the knockout round outcomes yourself using the SAP Analytics Cloud Free Trial, and see how close you are to the final match results.
- Read Jason Yeung’s blog to see what he discovered: “Analyzing, but Not Predicting, this Year’s Soccer Championship”
- Tweet us using the hashtags #PredictTheKick and #DataGenius so we can follow along.