Can Python Predict the 2018 World Cup Champion? A Data‑Driven Exploration
This article walks through a Python‑based analysis of historical international football results (1872 to 2018, narrowed to World Cup matches), using pandas in a Jupyter Notebook to clean the data, compute win counts and total goals, visualize the top teams, and conclude that Germany, Argentina and Brazil are the strongest contenders for the 2018 title.
Background and Goal
Before the 2018 FIFA World Cup began, the author used Python to analyse historical match data and to forecast the tournament's most likely winners.
Data Source and Environment
The dataset was downloaded from Kaggle and contains roughly 40,000 matches spanning from 1872 to 2018, including World Cup finals, qualifiers, Asian Cup, European Cup and international friendlies. The analysis was performed on Windows 7 with Python 3.6, Jupyter Notebook and pandas 0.22.0.
Loading the Data
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
df = pd.read_csv('results.csv')
df.head()
Dataset Columns
Date
Home team name
Away team name
Home team goals (excluding penalties)
Away team goals (excluding penalties)
Match type
City
Country
Neutral venue flag
Pre‑processing
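# Keep World Cup matches only. The original filtering step is not shown;
# this line assumes the match-type column is named 'tournament' and that
# World Cup finals matches are labelled 'FIFA World Cup'.
df_FIFA = df[df['tournament'] == 'FIFA World Cup'].copy()
# Convert date column to datetime
df_FIFA['date'] = pd.to_datetime(df_FIFA['date'])
# Extract year
df_FIFA['year'] = df_FIFA['date'].dt.year
# Goal difference
df_FIFA['diff_score'] = df_FIFA['home_score'] - df_FIFA['away_score']
# Initialise winner column
df_FIFA['win_team'] = ''
# Ensure numeric type for diff_score
df_FIFA['diff_score'] = pd.to_numeric(df_FIFA['diff_score'])
Determine Winning Team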
# Method 1: vectorised assignment
df_FIFA.loc[df_FIFA['diff_score'] > 0, 'win_team'] = df_FIFA['home_team']
df_FIFA.loc[df_FIFA['diff_score'] < 0, 'win_team'] = df_FIFA['away_team']
df_FIFA.loc[df_FIFA['diff_score'] == 0, 'win_team'] = 'Draw'
# Method 2: function with iteration
def find_win_team(df):
    """Return the winner of each match row, or 'Draw' for level scores."""
    winners = []
    for _, row in df.iterrows():
        if row['home_score'] > row['away_score']:
            winners.append(row['home_team'])
        elif row['home_score'] < row['away_score']:
            winners.append(row['away_team'])
        else:
            winners.append('Draw')
    return winners
# Stored in a separate 'winner' column; the later steps use 'win_team' from Method 1
df_FIFA['winner'] = find_win_team(df_FIFA)
Overall Win‑Count Analysis
# Count wins per team (excluding draws)
s = df_FIFA.groupby('win_team')['win_team'].count()
s.drop(labels=['Draw'], inplace=True)
s.sort_values(ascending=False, inplace=True)
The resulting bar chart (Top 20 Winners of World Cup) shows Brazil, Germany, Italy and Argentina as the teams with the most victories.
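The plotting code for this chart is not included in the text. A minimal sketch of how such a chart could be drawn is given here; the helper name `plot_top_n`, the axis labels and the demo values are all illustrative assumptions, not the author's code.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def plot_top_n(series, n=20, title=''):
    """Draw a bar chart of the n largest values in a pandas Series.

    Hypothetical helper for illustration only.
    """
    top = series.sort_values(ascending=False).head(n)
    ax = top.plot(kind='bar', figsize=(10, 6), title=title)
    ax.set_xlabel('Team')
    ax.set_ylabel('Count')
    return ax


# Placeholder values standing in for the win-count Series `s`
demo = pd.Series({'Team A': 5, 'Team B': 9, 'Team C': 2})
ax = plot_top_n(demo, n=2, title='Top 2 (demo data)')
```

With the real data, a call such as `plot_top_n(s, n=20, title='Top 20 Winners of World Cup')` would be the corresponding usage.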
Total Goal Analysis
# Combine home and away scores into a single table
df_score_home = df_FIFA[['home_team', 'home_score']].rename(columns={'home_team':'team','home_score':'score'})
df_score_away = df_FIFA[['away_team', 'away_score']].rename(columns={'away_team':'team','away_score':'score'})
df_score = pd.concat([df_score_home, df_score_away], ignore_index=True)
s_score = df_score.groupby('team')['score'].sum()
s_score.sort_values(ascending=False, inplace=True)
The bar chart (Top 20 in Total Scores of World Cup) indicates Germany, Brazil, Argentina and Italy as the highest‑scoring nations.
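Because the same home/away stacking is repeated later on the 32‑team subset, it may be convenient to wrap it in a function. The sketch below assumes the column names used above (`home_team`, `away_team`, `home_score`, `away_score`); the function name `total_goals` and the demo rows are made up for illustration.

```python
import pandas as pd


def total_goals(matches):
    """Return total goals per team, counting home and away appearances.

    Hypothetical helper; mirrors the rename/concat/groupby steps above.
    """
    home = matches[['home_team', 'home_score']].rename(
        columns={'home_team': 'team', 'home_score': 'score'})
    away = matches[['away_team', 'away_score']].rename(
        columns={'away_team': 'team', 'away_score': 'score'})
    stacked = pd.concat([home, away], ignore_index=True)
    return stacked.groupby('team')['score'].sum().sort_values(ascending=False)


# Three made-up matches between placeholder teams A, B and C
demo = pd.DataFrame({
    'home_team': ['A', 'B', 'A'],
    'away_team': ['B', 'C', 'C'],
    'home_score': [2, 1, 0],
    'away_score': [1, 1, 3],
})
goals = total_goals(demo)
```

Stacking the two sides into one `team`/`score` table means a single `groupby` covers every appearance, regardless of whether the team played at home or away.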
2018 World Cup – 32‑Team Focus
The 32 qualified teams were seeded into four pots of eight for the draw, and team_list below follows that order. The analysis first identified debutants (Iceland and Panama) by checking which teams from the 32‑team list were absent from the index of the historical goal totals, s_score.
team_list = ['Russia','Germany','Brazil','Portugal','Argentina','Belgium','Poland','France',
'Spain','Peru','Switzerland','England','Colombia','Mexico','Uruguay','Croatia',
'Denmark','Iceland','Costa Rica','Sweden','Tunisia','Egypt','Senegal','Iran',
'Serbia','Nigeria','Australia','Japan','Morocco','Panama','Korea Republic','Saudi Arabia']
for item in team_list:
    if item not in s_score.index:
        print(item)
# Output: Iceland, Panama
Since Iceland and Panama had no prior World Cup data, they were excluded from the subsequent 32‑team statistics.
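The membership check can also be written as a small function that returns the missing names instead of printing them, which makes the result easier to reuse. This is a sketch only: `find_missing_teams` and the demo values are hypothetical.

```python
import pandas as pd


def find_missing_teams(teams, stats):
    """Return the teams (in input order) that have no entry in stats.index.

    Hypothetical helper for illustration.
    """
    known = set(stats.index)
    return [team for team in teams if team not in known]


# Placeholder Series standing in for s_score
demo_stats = pd.Series({'A': 4, 'B': 2})
missing = find_missing_teams(['A', 'X', 'B', 'Y'], demo_stats)
```

A team can also show up as missing simply because its name is spelled differently in the dataset, so the output is worth checking by eye before excluding anything.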
# Filter matches that involve only the 32 qualified teams
df_top32 = df_FIFA[(df_FIFA['home_team'].isin(team_list)) & (df_FIFA['away_team'].isin(team_list))]
Win‑Count Within the 32 Teams (All Years)
s_32 = df_top32.groupby('win_team')['win_team'].count()
s_32.drop(labels=['Draw'], inplace=True)
s_32.sort_values(ascending=False, inplace=True)
Total Goals Within the 32 Teams (All Years)
# Re‑use the home/away concatenation logic on the filtered dataset
df_score_home_32 = df_top32[['home_team','home_score']].rename(columns={'home_team':'team','home_score':'score'})
df_score_away_32 = df_top32[['away_team','away_score']].rename(columns={'away_team':'team','away_score':'score'})
df_score_32 = pd.concat([df_score_home_32, df_score_away_32], ignore_index=True)
s_score_32 = df_score_32.groupby('team')['score'].sum()
s_score_32.sort_values(ascending=False, inplace=True)
Key Findings
Across all historical matches, Brazil, Germany, Italy and Argentina have the highest win counts.
In terms of total goals scored, Germany, Brazil, Argentina and Italy lead.
For the 2018 tournament’s 32 teams, Germany, Argentina and Brazil emerge as the strongest based on both win‑count and goal totals.
Germany appears as the most likely champion according to the data‑driven forecast.
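The article does not show how the win counts and goal totals were weighed against each other. One simple way to view them together is to place both Series in one table and add up the two rank positions. The rank‑sum rule, the function name `combine_rankings` and the demo numbers below are assumptions for illustration, not the author's method.

```python
import pandas as pd


def combine_rankings(wins, goals):
    """Place two per-team Series side by side and order by summed rank.

    Hypothetical helper; the rank-sum rule is an illustrative choice.
    """
    summary = pd.concat([wins.rename('wins'), goals.rename('goals')], axis=1)
    summary['win_rank'] = summary['wins'].rank(ascending=False, method='min')
    summary['goal_rank'] = summary['goals'].rank(ascending=False, method='min')
    summary['rank_sum'] = summary['win_rank'] + summary['goal_rank']
    return summary.sort_values('rank_sum')


# Made-up numbers for placeholder teams, standing in for s_32 and s_score_32
demo_wins = pd.Series({'A': 10, 'B': 7, 'C': 3})
demo_goals = pd.Series({'A': 30, 'B': 25, 'C': 5})
summary = combine_rankings(demo_wins, demo_goals)
```

A lower `rank_sum` means a team sits near the top of both lists; other weightings are equally defensible.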
Conclusion
Using publicly available match data and straightforward pandas operations, the analysis suggests that Germany, Argentina and Brazil are the top three contenders for the 2018 World Cup, with Germany being the strongest favorite. The results are for personal learning purposes only and should not be taken as definitive predictions.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
