Can Python Predict the 2018 World Cup Champion? A Data‑Driven Analysis
Using a Kaggle dataset of roughly 40,000 matches from 1872 to 2018, this Python‑based analysis cleans the data, computes win counts and total goals for every nation, visualizes the results, and predicts Germany, Argentina and Brazil as the top three contenders for the 2018 World Cup, with Germany as the strongest favorite.
Introduction
Before the 2018 FIFA World Cup kicks off, we use Python to analyse the historical performance of the participating teams and boldly forecast the tournament favourites.
Data Source
The data comes from Kaggle and covers every World Cup match, World Cup qualifier, Asian Cup match, European Championship match and international friendly from 1872 to 2018, about 40,000 games in total.
Environment
Windows 7
Python 3.6
Jupyter Notebook
pandas 0.22.0
Loading the Data
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
df = pd.read_csv('results.csv')
df.head()
Dataset Columns
date
home_team
away_team
home_score (excluding penalties)
away_score (excluding penalties)
tournament
city
country
neutral
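A quick check that the loaded file matches the column list above can save confusion later. The sketch below is only an illustration: it reads a two-row in-memory sample laid out like `results.csv` rather than the real file, and the names `SAMPLE_CSV` and `EXPECTED_COLUMNS` are introduced here and do not come from the original notebook.

```python
import io
import pandas as pd

# Illustrative sample in the same layout as results.csv (not the real file)
SAMPLE_CSV = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
1873-03-08,England,Scotland,4,2,Friendly,London,England,False
"""

# Hypothetical constant; the column names are those listed in the article
EXPECTED_COLUMNS = ['date', 'home_team', 'away_team', 'home_score',
                    'away_score', 'tournament', 'city', 'country', 'neutral']

# parse_dates converts the date column to datetime while loading
sample = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=['date'])

# Report any expected column that the file does not contain
missing = [c for c in EXPECTED_COLUMNS if c not in sample.columns]
print('missing columns:', missing)
print(sample.dtypes)
```

With the real data, passing `'results.csv'` in place of the `io.StringIO` object should behave the same way, assuming the file uses the header shown above.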
Filtering World Cup Matches
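df_FIFA_all = df[df['tournament'].str.contains('FIFA', regex=True)]
# Keep only World Cup finals matches; .copy() avoids SettingWithCopyWarning when columns are added later
df_FIFA = df_FIFA_all[df_FIFA_all['tournament'] == 'FIFA World Cup'].copy()
Data Preparation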
# Parse the dates, then derive the year and the home-minus-away goal difference
df_FIFA['date'] = pd.to_datetime(df_FIFA['date'])
df_FIFA['year'] = df_FIFA['date'].dt.year
df_FIFA['diff_score'] = df_FIFA['home_score'] - df_FIFA['away_score']
df_FIFA['win_team'] = ''
Determining Winners (Method 1)
# Positive goal difference → home team wins
df_FIFA.loc[df_FIFA['diff_score'] > 0, 'win_team'] = df_FIFA['home_team']
# Negative goal difference → away team wins
df_FIFA.loc[df_FIFA['diff_score'] < 0, 'win_team'] = df_FIFA['away_team']
# Zero goal difference → draw
df_FIFA.loc[df_FIFA['diff_score'] == 0, 'win_team'] = 'Draw'
Determining Winners (Method 2)
def find_win_team(df):
    winners = []
    for i, row in df.iterrows():
        if row['home_score'] > row['away_score']:
            winners.append(row['home_team'])
        elif row['home_score'] < row['away_score']:
            winners.append(row['away_team'])
        else:
            winners.append('Draw')
    return winners
df_FIFA['winner'] = find_win_team(df_FIFA)
Analysis 1 – Top 20 Teams by Win Count
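Before counting wins, it can be worth confirming that Method 1 and Method 2 yield the same winners. The following is a minimal sketch on a toy DataFrame (team names and scores are hypothetical, chosen only to cover a home win, an away win and a draw); `methods_agree` is a name introduced here for illustration.

```python
import pandas as pd

# Toy fixtures (hypothetical scores, for illustration only)
toy = pd.DataFrame({
    'home_team': ['A', 'B', 'C'],
    'away_team': ['B', 'C', 'A'],
    'home_score': [2, 0, 1],
    'away_score': [1, 3, 1],
})

# Method 1: boolean masks on the goal difference
diff = toy['home_score'] - toy['away_score']
toy['win_team'] = 'Draw'
toy.loc[diff > 0, 'win_team'] = toy['home_team']
toy.loc[diff < 0, 'win_team'] = toy['away_team']

# Method 2: row-by-row loop
def find_win_team(df):
    winners = []
    for _, row in df.iterrows():
        if row['home_score'] > row['away_score']:
            winners.append(row['home_team'])
        elif row['home_score'] < row['away_score']:
            winners.append(row['away_team'])
        else:
            winners.append('Draw')
    return winners

toy['winner'] = find_win_team(toy)

# True when both columns hold the same value in every row
methods_agree = (toy['win_team'] == toy['winner']).all()
print(methods_agree)
```

On large tables the mask-based method is usually the faster of the two, since `iterrows()` builds a Series for every row.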
s = df_FIFA.groupby('win_team')['win_team'].count()
s.sort_values(ascending=False, inplace=True)
s.drop(labels=['Draw'], inplace=True)
Visualization (bar chart):
Horizontal bar chart:
Pie chart of win percentages:
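The three chart types named above can be drawn with pandas' built-in plotting. The sketch below is not the original notebook's plotting code, which is not shown in the text. It uses placeholder win counts (hypothetical values, not the article's results) in a Series called `s_demo`, and selects a headless matplotlib backend so it runs outside Jupyter.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder win counts (hypothetical values, not the article's results)
s_demo = pd.Series({'Team A': 30, 'Team B': 25, 'Team C': 10, 'Team D': 5})
top = s_demo.sort_values(ascending=False).head(20)

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Vertical bar chart
top.plot(kind='bar', ax=axes[0], title='Wins (bar)')

# Horizontal bar chart; reverse the order so the largest bar is drawn on top
top[::-1].plot(kind='barh', ax=axes[1], title='Wins (barh)')

# Pie chart of each team's share of the wins shown
top.plot(kind='pie', ax=axes[2], autopct='%.1f%%', title='Share of wins')
axes[2].set_ylabel('')

fig.tight_layout()
```

In the article's setting, `s` from Analysis 1 would take the place of `s_demo`.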
Key Findings
Conclusion 1: By win count, Brazil, Germany, Italy and Argentina are the strongest historically.
Analysis 2 – Total Goals per Team
# Combine home and away scores
df_score_home = df_FIFA[['home_team', 'home_score']].rename(columns={'home_team':'team','home_score':'score'})
df_score_away = df_FIFA[['away_team', 'away_score']].rename(columns={'away_team':'team','away_score':'score'})
df_score = pd.concat([df_score_home, df_score_away], ignore_index=True)
s_score = df_score.groupby('team')['score'].sum()
s_score.sort_values(ascending=False, inplace=True)
Horizontal bar chart of the top 20 goal‑scoring nations:
Conclusion 2: By total goals, Germany, Brazil, Argentina and Italy lead.
2018 World Cup – 32‑Team Analysis
The 32 qualified teams, listed by draw pot (seeding tier) rather than by tournament group, are:
Pot 1: Russia, Germany, Brazil, Portugal, Argentina, Belgium, Poland, France
Pot 2: Spain, Peru, Switzerland, England, Colombia, Mexico, Uruguay, Croatia
Pot 3: Denmark, Iceland, Costa Rica, Sweden, Tunisia, Egypt, Senegal, Iran
Pot 4: Serbia, Nigeria, Australia, Japan, Morocco, Panama, Korea Republic, Saudi Arabia
First‑time Participants
team_list = ['Russia','Germany','Brazil','Portugal','Argentina','Belgium','Poland','France','Spain','Peru','Switzerland','England','Colombia','Mexico','Uruguay','Croatia','Denmark','Iceland','Costa Rica','Sweden','Tunisia','Egypt','Senegal','Iran','Serbia','Nigeria','Australia','Japan','Morocco','Panama','Korea Republic','Saudi Arabia']
for item in team_list:
    if item not in s_score.index:
        print(item)
# Output: Iceland, Panama
Thus Iceland and Panama are debutants; their historical data are absent from the long‑term analysis.
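The same membership check can also be written without an explicit loop. A minimal sketch follows, using small stand-ins for `team_list` and `s_score` (the goal values are hypothetical) and the `Index.difference` method from pandas.

```python
import pandas as pd

# Hypothetical stand-ins for team_list and s_score from the article
team_list_demo = ['Germany', 'Brazil', 'Iceland', 'Panama']
s_score_demo = pd.Series({'Germany': 10, 'Brazil': 12, 'Italy': 8})

# Teams on the 2018 list that never appear in the goal table
debutants = list(pd.Index(team_list_demo).difference(s_score_demo.index))
print(debutants)
```

`Index.difference` returns the result sorted by default, so the order may differ from that of the input list.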
Top 32 Teams Since 1872 – Wins
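# Assumption: df_top32 is not defined in the text; here it is taken to be the World Cup matches in which both sides are among the 32 qualified teams
df_top32 = df_FIFA[df_FIFA['home_team'].isin(team_list) & df_FIFA['away_team'].isin(team_list)]
s_32 = df_top32.groupby('win_team')['win_team'].count()
s_32.sort_values(ascending=False, inplace=True)
s_32.drop(labels=['Draw'], inplace=True)
Top 32 Teams Since 1872 – Goals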
# Same procedure as in Analysis 2 but limited to the 32‑team subset
Conclusion 3: Across the entire history, Germany, Brazil and Argentina dominate both win counts and goal totals.
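The goal totals for the 32‑team subset are only described in a comment, so here is one way the step might look. This is a sketch under assumptions: the helper `total_goals` and the toy data are introduced for illustration, and the subset is assumed to keep matches in which both sides are qualified teams, which the text does not state explicitly.

```python
import pandas as pd

def total_goals(matches):
    """Stack home and away scores into one long table and sum goals per team."""
    home = matches[['home_team', 'home_score']].rename(
        columns={'home_team': 'team', 'home_score': 'score'})
    away = matches[['away_team', 'away_score']].rename(
        columns={'away_team': 'team', 'away_score': 'score'})
    stacked = pd.concat([home, away], ignore_index=True)
    return stacked.groupby('team')['score'].sum().sort_values(ascending=False)

# Toy matches (hypothetical); team X stands for a side that did not qualify
toy = pd.DataFrame({
    'home_team': ['A', 'B', 'A', 'X'],
    'away_team': ['B', 'C', 'C', 'A'],
    'home_score': [2, 0, 1, 4],
    'away_score': [1, 3, 1, 0],
})
qualified = ['A', 'B', 'C']

# Assumption: keep only matches in which both sides are in the qualified list
subset = toy[toy['home_team'].isin(qualified) & toy['away_team'].isin(qualified)]
goals = total_goals(subset)
print(goals)
```

If the subset were instead defined as matches involving at least one qualified team, replacing `&` with `|` would express that, and the totals would then also include goals by non-qualified opponents.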
Since 1978 – Wins & Goals
Conclusion 4: From 1978 onward, Argentina, Germany and Brazil are the strongest by wins; the same three lead in goals, with Germany showing a clearer edge.
Since 2002 – Wins & Goals
Conclusion 5: Since 2002, Germany, Argentina and Brazil remain the top three by both wins and goals, with Germany holding the strongest statistical advantage.
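The "since 1978" and "since 2002" cuts amount to a filter on the `year` column followed by the same win count as before. A minimal sketch is given below; the helper name `wins_since`, the toy data, and the choice to include the start year itself are all assumptions made for illustration.

```python
import pandas as pd

def wins_since(matches, start_year):
    """Count wins per team for matches played in start_year or later."""
    recent = matches[matches['year'] >= start_year]
    wins = recent.groupby('win_team')['win_team'].count()
    # errors='ignore' keeps this safe when the window contains no draws
    wins = wins.drop(labels=['Draw'], errors='ignore')
    return wins.sort_values(ascending=False)

# Toy results (hypothetical), already carrying the year and win_team columns
toy = pd.DataFrame({
    'year': [1974, 1978, 2002, 2014, 2014],
    'win_team': ['A', 'B', 'B', 'Draw', 'A'],
})

print(wins_since(toy, 1978))
print(wins_since(toy, 2002))
```

Applied to the article's data, the call would be along the lines of `wins_since(df_top32, 1978)`, provided `df_top32` carries the `year` and `win_team` columns built during data preparation.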
Overall Prediction for 2018
Based on historical performance, the analysis points to Germany, Argentina and Brazil as the top three contenders, with Germany the most likely champion.
Note: This analysis is for personal learning purposes only; actual tournament outcomes may differ.
MaGe Linux Operations
