
Can Python Predict the 2018 World Cup Champion? A Data‑Driven Analysis

Using a Kaggle dataset of roughly 40,000 matches from 1872 to 2018, this Python‑based analysis cleans the data, computes win counts and total goals for every nation, visualizes the results, and predicts Germany, Argentina and Brazil as the top three contenders for the 2018 World Cup, with Germany as the strongest favorite.

MaGe Linux Operations

Introduction

Ahead of the 2018 FIFA World Cup, we use Python to analyse the historical performance of the participating teams and forecast the most likely champions.

Data Source

The data comes from Kaggle and covers international matches from 1872 to 2018, about 40,000 games in total, including World Cup finals and qualifiers, the Asian Cup, the European Championship and international friendlies.

Environment

Windows 7

Python 3.6

Jupyter Notebook

pandas 0.22.0

Loading the Data

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

df = pd.read_csv('results.csv')
df.head()

Dataset Columns

date

home_team

away_team

home_score (excluding penalties)

away_score (excluding penalties)

tournament

city

country

neutral
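
To make the schema concrete, the sketch below loads a two-row sample with the same columns from an in-memory string. The rows, team names and values are invented placeholders, not records from the Kaggle file, and `parse_dates` is just one optional way to get a datetime column at load time.

```python
import io
import pandas as pd

# Hypothetical sample rows with the same columns as results.csv (all values are made up)
sample_csv = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
1900-01-01,Team A,Team B,2,1,Friendly,City X,Country A,False
1900-01-08,Team B,Team C,0,0,FIFA World Cup,City Y,Country D,True
"""

# parse_dates converts the date column to datetime64 while reading
sample = pd.read_csv(io.StringIO(sample_csv), parse_dates=['date'])
print(sample.dtypes)
```

With the date parsed at load time, `sample['date'].dt.year` is available straight away.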

Filtering World Cup Matches

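# Substring match: every tournament whose name contains 'FIFA' (finals and qualifiers)
df_FIFA_all = df[df['tournament'].str.contains('FIFA', regex=True)]
# Exact match: World Cup finals only; .copy() gives an independent frame for the columns added later
df_FIFA = df_FIFA_all[df_FIFA_all['tournament'] == 'FIFA World Cup'].copy()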
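
The second, exact-match filter matters because more than one tournament label contains the string 'FIFA'. A minimal sketch on a toy frame, assuming the labels 'FIFA World Cup' and 'FIFA World Cup qualification' as they are believed to appear in the Kaggle file:

```python
import pandas as pd

# Toy frame; the tournament labels are assumed to match those in results.csv
toy = pd.DataFrame({
    'tournament': ['Friendly', 'FIFA World Cup qualification',
                   'FIFA World Cup', 'UEFA Euro', 'FIFA World Cup'],
})

# Substring match keeps qualifiers as well as the finals
contains_fifa = toy[toy['tournament'].str.contains('FIFA')]
# Exact match keeps only the finals tournament
finals_only = toy[toy['tournament'] == 'FIFA World Cup']

print(len(contains_fifa), len(finals_only))  # 3 2
```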

Data Preparation

# Plain column assignment replaces the column and its dtype, so the .dt accessor works afterwards
df_FIFA['date'] = pd.to_datetime(df_FIFA['date'])
df_FIFA['year'] = df_FIFA['date'].dt.year

df_FIFA['diff_score'] = df_FIFA['home_score'] - df_FIFA['away_score']
df_FIFA['win_team'] = ''

Determining Winners (Method 1)

# Positive goal difference → home team wins
df_FIFA.loc[df_FIFA['diff_score'] > 0, 'win_team'] = df_FIFA['home_team']
# Negative goal difference → away team wins
df_FIFA.loc[df_FIFA['diff_score'] < 0, 'win_team'] = df_FIFA['away_team']
# Zero goal difference → draw (scores exclude penalties, so shoot-outs count as draws)
df_FIFA.loc[df_FIFA['diff_score'] == 0, 'win_team'] = 'Draw'

Determining Winners (Method 2)

def find_win_team(df):
    winners = []
    for i, row in df.iterrows():
        if row['home_score'] > row['away_score']:
            winners.append(row['home_team'])
        elif row['home_score'] < row['away_score']:
            winners.append(row['away_team'])
        else:
            winners.append('Draw')
    return winners

df_FIFA['winner'] = find_win_team(df_FIFA)
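
A third option, not used in the original analysis, is to vectorise the same rule with `numpy.select`. The sketch below is an illustration on a toy frame with placeholder teams; the helper name `add_win_team` is hypothetical.

```python
import numpy as np
import pandas as pd

def add_win_team(matches):
    """Return a copy of matches with a 'win_team' column ('Draw' for level scores)."""
    out = matches.copy()
    conditions = [
        out['home_score'] > out['away_score'],   # home win
        out['home_score'] < out['away_score'],   # away win
    ]
    choices = [out['home_team'], out['away_team']]
    # Rows matching neither condition fall through to the default
    out['win_team'] = np.select(conditions, choices, default='Draw')
    return out

# Placeholder matches, not real results
toy = pd.DataFrame({
    'home_team': ['Team A', 'Team B', 'Team C'],
    'away_team': ['Team B', 'Team C', 'Team A'],
    'home_score': [2, 0, 1],
    'away_score': [1, 3, 1],
})
print(add_win_team(toy)['win_team'].tolist())  # ['Team A', 'Team C', 'Draw']
```

On a frame of World Cup size all three approaches are fast enough; the vectorised forms mainly pay off on the full 40,000-match table.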

Analysis 1 – Top 20 Teams by Win Count

s = df_FIFA.groupby('win_team')['win_team'].count()
s.sort_values(ascending=False, inplace=True)
s.drop(labels=['Draw'], inplace=True)

Visualization (bar chart):

Horizontal bar chart:

Pie chart of win percentages:
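
The three charts listed above could be drawn along the following lines. This is a sketch rather than the original plotting code: the win counts are placeholders, the helper name `plot_top_wins` is hypothetical, and the `Agg` backend line is only there so the example runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical win counts, already sorted in descending order like s
wins = pd.Series({'Team A': 12, 'Team B': 9, 'Team C': 5, 'Team D': 2})

def plot_top_wins(s, n=20):
    """Draw bar, horizontal bar and pie charts for the top n entries of s."""
    top = s.head(n)
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    top.plot(kind='bar', ax=axes[0], title='Wins (bar)')
    # Reverse the order so the largest value ends up at the top of the chart
    top.iloc[::-1].plot(kind='barh', ax=axes[1], title='Wins (horizontal bar)')
    # autopct prints each slice's share of the plotted total
    top.plot(kind='pie', ax=axes[2], autopct='%1.1f%%', title='Share of wins')
    axes[2].set_ylabel('')
    fig.tight_layout()
    return fig

fig = plot_top_wins(wins)
```

In the notebook, calling `plot_top_wins(s)` would plot the real win counts. Note that the pie percentages are shares of the top-n total, not of all wins.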

Key Findings

Conclusion 1: By win count, Brazil, Germany, Italy and Argentina are the strongest historically.

Analysis 2 – Total Goals per Team

# Combine home and away scores
df_score_home = df_FIFA[['home_team', 'home_score']].rename(columns={'home_team':'team','home_score':'score'})
df_score_away = df_FIFA[['away_team', 'away_score']].rename(columns={'away_team':'team','away_score':'score'})

df_score = pd.concat([df_score_home, df_score_away], ignore_index=True)
s_score = df_score.groupby('team')['score'].sum()
s_score.sort_values(ascending=False, inplace=True)

Horizontal bar chart of the top 20 goal‑scoring nations:

Conclusion 2: By total goals, Germany, Brazil, Argentina and Italy lead.

2018 World Cup – 32‑Team Analysis

The 32 qualified teams are listed below by seeding pot for the final draw (these are the draw pots, not the tournament groups):

Pot 1: Russia, Germany, Brazil, Portugal, Argentina, Belgium, Poland, France

Pot 2: Spain, Peru, Switzerland, England, Colombia, Mexico, Uruguay, Croatia

Pot 3: Denmark, Iceland, Costa Rica, Sweden, Tunisia, Egypt, Senegal, Iran

Pot 4: Serbia, Nigeria, Australia, Japan, Morocco, Panama, Korea Republic, Saudi Arabia

First‑time Participants

team_list = ['Russia','Germany','Brazil','Portugal','Argentina','Belgium','Poland','France','Spain','Peru','Switzerland','England','Colombia','Mexico','Uruguay','Croatia','Denmark','Iceland','Costa Rica','Sweden','Tunisia','Egypt','Senegal','Iran','Serbia','Nigeria','Australia','Japan','Morocco','Panama','Korea Republic','Saudi Arabia']

for item in team_list:
    if item not in s_score.index:
        print(item)
# Output: Iceland, Panama

Iceland and Panama are therefore World Cup debutants: they have no earlier finals matches in the data, so they do not appear in the historical analysis that follows.

Top 32 Teams Since 1872 – Wins

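# Assumption: df_top32 is not defined in the source; here it is taken to be
# every World Cup match involving at least one of the 32 qualified teams.
df_top32 = df_FIFA[df_FIFA['home_team'].isin(team_list) | df_FIFA['away_team'].isin(team_list)]
s_32 = df_top32.groupby('win_team')['win_team'].count()
s_32.sort_values(ascending=False, inplace=True)
s_32 = s_32[s_32.index.isin(team_list)]  # drops 'Draw' and winners outside the 32 teams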

Top 32 Teams Since 1872 – Goals

# Same procedure as in Analysis 2 but limited to the 32‑team subset
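
One way to restrict the goal totals to the qualified teams is sketched below. It reuses the home/away stacking from Analysis 2 and then filters by team; the helper name `goals_by_team` and the toy matches are hypothetical, and the original notebook may have filtered differently.

```python
import pandas as pd

def goals_by_team(matches, teams):
    """Sum goals scored per team, keeping only the teams listed in teams."""
    home = matches[['home_team', 'home_score']].rename(
        columns={'home_team': 'team', 'home_score': 'score'})
    away = matches[['away_team', 'away_score']].rename(
        columns={'away_team': 'team', 'away_score': 'score'})
    # Stack home and away rows so each row is one team's goals in one match
    scores = pd.concat([home, away], ignore_index=True)
    scores = scores[scores['team'].isin(teams)]
    return scores.groupby('team')['score'].sum().sort_values(ascending=False)

# Placeholder matches, not real results
toy = pd.DataFrame({
    'home_team': ['Team A', 'Team B', 'Team C'],
    'away_team': ['Team B', 'Team C', 'Team A'],
    'home_score': [2, 0, 1],
    'away_score': [1, 3, 1],
})
print(goals_by_team(toy, ['Team A', 'Team C']))
```

With the real data the call would be `goals_by_team(df_FIFA, team_list)`.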

Conclusion 3: Across the entire history, Germany, Brazil and Argentina dominate both win counts and goal totals.

Since 1978 – Wins & Goals

Conclusion 4: From 1978 onward, Argentina, Germany and Brazil are the strongest by wins; the same three lead in goals, with Germany showing a clearer edge.

Since 2002 – Wins & Goals

Conclusion 5: Since 2002, Germany, Argentina and Brazil remain the top three by both wins and goals, with Germany holding the strongest statistical advantage.
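
The 1978 and 2002 cut-offs can be applied with a simple filter on the `year` column created during data preparation. A minimal sketch, assuming a frame that already has `year` and `win_team` columns; the helper name `wins_since` and the toy rows are hypothetical.

```python
import pandas as pd

def wins_since(matches, start_year):
    """Count wins per team for matches played in start_year or later."""
    recent = matches[matches['year'] >= start_year]
    decided = recent[recent['win_team'] != 'Draw']   # draws have no winner
    return decided['win_team'].value_counts()

# Placeholder rows, not real results
toy = pd.DataFrame({
    'year': [1974, 1978, 2002, 2014],
    'win_team': ['Team A', 'Team B', 'Draw', 'Team B'],
})
print(wins_since(toy, 1978).to_dict())  # {'Team B': 2}
print(wins_since(toy, 2002).to_dict())  # {'Team B': 1}
```

The same year filter can be applied before summing goals to obtain the goal totals for each period.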

Overall Prediction for 2018

Based on historical performance, the analysis points to Germany, Argentina and Brazil as the top three contenders, with Germany the most likely champion.

Note: This analysis is for personal learning purposes only; actual tournament outcomes may differ.
