
Can Python Predict the 2018 World Cup Champion? A Data‑Driven Exploration

This article walks through a Python‑based analysis of international football results from 1872 to 2018, using pandas and Jupyter Notebook to clean the data, compute win counts and total goals, visualize the top teams, and finally predict that Germany, Argentina and Brazil are the strongest contenders for the 2018 title.

ITPUB

Background and Goal

Before the 2018 FIFA World Cup began, the author used Python to analyse historical match data and to forecast the tournament's most likely winners.

Data Source and Environment

The dataset was downloaded from Kaggle and contains roughly 40,000 matches spanning from 1872 to 2018, including World Cup finals, qualifiers, Asian Cup, European Cup and international friendlies. The analysis was performed on Windows 7 with Python 3.6, Jupyter Notebook and pandas 0.22.0.

Loading the Data

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

df = pd.read_csv('results.csv')
df.head()

Dataset Columns

Date

Home team name

Away team name

Home team goals (excluding penalties)

Away team goals (excluding penalties)

Match type

City

Country

Neutral venue flag

Pre‑processing

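# Keep only World Cup finals matches. This step is not shown in the text and is
# assumed here from the chart titles; 'tournament' is the match-type column.
df_FIFA = df[df['tournament'] == 'FIFA World Cup'].copy()
# Convert date column to datetime (plain column assignment, so the dtype actually changes)
df_FIFA['date'] = pd.to_datetime(df_FIFA['date'])
# Extract year
df_FIFA['year'] = df_FIFA['date'].dt.year
# Goal difference
df_FIFA['diff_score'] = df_FIFA['home_score'] - df_FIFA['away_score']
# Initialise winner column
df_FIFA['win_team'] = ''
# Ensure numeric type for diff_score
df_FIFA['diff_score'] = pd.to_numeric(df_FIFA['diff_score'])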

Determine Winning Team

# Method 1: vectorised assignment
df_FIFA.loc[df_FIFA['diff_score'] > 0, 'win_team'] = df_FIFA['home_team']
df_FIFA.loc[df_FIFA['diff_score'] < 0, 'win_team'] = df_FIFA['away_team']
df_FIFA.loc[df_FIFA['diff_score'] == 0, 'win_team'] = 'Draw'

# Method 2: function with iteration
def find_win_team(df):
    winners = []
    for _, row in df.iterrows():
        if row['home_score'] > row['away_score']:
            winners.append(row['home_team'])
        elif row['home_score'] < row['away_score']:
            winners.append(row['away_team'])
        else:
            winners.append('Draw')
    return winners

# Stored in a separate column so it can be checked against win_team from Method 1
df_FIFA['winner'] = find_win_team(df_FIFA)
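
The two methods above should produce the same labels. As a third option, the same rule can be written as one vectorised expression with `numpy.select`. This is only a sketch on a toy table: the `matches` frame and its scores are made up for illustration and are not taken from results.csv.

```python
import numpy as np
import pandas as pd

# Toy matches (made-up scores, not taken from results.csv)
matches = pd.DataFrame({
    "home_team": ["Brazil", "Germany", "Italy"],
    "away_team": ["Argentina", "France", "Spain"],
    "home_score": [2, 0, 1],
    "away_score": [1, 3, 1],
})

diff = matches["home_score"] - matches["away_score"]

# np.select picks, row by row, the value paired with the first condition
# that is True; rows matching no condition get the default.
matches["win_team"] = np.select(
    [diff > 0, diff < 0],
    [matches["home_team"], matches["away_team"]],
    default="Draw",
)

print(matches["win_team"].tolist())  # ['Brazil', 'France', 'Draw']
```

Compared with the `iterrows` loop, this avoids Python-level iteration, which tends to matter once the table has tens of thousands of rows.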

Overall Win‑Count Analysis

# Count wins per team (excluding draws)
s = df_FIFA.groupby('win_team')['win_team'].count()
s.drop(labels=['Draw'], inplace=True)
s.sort_values(ascending=False, inplace=True)

The resulting bar chart (Top 20 Winners of World Cup) shows Brazil, Germany, Italy and Argentina as the teams with the most victories.
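
The plotting call itself is not shown in the text. A minimal sketch of how such a chart could be drawn with the pandas plotting wrapper follows; the `wins` series and its team names are hypothetical stand-ins for the real series `s`, and the axis labels are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical win counts standing in for the real series `s`
wins = pd.Series({"Team A": 12, "Team B": 30, "Team C": 21}, name="wins")

# Keep at most the 20 teams with the most wins, largest first
top = wins.sort_values(ascending=False).head(20)

ax = top.plot(kind="bar", figsize=(10, 6), title="Top 20 Winners of World Cup")
ax.set_xlabel("Team")
ax.set_ylabel("Wins")
plt.tight_layout()
```

In a notebook with `%matplotlib inline`, the figure renders below the cell; `kind="barh"` gives the horizontal variant.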

Top 20 Winners Bar Chart

Total Goal Analysis

# Combine home and away scores into a single table
df_score_home = df_FIFA[['home_team', 'home_score']].rename(columns={'home_team':'team','home_score':'score'})
df_score_away = df_FIFA[['away_team', 'away_score']].rename(columns={'away_team':'team','away_score':'score'})

df_score = pd.concat([df_score_home, df_score_away], ignore_index=True)

s_score = df_score.groupby('team')['score'].sum()
s_score.sort_values(ascending=False, inplace=True)

The bar chart (Top 20 in Total Scores of World Cup) indicates Germany, Brazil, Argentina and Italy as the highest‑scoring nations.
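
To see what the home/away stacking does, here is the same logic run on three made-up matches. The scores are invented for illustration, so the totals say nothing about the real teams.

```python
import pandas as pd

# Three invented matches; each team appears both at home and away
matches = pd.DataFrame({
    "home_team": ["Brazil", "Germany", "Brazil"],
    "away_team": ["Germany", "Italy", "Italy"],
    "home_score": [2, 2, 0],
    "away_score": [1, 1, 3],
})

# Give both halves the same column names so they can be stacked
home = matches[["home_team", "home_score"]].rename(
    columns={"home_team": "team", "home_score": "score"})
away = matches[["away_team", "away_score"]].rename(
    columns={"away_team": "team", "away_score": "score"})

# One row per (team, match) appearance, then sum goals per team
totals = (
    pd.concat([home, away], ignore_index=True)
    .groupby("team")["score"].sum()
    .sort_values(ascending=False)
)

print(totals.to_dict())  # {'Italy': 4, 'Germany': 3, 'Brazil': 2}
```

Germany's total of 3 combines one away goal and two home goals, which is exactly what a single `groupby` on either the home or the away column alone would miss.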

Top 20 Total Goals Bar Chart

2018 World Cup – 32‑Team Focus

The 32 qualified teams were seeded into four pots for the draw. The analysis first identified the debutants (Iceland and Panama) by checking which teams from the 32‑team list were absent from the index of the historical goal totals (s_score).

team_list = ['Russia','Germany','Brazil','Portugal','Argentina','Belgium','Poland','France',
             'Spain','Peru','Switzerland','England','Colombia','Mexico','Uruguay','Croatia',
             'Denmark','Iceland','Costa Rica','Sweden','Tunisia','Egypt','Senegal','Iran',
             'Serbia','Nigeria','Australia','Japan','Morocco','Panama','Korea Republic','Saudi Arabia']

for item in team_list:
    if item not in s_score.index:
        print(item)
# Prints Iceland and Panama, one per line

Since Iceland and Panama had no prior World Cup data, they were excluded from the subsequent 32‑team statistics.
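
The membership loop above can also be expressed as a set difference on the index. The sketch below uses a shortened, hypothetical team list and made-up totals in place of `team_list` and `s_score`.

```python
import pandas as pd

# Shortened stand-in for team_list
qualified = ["Brazil", "Germany", "Iceland", "Panama"]

# Made-up goal totals standing in for s_score (indexed by team name)
totals = pd.Series({"Brazil": 5, "Germany": 4, "Italy": 3})

# Teams in the qualified list that never appear in the historical index
debutants = pd.Index(qualified).difference(totals.index)

print(debutants.tolist())  # ['Iceland', 'Panama']
```

`Index.difference` returns the result sorted, so the order does not depend on the order of the input list.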

# Keep only matches in which both the home and the away side are among the 32 qualified teams
df_top32 = df_FIFA[(df_FIFA['home_team'].isin(team_list)) & (df_FIFA['away_team'].isin(team_list))]

Win‑Count Within the 32 Teams (All Years)

s_32 = df_top32.groupby('win_team')['win_team'].count()
s_32.drop(labels=['Draw'], inplace=True)
s_32.sort_values(ascending=False, inplace=True)

Top 32 Win Count Horizontal Bar Chart

Total Goals Within the 32 Teams (All Years)

# Re‑use the home/away concatenation logic on the filtered dataset
df_score_home_32 = df_top32[['home_team','home_score']].rename(columns={'home_team':'team','home_score':'score'})
df_score_away_32 = df_top32[['away_team','away_score']].rename(columns={'away_team':'team','away_score':'score'})

df_score_32 = pd.concat([df_score_home_32, df_score_away_32], ignore_index=True)

s_score_32 = df_score_32.groupby('team')['score'].sum()
s_score_32.sort_values(ascending=False, inplace=True)

Top 32 Total Goals Horizontal Bar Chart

Key Findings

Across all historical World Cup matches in the dataset, Brazil, Germany, Italy and Argentina have the highest win counts.

In terms of total goals scored, Germany, Brazil, Argentina and Italy lead.

For the 2018 tournament’s 32 teams, Germany, Argentina and Brazil emerge as the strongest based on both win‑count and goal totals.

Germany appears as the most likely champion according to the data‑driven forecast.
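
The article reads the two rankings side by side rather than merging them. One simple way to combine them, sketched here as an assumption and not as the author's method, is to average each team's rank on the two metrics. The team names and numbers are placeholders for `s_32` and `s_score_32`.

```python
import pandas as pd

# Placeholder values standing in for s_32 (wins) and s_score_32 (goals)
wins = pd.Series({"Team A": 30, "Team B": 28, "Team C": 25, "Team D": 20})
goals = pd.Series({"Team A": 80, "Team B": 95, "Team C": 90, "Team D": 60})

# Align the two series on team name
summary = pd.concat([wins.rename("wins"), goals.rename("goals")], axis=1)

# Rank 1 is best on each metric; the mean of the two ranks is the combined score
summary["mean_rank"] = summary.rank(ascending=False).mean(axis=1)
summary = summary.sort_values("mean_rank")

print(summary.index.tolist())  # ['Team B', 'Team A', 'Team C', 'Team D']
```

Averaging ranks weights wins and goals equally; any other weighting is an equally arbitrary choice and would need its own justification.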

Conclusion

Using publicly available match data and straightforward pandas operations, the analysis suggests that Germany, Argentina and Brazil are the top three contenders for the 2018 World Cup, with Germany being the strongest favorite. The results are for personal learning purposes only and should not be taken as definitive predictions.
