Can the Massey Method Predict the World Cup Winner? A Data‑Driven Ranking Study
This article explains the Massey ranking method, shows how to build the required matrices and vectors from World Cup match data, implements the model in Python, and compares three scoring strategies to forecast whether Argentina or France will win the tournament.
Massey’s Method in Mathematics
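The Massey method, originally proposed by K. Massey in 1997 for ranking American college football teams, gives each team a strength rating chosen so that the difference between two teams' ratings approximates the score difference in their match. No single set of ratings fits every result exactly, so the ratings are taken as the least-squares solution of this linear system, built from the match outcomes and the teams' cumulative score differences.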
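A minimal sketch of the least-squares idea, using three hypothetical teams A, B and C with invented margins (not World Cup data). The margins are deliberately inconsistent: A beat B by 2 and B beat C by 1, yet A beat C by only 1. No ratings fit all three exactly, so `np.linalg.lstsq` returns the best fit.

```python
import numpy as np

# Hypothetical toy results: (winner, loser, margin of victory)
teams = ["A", "B", "C"]
games = [("A", "B", 2), ("B", "C", 1), ("A", "C", 1)]
idx = {t: k for k, t in enumerate(teams)}

X = np.zeros((len(games), len(teams)))  # one row per match
y = np.zeros(len(games))                # observed margin of each match
for row, (winner, loser, margin) in enumerate(games):
    X[row, idx[winner]] = 1.0
    X[row, idx[loser]] = -1.0
    y[row] = margin

# Adding the same constant to every rating leaves X @ r unchanged, so X is
# rank-deficient; lstsq then returns the minimum-norm least-squares solution,
# which here has ratings summing to zero. rcond treats tiny singular values as zero.
r, *_ = np.linalg.lstsq(X, y, rcond=1e-10)
for team, rating in zip(teams, r):
    print(f"{team}: {rating:+.3f}")
```

This prints A: +1.000, B: -0.333, C: -0.667. Each fitted margin misses the observed one by 2/3 of a goal, which is how least squares spreads the inconsistency across the three matches.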
Predicting the World Cup with the Massey Method
To forecast the 2022 World Cup final, the author rates every team in the tournament with the Massey method and then compares the two finalists, Argentina and France. The core idea is that the difference between two teams' ratings predicts the winning side's advantage (for example, its goal margin) in a match between them.
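Written out, the model fits one equation per match and solves the least-squares normal equations. The symbols r (ratings), X (game matrix), y (margins), M and p are notation introduced here, following the usual presentation of the method:

```latex
\begin{aligned}
r_i - r_j &\approx y_k \quad \text{for each match } k \text{ in which team } i \text{ beats team } j \text{ by } y_k,\\
\hat{r} &\in \arg\min_{r} \lVert X r - y \rVert_2^2
  \;\Longrightarrow\; \underbrace{X^{\top} X}_{M}\, \hat{r} = \underbrace{X^{\top} y}_{p}.
\end{aligned}
```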
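Each match contributes one equation. In that match's row of a game matrix X, a 1 goes in the winner's column and a -1 in the loser's column (for a draw, either team may take the 1), and the corresponding entry of the vector y is the winning margin. The Massey matrix M = X^T X is then a square team-by-team matrix: its diagonal entries equal the number of matches each team has played, and its off-diagonal entry (i, j) is minus the number of matches between teams i and j. The right-hand vector p = X^T y contains each team's cumulative score difference (or another chosen measure of advantage).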
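The sketch below checks this construction on three hypothetical matches (invented scores, including a draw). It builds M and p directly by counting and compares them with X^T X and X^T y. Instead of putting +1 on the winner, it puts +1 on "team 1" of each match. This gives the same M and p, because flipping the signs of a row together with its margin leaves X^T X and X^T y unchanged.

```python
import numpy as np

# Hypothetical toy matches: (team 1, team 2, goals 1, goals 2); B-C is a draw
teams = ["A", "B", "C"]
matches = [("A", "B", 3, 1), ("B", "C", 1, 1), ("C", "A", 0, 1)]
idx = {t: k for k, t in enumerate(teams)}
n = len(teams)

# Massey matrix and advantage vector built directly by counting
M = np.zeros((n, n))
p = np.zeros(n)
for t1, t2, g1, g2 in matches:
    i, j = idx[t1], idx[t2]
    M[i, i] += 1          # one more match played by each team
    M[j, j] += 1
    M[i, j] -= 1          # one more meeting between i and j
    M[j, i] -= 1
    p[i] += g1 - g2       # cumulative goal difference
    p[j] += g2 - g1

# The same quantities from the per-match system X r ~ y (+1 on team 1, -1 on team 2)
X = np.zeros((len(matches), n))
y = np.array([g1 - g2 for _, _, g1, g2 in matches], dtype=float)
for k, (t1, t2, _, _) in enumerate(matches):
    X[k, idx[t1]], X[k, idx[t2]] = 1.0, -1.0

print(np.allclose(M, X.T @ X), np.allclose(p, X.T @ y))  # True True
```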
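There are more matches than unknown ratings, so the per-match system is over-determined and is solved in the least-squares sense. The resulting Massey matrix, however, is singular: each of its rows sums to zero, since adding the same constant to every rating changes nothing. To pin the ratings down, one row of the matrix (the last, in the code below) is replaced with all ones, and the corresponding entry of the right-hand vector is set to zero. This forces the ratings to sum to zero and yields a full-rank matrix, provided all teams are connected through some chain of matches (here the knockout rounds link the otherwise separate groups).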
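The row-replacement step can be sketched on a three-team Massey system with hypothetical numbers (each team played the other two once). The sketch shows that M loses one rank and that the modified matrix is invertible. It uses `np.linalg.solve`, which is generally preferred to forming an explicit inverse.

```python
import numpy as np

# Massey system for three hypothetical teams (each played the other two once)
M = np.array([[2.0, -1.0, -1.0],
              [-1.0, 2.0, -1.0],
              [-1.0, -1.0, 2.0]])
p = np.array([3.0, -1.0, -2.0])  # cumulative goal differences, summing to zero

print(np.linalg.matrix_rank(M))  # 2: rows sum to zero, so M is singular

# Replace the last equation with r_1 + ... + r_n = 0
M_bar = M.copy()
M_bar[-1, :] = 1.0
p_bar = p.copy()
p_bar[-1] = 0.0

print(np.linalg.matrix_rank(M_bar))  # 3: full rank
r = np.linalg.solve(M_bar, p_bar)
print(r.round(3))
```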
Data
The match data from the group stage through the semi-finals were collected manually and stored in an Excel file with one row per match. Each row holds the two teams (columns 队伍1 and 队伍2, "Team 1" and "Team 2") and their goals (columns 得分1 and 得分2).
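A table in this layout can be turned into the Massey matrix and the goal-difference vector without nested loops. The sketch below uses a small hypothetical table with the same column names as the spreadsheet; the teams and scores are invented. It relies on `np.add.at`, which accumulates correctly even when the same pair of teams appears more than once.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spreadsheet: same columns, invented teams and scores
data = pd.DataFrame({
    "队伍1": ["A", "C", "A", "B"],
    "队伍2": ["B", "D", "C", "D"],
    "得分1": [2, 0, 1, 3],
    "得分2": [1, 0, 1, 2],
})

team_list = sorted(set(data["队伍1"]) | set(data["队伍2"]))
idx = {t: k for k, t in enumerate(team_list)}
i = data["队伍1"].map(idx).to_numpy()
j = data["队伍2"].map(idx).to_numpy()
n = len(team_list)

# Off-diagonal: minus the number of meetings; diagonal: matches played
M = np.zeros((n, n))
np.add.at(M, (i, j), -1)
np.add.at(M, (j, i), -1)
M[np.diag_indices(n)] = -M.sum(axis=1)

# Cumulative goal difference: credited to team 1, debited from team 2
diff = (data["得分1"] - data["得分2"]).to_numpy()
p = np.zeros(n)
np.add.at(p, i, diff)
np.add.at(p, j, -diff)

print(pd.DataFrame(M, index=team_list, columns=team_list))
print(pd.Series(p, index=team_list))
```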
Code
<code># packages
import numpy as np
import pandas as pd

# one row per match; columns: 队伍1/队伍2 = the two teams, 得分1/得分2 = their goals
data = pd.read_excel('data/2022worldCup.xlsx')

# all teams, sorted so that the row order (and hence the replaced last row) is reproducible
teams1 = set(data['队伍1'].unique())
teams2 = set(data['队伍2'].unique())
team_list = sorted(teams1 | teams2)

# construct an empty team-by-team dataframe
df_comp = pd.DataFrame(data=np.zeros((len(team_list), len(team_list))), index=team_list, columns=team_list)

# off-diagonal entries: minus the number of matches played between two teams
df_comp1 = df_comp.copy()
for t1, t2 in data[['队伍1', '队伍2']].itertuples(index=False, name=None):
    df_comp1.loc[t1, t2] -= 1
    df_comp1.loc[t2, t1] -= 1

# diagonal entries: number of matches each team played
df_comp2 = df_comp1.copy()
for t in team_list:
    df_comp2.loc[t, t] = -df_comp1.loc[t, :].sum()
total_number_game_list = [df_comp2.loc[t, t] for t in team_list]

# strategy 1: cumulative goal difference
goal_dict = {t: 0 for t in team_list}
# strategy 2: points, 3 for a win, 1 for a draw, 0 for a loss
# (unlike the other two strategies, these totals do not sum to zero across teams)
win_dict = {t: 0 for t in team_list}
# strategy 3: +1 for a win, 0 for a draw, -1 for a loss
win_dict2 = {t: 0 for t in team_list}
for t1, t2, g1, g2 in data[['队伍1', '队伍2', '得分1', '得分2']].itertuples(index=False, name=None):
    goal_dict[t1] += g1 - g2
    goal_dict[t2] += g2 - g1
    if g1 > g2:
        win_dict[t1] += 3
        win_dict2[t1] += 1
        win_dict2[t2] -= 1
    elif g1 < g2:
        win_dict[t2] += 3
        win_dict2[t2] += 1
        win_dict2[t1] -= 1
    else:
        win_dict[t1] += 1
        win_dict[t2] += 1

# advantage vectors, in the same order as team_list
goal_list = [goal_dict[t] for t in team_list]
win_list = [win_dict[t] for t in team_list]
win_list2 = [win_dict2[t] for t in team_list]

# calculate result
def get_result(plist=goal_list):
    '''
    plist: score list used for calculating cumulated advantage (ordered like team_list)
    '''
    M0 = df_comp2.values
    p0 = np.array(plist, dtype=float).reshape(-1, 1)
    # replace the last equation with "ratings sum to zero" so the system is non-singular
    M1 = M0.copy()
    M1[-1, :] = 1
    p1 = p0.copy()
    p1[-1] = 0
    r = np.linalg.solve(M1, p1)
    df_result = pd.DataFrame({'Team': team_list, 'Total_Number_Games': total_number_game_list, 'Score': r.flatten()})
    return df_result.sort_values(by='Score', ascending=False)

print(get_result(goal_list))   # strategy 1
print(get_result(win_list))    # strategy 2
print(get_result(win_list2))   # strategy 3
</code>
Results
The rankings produced by each strategy are summarized below.
Strategy 1: Using Goal Difference as Advantage
France ranks slightly above Argentina, but the margin is small and neither team dominates the overall ranking.
Strategy 2: Using Points (3‑1‑0) as Advantage
Again France leads by a small margin, and neither team shows clear superiority.
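One property is worth checking when choosing the advantage vector. Every row of M sums to zero, so M r = p has an exact solution only if the entries of p also sum to zero. Goal difference and the 1/0/-1 scheme satisfy this automatically, but 3-1-0 points do not. With points, the row-replacement step discards part of one team's equation, so the ratings can depend on which team sits in the replaced row. The sketch below illustrates this with three hypothetical teams; `massey_solve` and its `drop` parameter are names invented here. Subtracting the mean from p (which yields the least-squares solution of M r = p) is one possible remedy; it is an assumption of this sketch, not part of the original method.

```python
import numpy as np

def massey_solve(M, p, drop=-1):
    """Hypothetical helper: replace row `drop` of M with ones and p[drop] with 0, then solve."""
    M_bar, p_bar = M.astype(float), p.astype(float)  # astype returns copies
    M_bar[drop, :] = 1.0
    p_bar[drop] = 0.0
    return np.linalg.solve(M_bar, p_bar)

# Three hypothetical teams, each played the other two once: A beat B, B beat C, A beat C
M = np.array([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]])
points = np.array([6, 3, 0])          # 3-1-0 points: total is 9, not 0
centered = points - points.mean()     # shifted so the total is 0

for name, p in [("points", points), ("centered", centered)]:
    last = massey_solve(M, p, drop=-1)
    first = massey_solve(M, p, drop=0)
    print(name, last.round(3), first.round(3), np.allclose(last, first))
```

With raw points, replacing the last row puts A on top (ratings 2, 1, -3), while replacing the first row puts A last (-1, 1, 0). The centered vector gives 1, 0, -1 either way.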
Strategy 3: Using Win/Draw/Loss Scores (1/0/-1) as Advantage
Argentina moves ahead, producing a ranking that aligns better with expectations, though the gap remains narrow.
Based on these analyses, the author predicts a narrow Argentine victory in the final, noting that the result is for entertainment purposes only.
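Turning the ratings into a forecast amounts to taking a difference. The sketch below assumes a ratings table shaped like the one `get_result` returns (columns `Team` and `Score`). The helper `predicted_margin` and the ratings are hypothetical, not the article's results.

```python
import pandas as pd

def predicted_margin(ratings: pd.DataFrame, team_a: str, team_b: str) -> float:
    """Hypothetical helper: Massey-predicted advantage of team_a over team_b."""
    score = ratings.set_index("Team")["Score"]
    return float(score[team_a] - score[team_b])

# Invented ratings in the same layout as get_result's output
ratings = pd.DataFrame({"Team": ["X", "Y", "Z"],
                        "Total_Number_Games": [6, 6, 3],
                        "Score": [1.25, 1.10, -0.40]})

margin = predicted_margin(ratings, "X", "Y")
print(f"X over Y by {margin:.2f}")  # positive favors X, negative favors Y
```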
Reference
A. N. Langville and C. D. Meyer, "Who Ranks First? The Science of Evaluation and Ranking" (Chinese edition of "Who's #1? The Science of Rating and Ranking"), China Machine Press, 2014.
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".