Can Math Build the Ultimate Pokémon Dream Team? A Data‑Driven Analysis

This article uses a Kaggle Pokémon dataset of 802 creatures to explore statistical correlations, build a random‑forest classifier for legendary status, assess type strengths, and apply optimization techniques—including integer linear programming, greedy selection, and simulated annealing—to propose an optimal six‑Pokémon dream team.

Model Perspective
In the world of Pokémon, every trainer dreams of building the strongest team, but with hundreds of species, how should one choose the six members? This study applies a mathematical approach using a Kaggle dataset of 802 Pokémon with attributes such as base stats, type, height, weight, and more.

The dataset contains columns like name, japanese_name, pokedex_number, percentage_male, type1, type2, classification, height_m, weight_kg, capture_rate, base_egg_steps, abilities, experience_growth, base_happiness, 18 type‑effectiveness columns (against_?), the six base stats (hp, attack, defense, sp_attack, sp_defense, speed), generation, and is_legendary.

The analysis addresses six questions:

Can we build a classifier to identify legendary Pokémon?

How do height and weight relate to base stats?

What factors affect experience growth and egg‑step count, and are they correlated?

Which Pokémon type is overall strongest or weakest?

Which type is most likely to contain legendary Pokémon?

Can we construct a six‑Pokémon "dream team" that maximizes damage while minimizing vulnerability?

import pandas as pd
# Load the dataset
pokemon_df = pd.read_csv('data/pokemon.csv')
# Display the first few rows
pokemon_df.head()

Question 1: Can we build a classifier to identify legendary Pokémon?

First, we examine the distribution of the is_legendary flag.

# Distribution of Legendary vs Non‑Legendary Pokemon
legendary_distribution = pokemon_df['is_legendary'].value_counts(normalize=True) * 100
legendary_distribution

Result: non‑legendary ≈ 91.26%, legendary ≈ 8.74%, indicating a highly imbalanced dataset.

We then compare base‑stat distributions between the two groups using KDE plots (image omitted for brevity).

Observations: Legendary Pokémon tend to have higher values for HP, Attack, Special Attack, Special Defense, and Speed.

Next, we train a simple Random Forest classifier.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

stats = ['hp', 'attack', 'defense', 'sp_attack', 'sp_defense', 'speed']
X = pokemon_df[stats]
y = pokemon_df['is_legendary']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
print(accuracy, classification_rep)

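Result: accuracy ≈ 94.19%, which is only about three percentage points above the 91.26% that always predicting non‑legendary would score. Recall for the legendary class is only 38%, so the model misses most legendary Pokémon, a consequence of the class imbalance.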
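To see why accuracy alone says little here, it helps to set it against a classifier that labels every Pokémon as non‑legendary. The sketch below uses hypothetical confusion‑matrix counts. They are chosen only to be consistent with the reported accuracy and recall on an assumed 241‑row test split, and are not taken from the original run.

```python
# Hypothetical confusion-matrix counts (assumed, not from the original run).
# They are consistent with the reported figures for an assumed 241-row test split.
tp, fn = 8, 13    # legendary: correctly flagged / missed
tn, fp = 219, 1   # non-legendary: correctly passed / falsely flagged

total = tp + fn + tn + fp
accuracy = (tp + tn) / total
recall = tp / (tp + fn)        # share of legendary Pokémon that were found
precision = tp / (tp + fp)     # share of "legendary" predictions that were right

# Baseline that labels every Pokémon as non-legendary.
baseline_accuracy = (tn + fp) / total

print(f"accuracy  = {accuracy:.4f}")
print(f"recall    = {recall:.4f}")
print(f"precision = {precision:.4f}")
print(f"baseline  = {baseline_accuracy:.4f}")
```

With counts like these, the gap between the model and the trivial baseline is small, which is why per‑class recall is the more informative number for an imbalanced target.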

Question 2: How do height and weight relate to base stats?

# Correlation of height and weight with base stats
correlation_with_height = pokemon_df[stats + ['height_m']].corr()['height_m'].drop('height_m')
correlation_with_weight = pokemon_df[stats + ['weight_kg']].corr()['weight_kg'].drop('weight_kg')
print(correlation_with_height, correlation_with_weight)

Height shows moderate positive correlation with HP (0.48), Attack (0.42), Defense (0.36), etc. Weight shows similar patterns, especially with HP (0.43) and Defense (0.42). Speed correlates weakly with both.
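The figures above come from pandas' corr(), which uses the Pearson coefficient by default. As a rough illustration of what that number measures, here is a minimal hand‑written version. The helper name `pearson` and the toy values are assumptions made for this sketch, not rows from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-movement of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Toy values, invented for illustration (not rows from the dataset).
height_m = [0.4, 0.7, 1.1, 1.7, 2.0, 3.5]
hp = [35, 45, 60, 78, 70, 105]
print(round(pearson(height_m, hp), 3))
```

A value near 1 means the two attributes rise together almost linearly, while values around 0.4 to 0.5, as reported for height and HP, indicate a visible but loose trend.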

Question 3: What influences experience growth and egg‑step count? Are they related?

# Correlation between experience_growth and base_egg_steps
corr_exp_egg = pokemon_df['experience_growth'].corr(pokemon_df['base_egg_steps'])
# Correlations with other stats omitted for brevity
print(corr_exp_egg)

The two attributes have a moderate positive correlation (~0.37). Experience growth correlates positively with Attack, HP, and Special Attack, while egg steps correlate positively with all base stats, especially Special Attack.

Question 4: Which Pokémon type is strongest and which is weakest?

# Compute total base stats and average per type
pokemon_df['total_stats'] = pokemon_df[stats].sum(axis=1)
avg_type1 = pokemon_df.groupby('type1')['total_stats'].mean()
avg_type2 = pokemon_df.groupby('type2')['total_stats'].mean()
overall_avg = (avg_type1 + avg_type2).dropna() / 2
sorted_overall = overall_avg.sort_values(ascending=False)
print(sorted_overall)

Measured by average total base stats, the result ranks Dragon as the strongest type (≈ 510.12) and Bug as the weakest (≈ 380.52).
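Note that the snippet above gives the type1 mean and the type2 mean equal weight, whatever the number of Pokémon in each group. One possible alternative, sketched here on invented toy rows, pools every Pokémon that carries a type so that each one counts once. The helper `pooled_mean_by_type` is hypothetical and not part of the original analysis.

```python
from collections import defaultdict

# Toy rows, invented for illustration: (name, type1, type2 or None, total_stats).
rows = [
    ("A", "dragon", None, 600),
    ("B", "dragon", "flying", 680),
    ("C", "bug", None, 300),
    ("D", "bug", "flying", 400),
    ("E", "water", "dragon", 540),
]

def pooled_mean_by_type(rows):
    """Mean total_stats per type, counting each Pokémon once per type it has."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _name, type1, type2, total in rows:
        for t in {type1, type2} - {None}:
            totals[t] += total
            counts[t] += 1
    return {t: totals[t] / counts[t] for t in totals}

means = pooled_mean_by_type(rows)
ranking = sorted(means.items(), key=lambda kv: kv[1], reverse=True)
for type_name, mean_total in ranking:
    print(f"{type_name:8s} {mean_total:.1f}")
```

On the toy rows, Dragon's pooled mean differs from the mean of its two group means, which shows that the two definitions can rank types differently when primary and secondary groups are unequal in size.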

Question 5: Which type is most likely to be legendary?

# Percentage of legendary Pokémon per type
legendary_pct_type1 = pokemon_df.groupby('type1')['is_legendary'].mean() * 100
legendary_pct_type2 = pokemon_df.groupby('type2')['is_legendary'].mean() * 100
overall_legendary_pct = (legendary_pct_type1 + legendary_pct_type2).dropna() / 2
sorted_legendary = overall_legendary_pct.sort_values(ascending=False)
print(sorted_legendary)

Dragon type has the highest legendary proportion (~24.73%), while Normal type has the lowest (~1.43%).

Question 6: Can we build a six‑Pokémon dream team?

We first select the six Pokémon with the highest total stats:

# Top 6 by total stats
top_6 = pokemon_df.sort_values(by='total_stats', ascending=False).head(6)
print(top_6[['name', 'type1', 'type2', 'total_stats']])

Resulting team: Mewtwo (Psychic, 780), Rayquaza (Dragon/Flying, 780), Groudon (Ground, 770), Kyogre (Water, 770), Arceus (Normal, 720), Zygarde (Dragon/Ground, 708).

To consider type interactions, we construct a damage‑coefficient matrix D from the against_? columns and formulate an integer linear programming model that maximizes total stats while accounting for pairwise damage advantages. Because solving the full ILP for 800+ Pokémon is costly, we restrict the search to the top 100 candidates and apply a greedy heuristic and simulated annealing.
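The article does not show how D is built. One possible construction is sketched below, under the assumption that D[a][d] is the best multiplier any of attacker a's own types achieves against defender d. In the dataset, an against_<type> value is the damage multiplier a Pokémon takes from attacks of that type. The creature names, the nested‑dict layout, and the helper `damage_matrix` are placeholders for illustration.

```python
# Toy creatures, invented for illustration. "against" maps an attacking type
# to the damage multiplier this creature takes from it.
pokemon = {
    "fire_mon": {
        "types": ["fire"],
        "against": {"fire": 0.5, "water": 2.0, "grass": 0.5, "flying": 1.0},
    },
    "water_mon": {
        "types": ["water"],
        "against": {"fire": 0.5, "water": 0.5, "grass": 2.0, "flying": 1.0},
    },
    "grass_fly_mon": {
        "types": ["grass", "flying"],
        "against": {"fire": 2.0, "water": 0.5, "grass": 0.25, "flying": 2.0},
    },
}

def damage_matrix(pokemon):
    """D[a][d]: best multiplier attacker a's own types achieve against defender d."""
    D = {}
    for attacker, a_info in pokemon.items():
        D[attacker] = {}
        for defender, d_info in pokemon.items():
            if attacker == defender:
                D[attacker][defender] = 0.0  # no self-matchup
                continue
            D[attacker][defender] = max(d_info["against"][t] for t in a_info["types"])
    return D

D = damage_matrix(pokemon)
# Net advantage of one creature over another: damage dealt minus damage taken.
net = D["water_mon"]["fire_mon"] - D["fire_mon"]["water_mon"]
print(net)
```

Taking the maximum over the attacker's types is a design choice. Averaging the multipliers, or using only the primary type, would give a different D and possibly a different team.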

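# Example of greedy team selection (simplified)
def greedy_team_selection(pokemon_df, D, n=6):
    """Pick n Pokémon greedily. D is a square DataFrame indexed like pokemon_df;
    D.loc[i, j] is assumed to be the damage coefficient of i attacking j."""
    # Seed the team with the Pokémon that has the highest total stats.
    selected = [pokemon_df['total_stats'].idxmax()]
    for _ in range(n - 1):
        best_idx, best_val = None, -float('inf')
        for i in pokemon_df.index:
            if i in selected:
                continue
            # Candidate value: own stats plus net damage advantage
            # over the members picked so far.
            value = pokemon_df.loc[i, 'total_stats']
            for j in selected:
                value += D.loc[i, j] - D.loc[j, i]
            if value > best_val:
                best_val, best_idx = value, i
        if best_idx is None:
            break  # fewer than n candidates available
        selected.append(best_idx)
    return selected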

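The greedy and simulated‑annealing runs produce a team similar to the simple top‑stat selection. This agreement is weak evidence of robustness at best: in the greedy objective, total stats are in the hundreds, while the type multipliers in the against_? columns never exceed 4, so the stat term likely dominates unless the damage terms are rescaled.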
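The simulated‑annealing code is not shown in the article. A minimal sketch of the general technique might look like the following, assuming a swap‑one‑member neighbourhood and geometric cooling. The function name `anneal_team`, its parameters, and the toy stat totals are all assumptions made for illustration. The score callable is left abstract, so a stats‑plus‑damage objective could be plugged in.

```python
import math
import random

def anneal_team(candidates, score, n=6, steps=5000, t_start=50.0, t_end=0.1, seed=0):
    """Search for a high-scoring n-member team by single-member swaps."""
    if len(candidates) <= n:
        raise ValueError("need more candidates than team slots")
    rng = random.Random(seed)
    current = rng.sample(candidates, n)
    current_score = score(current)
    best, best_score = list(current), current_score
    cooling = (t_end / t_start) ** (1.0 / steps)  # geometric cooling factor
    temp = t_start
    for _ in range(steps):
        # Propose swapping one team member for one outsider.
        out_pos = rng.randrange(n)
        outsider = rng.choice([c for c in candidates if c not in current])
        proposal = list(current)
        proposal[out_pos] = outsider
        proposal_score = score(proposal)
        delta = proposal_score - current_score
        # Always accept improvements; accept a worse team with
        # probability exp(delta / temp), which shrinks as temp falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = list(current), current_score
        temp *= cooling
    return best, best_score

# Toy candidates with invented stat totals (not dataset values).
toy_stats = {f"mon_{i}": 300 + 17 * i for i in range(20)}

def toy_score(team):
    """Additive toy objective: sum of the members' stat totals."""
    return sum(toy_stats[name] for name in team)

team, total = anneal_team(list(toy_stats), toy_score)
print(sorted(team), total)
```

With a purely additive score like the toy one, annealing has nothing to gain over sorting by stats. It only becomes useful when the objective includes interaction terms between team members.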

In conclusion, a data‑driven approach identifies Mewtwo, Rayquaza, Groudon, Kyogre, Arceus, and Zygarde as a strong candidate dream team, though real‑world battles also depend on moves, strategy, and luck.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
