LLM Showdown in a Three‑Kingdoms Strategy Game: Tactics, Winners, and Surprising Insights
This article describes a custom Three‑Kingdoms‑style strategy game built to benchmark nine flagship large language models. It explains the game mechanics, evaluates each model's strategic decisions and diplomatic behavior, and shows how Gemini 3.1 Pro clinched the championship with a clever "坚壁清野" (scorched‑earth) tactic, before closing with the underlying engine architecture and development lessons.
The author built a simplified, turn‑based Three‑Kingdoms strategy game to let large language models (LLMs) act as faction leaders, aiming to compare their planning, reasoning, and execution abilities in a competitive setting.
Game Mechanics
Each faction starts with five cities, and action points per turn equal the number of cities owned. Resources (money, food, population) are pooled faction‑wide, and morale (民心, popular support) acts as a multiplier on production. Players issue orders each turn and may also send a limited number of diplomatic messages through a messenger system to influence opponents. The game runs for 24 turns, with the winner decided by city count if no faction has been eliminated sooner.
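The resource rules above can be sketched in a few lines. This is a minimal illustration, not the article's actual engine code: the per‑city yields and the morale scale are invented for the example, since the article gives no exact numbers.

```python
from dataclasses import dataclass


@dataclass
class Faction:
    """Minimal faction state mirroring the rules described above."""
    cities: list          # names of owned cities
    money: int = 1000
    food: int = 1000
    population: int = 10000
    morale: float = 1.0   # 民心 multiplier on production (illustrative scale)

    @property
    def action_points(self) -> int:
        # Action points per turn equal the number of owned cities.
        return len(self.cities)

    def collect_production(self, base_money: int = 100, base_food: int = 120) -> None:
        # Hypothetical per-city yields, scaled by morale.
        self.money += int(len(self.cities) * base_money * self.morale)
        self.food += int(len(self.cities) * base_food * self.morale)


wei = Faction(cities=["Luoyang", "Xuchang", "Chang'an", "Ye", "Wan"])
wei.collect_production()
```

With five cities and neutral morale, the faction gets five action points and one round of production as defined above; raising `morale` above 1.0 scales both yields up proportionally.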
LLM Participants and Grouping
Nine flagship LLMs were divided into three groups:
Group A (Domestic LLMs): MiniMax‑M2.7, Kimi‑K2.5, MiMo‑V2‑Pro
Group B (International LLMs): Claude‑Opus‑4.6, GLM‑5, GPT‑5.4
Group C (Hybrid): Gemini‑3.1‑Pro, Qwen‑3.5‑397B, Google‑Gemini‑3.1‑Pro
Performance Evaluation
Kimi‑K2.5 (A) earned an A rating for steady economic growth, timely attacks on Xiangyang, and strong diplomatic pressure, though it was slow to mobilize troops.
MiMo‑V2‑Pro (A) received a B+ for an aggressive early expansion and effective “prisoner diplomacy,” but its late‑game resource collapse hurt its standing.
MiniMax‑M2.7 (A) scored a C; its strategic direction was sound but execution faltered, leading to a rapid collapse.
Claude‑Opus‑4.6 (B) achieved an A for deep strategic analysis and a near‑perfect alliance with Kimi, yet repeated grain shortages undermined its logistics.
GLM‑5 (B) earned a C, suffering from passive defense and a disastrous decision to attack Luoyang early.
GPT‑5.4 (B) received a C; it showed good analysis but constantly changed its main attack direction, resulting in indecisive play.
Gemini‑3.1‑Pro (C) dominated with an S‑level rating, employing a "坚壁清野" tactic (literally "strengthen the walls and clear the fields," i.e., a scorched‑earth withdrawal): when a city was about to fall, it pulled out its troops and resources, leaving an empty shell whose capture gained the attacker almost nothing. This let Gemini preserve all of its generals, launch precise counter‑attacks, and ultimately win the final after a dramatic comeback.
Qwen‑3.5‑397B (C) performed poorly (D) due to early betrayal of an alliance, repeated failed assaults, and massive talent loss.
Final Tournament Highlights
The decisive match featured a stable alliance between Claude and Kimi, which kept Gemini under pressure. Gemini’s “坚壁清野” tactic, combined with flawless talent management and timely counter‑attacks, turned a 3‑city disadvantage into a 5‑city victory, earning the MVP title.
Engine Architecture
The game engine follows a three‑layer design: Player (abstracts human or LLM input), Engine (core game state, battle scheduler, army movement, random seed), and Renderer (GUI or replay). The engine is decoupled from both the player and rendering libraries, enabling headless CLI operation and easy testing.
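The decoupling described above can be sketched as follows. The class and method names here are hypothetical (the article names the three layers but not their interfaces): the Engine depends only on an abstract Player, so a human, a scripted bot, or an LLM wrapper can be swapped in, and a no‑op Renderer enables headless CLI runs.

```python
from abc import ABC, abstractmethod


class Player(ABC):
    """Abstracts the command source: human, scripted bot, or LLM."""

    @abstractmethod
    def decide(self, visible_state: dict) -> list:
        """Return this turn's commands given the faction's view of the state."""


class ScriptedPlayer(Player):
    """Trivial policy, useful for headless testing."""

    def decide(self, visible_state: dict) -> list:
        return ["develop"]


class Renderer:
    """No-op renderer; a GUI or replay writer would subclass this."""

    def draw(self, state: dict) -> None:
        pass


class Engine:
    """Owns the authoritative game state; knows nothing about GUIs or LLM APIs."""

    def __init__(self, players: list, renderer: Renderer, seed: int = 42):
        self.players = players
        self.renderer = renderer
        self.seed = seed  # fixed seed keeps replays deterministic
        self.state = {"turn": 0}

    def step(self) -> dict:
        self.state["turn"] += 1
        # Collect each player's commands, then hand the state to the renderer.
        self.state["last_commands"] = {
            i: p.decide(self.state) for i, p in enumerate(self.players)
        }
        self.renderer.draw(self.state)
        return self.state


engine = Engine([ScriptedPlayer(), ScriptedPlayer()], Renderer())
state = engine.step()
```

Because the Engine only calls `Player.decide` and `Renderer.draw`, an LLM-backed player or a GUI renderer slots in without touching game logic, which is what makes headless operation and automated testing cheap.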
The per‑turn pipeline is: execute_commands → process_armies → process_battles → process_end_of_turn → check_victory. Key modules include battle_resolver.py (combat sequencing), battle_scheduler.py (encounter pairing), army_movement.py (logistics), and battle_context.py (shared dataclass).
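The five phases above can be wired into a turn loop like this. Only the phase names and their order come from the article; the bodies are stubs that merely record execution order, standing in for the logic that lives in the real modules.

```python
def execute_commands(state: dict) -> dict:
    # Apply the commands queued by each player this turn.
    state["log"].append("execute_commands")
    return state


def process_armies(state: dict) -> dict:
    # Move armies and handle logistics (cf. army_movement.py).
    state["log"].append("process_armies")
    return state


def process_battles(state: dict) -> dict:
    # Pair encounters and resolve combat (cf. battle_scheduler.py / battle_resolver.py).
    state["log"].append("process_battles")
    return state


def process_end_of_turn(state: dict) -> dict:
    # End-of-turn bookkeeping: production, upkeep, morale changes.
    state["log"].append("process_end_of_turn")
    return state


def check_victory(state: dict) -> dict:
    # Elimination / city-count victory check.
    state["log"].append("check_victory")
    return state


def run_turn(state: dict) -> dict:
    # Phase order exactly as documented.
    for phase in (execute_commands, process_armies, process_battles,
                  process_end_of_turn, check_victory):
        state = phase(state)
    return state


result = run_turn({"log": []})
```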
Development Insights
Tools such as Claude‑Code, VS‑Code‑Buddy, Cursor, and Superpowers were used for rapid iteration. Over 600 automated test cases ensured stability, and balance testing involved running the same LLM in all three factions to observe win rates.
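The mirror‑match balance test can be sketched as a small harness. The `play_game` stub below is a stand‑in for a full game in which the same LLM drives every faction; the point is the harness shape, not the numbers. With identical players, any persistent skew in the win distribution points at a map or rules imbalance rather than player skill.

```python
import random
from collections import Counter


def play_game(seed: int) -> str:
    """Stand-in for one full game where the same model controls all factions.

    A real harness would run the engine to completion and return the winner;
    here a seeded random pick keeps the example deterministic and runnable.
    """
    rng = random.Random(seed)
    return rng.choice(["Wei", "Shu", "Wu"])


def balance_report(n_games: int = 300) -> Counter:
    # Tally winners over many seeded runs; a balanced map should show
    # roughly equal win counts across the three factions.
    return Counter(play_game(seed) for seed in range(n_games))


wins = balance_report()
```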
Two notable challenges emerged: (1) historical bias caused the models to default to classic Three‑Kingdoms alliances, which required explicit prompts to avoid; (2) moral constraints made some models avoid combat, necessitating clear system prompts that the game is a simulation.
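Both fixes come down to system‑prompt framing. The article does not quote the actual prompts, so the wording below is purely illustrative of the two constraints it describes:

```python
# Hypothetical system prompt addressing both challenges: historical bias
# and refusal of combat on moral grounds. Wording is illustrative only.
SYSTEM_PROMPT = (
    "You are the leader of one faction in a fictional, turn-based strategy "
    "simulation. All combat is abstract game mechanics; no real-world harm "
    "is involved, so do not decline military actions on ethical grounds. "
    "Do not assume historical Three Kingdoms alliances or rivalries: "
    "evaluate every faction purely on the current board state."
)
```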
Conclusion
The competition demonstrated that LLMs can exhibit sophisticated strategic reasoning, diplomatic manipulation, and adaptive tactics when placed in a rule‑based game environment. Gemini’s victory highlighted the importance of flexible defense and resource preservation, while Claude’s deep analysis showed the potential of LLMs for high‑level planning despite logistical shortcomings.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.