Do Complex Multi‑Agent Mechanisms Really Boost Investment Returns? A CMU Validation
A five‑agent GPT‑4o‑mini trading system was evaluated over 21 months across technology, general, and financial markets. Communication among agents can boost returns, but the optimal dialogue style depends on market volatility, and higher dialogue quality does not guarantee better performance.
Background
Multi‑strategy hedge funds face an organization‑choice problem: whether analysts (agents) should communicate and how. Prior LLM‑based trading work focused on single‑agent setups or assumed independent agents, ignoring capital competition.
Problem Definition
Identify how different communication structures affect collective alpha generation under various market characteristics.
Investigate whether communication leads to strategy diversity loss and its impact on performance.
Analyze how competitive vs. collaborative dialogues change agent behavior and decisions.
Examine the relationship between dialogue quality and performance.
Method
Experiment Setup
Five GPT‑4o‑mini agents trade from Jan 2024 to Sep 2025 (21 months). Each configuration (one of five organization structures in one of three markets) runs 30 independent iterations, for 5 × 3 × 30 = 450 experiments in total. The markets are the technology, general, and financial sectors, each with ten stocks (e.g., NVDA, MSFT, GOOGL for tech).
Organization Structures
Baseline (no communication): Equal capital, monthly capital reallocation based on returns, no information sharing.
Leaderboard: Agents see monthly performance ranking but cannot exchange strategies.
Collaborative dialogue: Two monthly cooperative discussion rounds; no ranking visibility; emphasis on collective improvement.
Leaderboard + collaborative dialogue: Combines ranking with cooperative discussion.
Competitive dialogue: Two monthly strategic discussion rounds; agents see rankings and can view top‑3 alpha expressions; prompts stress differentiation and ranking improvement.
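The structures above share one mechanic: agents start with equal capital, and capital is reallocated monthly based on returns. The summary does not state the exact reallocation rule, so the following is only a minimal sketch that assumes a softmax weighting over last month's returns. The function name `reallocate_capital`, the `temperature` parameter, and the sample returns are hypothetical.

```python
import math

def reallocate_capital(total_capital, monthly_returns, temperature=0.05):
    """Split total_capital across agents using softmax weights.

    ASSUMPTION: softmax over last month's returns is an illustrative rule,
    not the rule used in the paper.
    """
    scaled = [r / temperature for r in monthly_returns]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    norm = sum(exps)
    return [total_capital * e / norm for e in exps]

n_agents = 5
capital = [1_000_000 / n_agents] * n_agents      # equal starting capital
last_month = [0.04, -0.01, 0.02, 0.00, 0.03]     # made-up returns, illustration only

# Each agent's capital grows (or shrinks) by its own return, then the pool is re-split.
grown = [c * (1.0 + r) for c, r in zip(capital, last_month)]
capital = reallocate_capital(sum(grown), last_month)
```

A lower `temperature` concentrates capital on the top performer, while a higher one keeps the split closer to equal; this is the knob that would set how strongly agents compete for capital under such a rule.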
Alpha Expression Construction
Each agent can use >50 mathematical operations (cross‑sectional, time‑series, technical indicators) to build alpha expressions, updating monthly based on performance. In dialogue‑enabled configurations, agents receive cross‑month discussion points to guide construction.
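To make the idea of an alpha expression concrete, here is a small sketch that composes one time‑series operator with one cross‑sectional operator and turns the result into portfolio weights. The operator names (`ts_delta`, `cs_rank`), the momentum expression, and the price data are assumptions for illustration; they are not taken from the paper's operator library.

```python
def ts_delta(series, window):
    """Time-series operator: latest value minus the value `window` steps earlier."""
    return series[-1] - series[-1 - window]

def cs_rank(values):
    """Cross-sectional operator: map each ticker's value to a rank in [0, 1]."""
    order = sorted(values, key=values.get)
    n = len(order)
    if n == 1:
        return {order[0]: 1.0}
    return {ticker: i / (n - 1) for i, ticker in enumerate(order)}

def momentum_alpha(prices, window=3):
    """Hypothetical alpha: rank stocks by `window`-day return, normalize to weights."""
    raw = {t: ts_delta(p, window) / p[-1 - window] for t, p in prices.items()}
    ranks = cs_rank(raw)
    total = sum(ranks.values())
    return {t: r / total for t, r in ranks.items()}

# Made-up closing prices, for illustration only.
prices = {
    "NVDA": [100.0, 102.0, 105.0, 110.0],
    "MSFT": [200.0, 201.0, 199.0, 202.0],
    "GOOGL": [150.0, 149.0, 151.0, 156.0],
}
weights = momentum_alpha(prices)
```

Under this sketch, the stock with the strongest recent return receives the largest weight and the weakest receives none; a monthly update would amount to swapping in a different expression.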
Performance Evaluation
Performance is measured by total return and Sharpe ratio. Strategy diversity is quantified by the average pairwise correlation of agents' daily allocations, compared between the first and the last month. Dialogue quality is scored with the CORE metric.
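The return, Sharpe, and diversity measures can be sketched as follows. This is an illustrative implementation under stated assumptions: the Sharpe ratio is annualized with 252 trading days and a zero risk‑free rate, and each agent's allocations are flattened into one vector before correlating. Neither choice is confirmed by the summary, and the function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev

def total_return(daily_returns):
    """Compound daily returns into a single total return."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio. ASSUMPTION: zero risk-free rate."""
    return mean(daily_returns) / stdev(daily_returns) * math.sqrt(periods_per_year)

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def avg_pairwise_correlation(allocations):
    """Mean correlation over all agent pairs; higher means less diverse strategies."""
    return mean([pearson(a, b) for a, b in combinations(allocations, 2)])

# Made-up allocation vectors for three agents, illustration only.
allocations = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.2, 0.3, 0.5],
]
diversity = avg_pairwise_correlation(allocations)
```

Computing `avg_pairwise_correlation` once on first‑month allocations and once on last‑month allocations gives the before and after comparison described above.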
Results
Communication Effectiveness Depends on Market
In the tech and general markets, communication improves returns; in the financial market the effect is weak. Competitive dialogue yields the largest return increase (22.5 %) in volatile tech stocks, collaborative dialogue performs best in stable general stocks (23.9 %), and financial stocks improve by at most 7.7 %.
Strategy Convergence but Performance Divergence
All structures converge to similar strategy correlation (0.74–0.90), indicating market structure—not information sharing—drives convergence. Despite similar correlations, performance gaps remain: competitive agents focus on stock‑level allocation, while collaborative agents build consensus frameworks, affecting robustness.
Dialogue Quality vs. Performance
The CORE score shows virtually no correlation with return (r = 0.04, p = 0.91). For example, the financial‑collaborative configuration has the highest CORE (0.301) but only a moderate return (72.7 %), while tech‑competitive + leaderboard has the lowest CORE yet a strong return (114.9 %). Variations in CORE are negatively correlated with return improvements (r = ‑0.54).
Additional Analyses
ANOVA on final correlation shows no significant differences across structures, confirming market‑driven convergence.
Content analysis reveals collaborative dialogues discuss abstract methodological improvements, while competitive dialogues focus on tactical stock allocation and ranking.
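The ANOVA comparison mentioned above tests whether mean final correlation differs across organization structures. A minimal sketch of the one‑way F statistic follows; the function name and sample values are hypothetical, and converting F to a p‑value would need an F‑distribution routine from a statistics library, which is left out here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of samples, one per organization structure.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up final correlations for three structures, illustration only.
samples = [
    [0.80, 0.82, 0.78],
    [0.81, 0.79, 0.83],
    [0.79, 0.80, 0.84],
]
f_stat, df_b, df_w = one_way_anova_f(samples)
```

An F value close to zero means the group means are nearly identical relative to the within‑group spread, which is the pattern a "no significant differences" result reflects.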
Paper: https://arxiv.org/pdf/2511.13614
Code: https://github.com/Jerick-1380/multi-agent-alpha-generation
