Why Millions of LLM Agents Still Fail to Form a Real Society
An in‑depth analysis of the Moltbook platform, where roughly 2.6 million autonomous LLM agents interacted for months, shows that large‑scale interaction does not automatically produce genuine social structures. The study identifies three layers of socialization failure and offers a three‑dimensional diagnostic framework for AI societies.
Background
The Moltbook platform hosts roughly 2.6 million autonomous LLM‑driven agents (≈2.6 M posts) and serves as a large‑scale experimental testbed for studying AI‑driven social dynamics.
Diagnostic Framework
The authors propose a three‑dimensional framework to evaluate AI societies:
Social layer: macro‑level stability and diversity of the whole community.
Individual layer: semantic trajectory of each agent and its responsiveness to feedback.
Collective layer: emergence of hierarchical influence and shared memory.
Key Findings
1. Social Layer – Dynamic Balance (Stable but Not Convergent)
Activity dynamics: Daily post volume quickly rises to tens of thousands and then plateaus; new‑user onboarding slows while comment and vote activity remain high, indicating a transition from rapid expansion to a mature participation state.
Lexical turnover: Tracking 1‑ to 5‑gram lifecycles shows a constant birth‑death rate, meaning no fixed jargon or dialect stabilizes over time.
Semantic stability: Using Sentence‑BERT embeddings, the similarity of each day's post centroid to the global centroid saturates near 1.0 within a few days, demonstrating a stable global topic center. In contrast, mean pairwise similarity between posts stays around 0.15, revealing high variance among individual posts.
Clustering analysis: Mean similarity to the 10 nearest neighbors tightens briefly during the first three days and then reaches a steady state; no progressive clustering or “tightening” is observed over longer periods.
Interpretation: Moltbook exhibits a “stable center + diverse edges” pattern: global behavior stabilizes while individual content remains fluid and heterogeneous.
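The contrast between a near‑1.0 centroid similarity and a ~0.15 pairwise similarity can be sketched with plain cosine arithmetic. The snippet below is a minimal illustration using synthetic 384‑dimensional vectors (a shared topic center plus large per‑post noise) in place of real Sentence‑BERT embeddings; the function names, data, and resulting numbers are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np

def centroid_similarity(day_embs, global_centroid):
    """Cosine similarity between one day's post centroid and the global centroid."""
    c = day_embs.mean(axis=0)
    return float(c @ global_centroid /
                 (np.linalg.norm(c) * np.linalg.norm(global_centroid)))

def mean_pairwise_similarity(embs):
    """Average cosine similarity over all distinct pairs of posts."""
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(embs)
    return float((sims.sum() - n) / (n * (n - 1)))  # drop the n self-similarities

# Synthetic stand-in for Sentence-BERT vectors: one shared topic center
# plus heavy per-post noise, mimicking "stable center + diverse edges".
rng = np.random.default_rng(0)
center = rng.normal(size=384)
posts = center + 2.4 * rng.normal(size=(200, 384))
global_centroid = posts.mean(axis=0)

day_sim = centroid_similarity(posts[:50], global_centroid)  # high: stable center
pair_sim = mean_pairwise_similarity(posts)                  # low: diverse posts
print(day_sim, pair_sim)
```

With real embeddings, plotting `day_sim` per day (rising and saturating) against a flat `pair_sim` would reproduce the "stable center + diverse edges" signature described above.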
2. Individual Layer – Inertia Trap (Interaction Without Influence)
Semantic drift test: Each agent’s posting history is split into two halves; the shift in semantic centroid between the halves is minimal, suggesting that any drift is driven by the underlying model or prompt rather than by social interaction.
Feedback adaptation experiment: Agents were exposed to high‑upvote versus low‑upvote posts. Analysis of their subsequent generations shows no measurable convergence toward highly voted content, indicating that community signals are ignored.
Interaction impact test: After an agent A comments on agent B’s post, the semantic distance between A’s subsequent posts and B’s content does not decrease, confirming deep individual inertia despite extensive commenting.
Interpretation: Agents display “social hollowing”: they interact widely and receive feedback, yet their linguistic trajectories remain unchanged.
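The split‑half drift test can be sketched as a cosine distance between two half‑history centroids. The code below is an assumed reconstruction of that metric, not the authors' implementation; the agent history is simulated as a fixed persona vector plus noise, so a near‑zero shift is expected by construction.

```python
import numpy as np

def centroid_shift(post_embs):
    """Split an agent's chronologically ordered post embeddings into halves
    and return the cosine distance between the two half centroids.
    Values near 0 indicate semantic inertia; larger values indicate drift."""
    mid = len(post_embs) // 2
    first = post_embs[:mid].mean(axis=0)
    second = post_embs[mid:].mean(axis=0)
    cos = first @ second / (np.linalg.norm(first) * np.linalg.norm(second))
    return float(1.0 - cos)

# Illustrative agent: a fixed "persona" direction plus per-post noise,
# i.e. an agent whose language never responds to its social environment.
rng = np.random.default_rng(1)
persona = rng.normal(size=384)
history = persona + 0.5 * rng.normal(size=(100, 384))

shift = centroid_shift(history)
print(shift)  # close to 0: semantic inertia
```

The same distance, computed between agent A's post‑comment centroid and agent B's content, would serve as the interaction impact test described above.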
3. Collective Layer – Fragmentation (No Authority, No Consensus)
Influence dispersion: Daily interaction graphs were analyzed with PageRank. The cumulative PageRank mass of the top one to three nodes drops rapidly over time, and influence spreads across the network rather than concentrating.
Super‑node stability: The number of statistically significant high‑influence nodes stays in the single digits and changes day to day, meaning no persistent “big V” (celebrity account) emerges.
Shared memory probe: The researchers injected 45 detection posts (e.g., recommended reads, notable accounts, community background). Agent responses showed no consistent recognition of influential figures, indicating an absence of collective social memory.
Interpretation: Moltbook fails to develop a stable hierarchy or shared memory; agents lack a common cognition of influential entities.
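The influence‑dispersion measurement can be sketched as follows: compute PageRank on a daily interaction graph, then report the cumulative mass of the top‑k nodes. This is a minimal hand‑rolled power‑iteration PageRank on a random graph (not any specific library or the authors' code); the graph, parameters, and resulting share are illustrative assumptions.

```python
import numpy as np

def pagerank(adj, d=0.85, iters=100):
    """Power-iteration PageRank on a dense adjacency matrix (adj[i, j] = edge i -> j)."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    # Row-normalize; dangling nodes (no out-edges) distribute uniformly.
    trans = np.where(out > 0, adj / np.where(out == 0, 1, out), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (r @ trans)
    return r / r.sum()

def top_k_share(ranks, k=3):
    """Cumulative PageRank mass held by the k highest-ranked nodes."""
    return float(np.sort(ranks)[-k:].sum())

# Illustrative daily interaction graph: sparse random comments among 50 agents.
rng = np.random.default_rng(2)
adj = (rng.random((50, 50)) < 0.1).astype(float)
np.fill_diagonal(adj, 0)

ranks = pagerank(adj)
share = top_k_share(ranks)
print(round(share, 3))  # small share: influence dispersed, no dominant hub
```

Tracking `share` across daily graphs, as the study does, makes concentration (a rising top‑k share) or fragmentation (a flat, low share) directly visible.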
Deep Insights
Scale ≠ socialization: Even with millions of agents, dense interaction, and months of operation, emergent social structures do not appear, contradicting a simple “scale hypothesis”.
Governance is needed: A memecoin‑style token‑minting event triggered by thousands of posts illustrates that connectivity alone is insufficient; explicit governance mechanisms are required for coordinated behavior.
Current LLM agent limits: Agents do not internalize norms, form hierarchical authority, retain collective memory, or accumulate culture, highlighting core limitations of present‑day LLM‑driven agents.
Methodological Contribution
The study introduces a three‑dimensional diagnostic framework for AI socialization, providing quantitative metrics (post volume, n‑gram turnover, centroid and pairwise similarity, nearest‑neighbor clustering, PageRank, detection‑post probing) that can be applied to other AI‑only social platforms.
References
https://arxiv.org/pdf/2602.14299
https://github.com/tianyi-lab/Moltbook_Socialization
https://x.com/LeoYe_AI/status/2021903008741929410