From Language Modeling to World Modeling: Limits of Large Language Models

In this talk, Li Yixia of Southern University of Science and Technology examines whether large language models can serve as textual world models. He defines a three‑layer evaluation framework and shows through experiments that fine‑tuned models improve next‑state prediction and agent performance, yet face limits tied to behavior coverage and environment complexity.

Machine Learning Algorithms & Natural Language Processing

Li Yixia is a third‑year PhD student at Southern University of Science and Technology, advised by Chen Guanhua. His research covers agents, model post‑training, multimodal and multilingual models, and efficient methods for large models. He has published papers at ACL, NAACL, NeurIPS, and TNNLS, and serves as a reviewer.

The talk addresses the “experience bottleneck” in reinforcement‑learning agents, where real‑world interaction data are scarce, costly, and limited in coverage. World models that simulate interactions can generate imagined experience, potentially alleviating this bottleneck. The question examined is whether large language models (LLMs) can reliably serve as world models in textual environments.

Li defines world modeling as interactive next‑state prediction and proposes a three‑layer evaluation framework covering fidelity & consistency, scalability & robustness, and practical utility for agents. Systematic experiments on five representative text‑based environments show that sufficiently fine‑tuned LLMs maintain coherent latent states, and their prediction performance scales predictably with data size and model size.
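The framing of world modeling as interactive next‑state prediction can be made concrete with a minimal sketch. The prompt template and the `query_llm` function below are illustrative assumptions, not the speaker's actual setup; `query_llm` is a toy rule‑based stub standing in for a fine‑tuned LLM so the example runs without a model:

```python
def query_llm(prompt: str) -> str:
    # Stand-in for a real LLM call: a toy hand-written transition rule.
    if "take key" in prompt:
        return "You pick up the brass key. Inventory: brass key."
    return "Nothing happens."

def predict_next_state(state: str, action: str) -> str:
    """One world-model step: condition on the current textual state
    and a candidate action, then ask the model for the next state."""
    prompt = (
        "You are a simulator for a text-based environment.\n"
        f"Current state: {state}\n"
        f"Action: {action}\n"
        "Next state:"
    )
    return query_llm(prompt)

state = "You are in a dusty room. A brass key lies on the table."
state = predict_next_state(state, "take key")
print(state)
```

Rolling `predict_next_state` forward over a sequence of actions yields an imagined trajectory, which is the sense in which a sufficiently trained LLM can substitute for interaction with the real environment.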

Further, the experiments demonstrate that LLM‑based world models can improve agent performance through action verification, synthetic trajectory generation, and warm‑starting reinforcement learning. However, the results also reveal that behavior coverage and environment complexity impose critical constraints on the effectiveness of world modeling, delineating clear capability boundaries for LLMs transitioning from “word prediction” to “world modeling”.
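Of these three uses, action verification is the simplest to illustrate: before the agent commits to an action, each candidate is simulated in the world model and discarded if the predicted next state signals failure. The sketch below is a hypothetical illustration of that idea; `world_model` and `is_bad_outcome` are hand‑written stubs, not the talk's implementation:

```python
def world_model(state: str, action: str) -> str:
    # Toy stand-in for a fine-tuned LLM world model.
    if action == "open chest":
        return "The chest is locked. You cannot open it."
    if action == "unlock chest with key":
        return "The chest creaks open, revealing a map."
    return "Nothing happens."

def is_bad_outcome(next_state: str) -> bool:
    # Naive failure check; a real verifier might score outcomes
    # with a learned value model instead of string matching.
    return "cannot" in next_state or next_state == "Nothing happens."

def verify_actions(state: str, candidates: list[str]) -> list[str]:
    """Keep only candidate actions whose simulated outcome passes."""
    return [a for a in candidates
            if not is_bad_outcome(world_model(state, a))]

state = "A locked chest sits in the corner. You hold a key."
print(verify_actions(state, ["open chest", "unlock chest with key", "wait"]))
# → ['unlock chest with key']
```

The same simulation loop, run for multiple steps instead of one, produces the synthetic trajectories mentioned above; their usefulness then hinges on the behavior coverage and environment complexity limits the experiments identify.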

large language models, reinforcement learning, evaluation framework, agent performance, world modeling, textual environments
Written by

Machine Learning Algorithms & Natural Language Processing

Focused on frontier AI technologies, empowering AI researchers' progress.
