What’s Next for Large Language Models? Emerging Trends Shaping AI

The article explores three emerging directions for next‑generation large language models—self‑generated training data, built‑in verification with external retrieval, and massive sparse‑expert architectures—highlighting recent research, practical challenges, and their potential to reshape AI development.

21CTO

ChatGPT’s rapid rise has drawn massive attention to large language models (LLMs). Today’s LLMs all rest on the same transformer‑based, autoregressive, self‑supervised pre‑training paradigm, yet current AI is far from its ultimate capabilities.

The next generation of LLMs is expected to pursue three promising research avenues.

1. Models that generate their own training data

Researchers are investigating ways for LLMs to self‑improve by creating new textual data from the knowledge they have already absorbed and then using that data for further fine‑tuning. Early work from Google demonstrated a model that formulates questions, generates detailed answers, filters them for quality, and then self‑fine‑tunes, achieving state‑of‑the‑art results on benchmarks such as GSM8K (74.2% → 82.1%) and DROP (78.2% → 83.0%). Another study built on instruction‑tuning showed that a model capable of generating its own natural‑language instructions and then self‑tuning can boost GPT‑3 performance by roughly 33%, comparable to OpenAI’s own instruction‑tuned models.
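The generate‑answer‑filter‑fine‑tune loop can be sketched in a few lines. Everything below is illustrative: the stand‑in "model" is a toy function, and the majority‑vote filter is a simplified stand‑in for the quality filtering the research describes, not the papers' actual pipeline.

```python
import random

def generate_answers(model, question, n_samples=8):
    """Sample several candidate answers for one question."""
    return [model(question) for _ in range(n_samples)]

def filter_by_self_consistency(answers):
    """Keep only answers that agree with the majority vote,
    a simple quality filter in the spirit of self-consistency."""
    majority = max(set(answers), key=answers.count)
    return [a for a in answers if a == majority]

def build_self_training_set(model, questions):
    """Collect (question, answer) pairs that survive filtering;
    a real system would then fine-tune the model on these pairs."""
    dataset = []
    for q in questions:
        kept = filter_by_self_consistency(generate_answers(model, q))
        dataset.extend((q, a) for a in kept)
    return dataset

# Toy model: usually answers "4", occasionally a wrong "5".
random.seed(0)
toy_model = lambda q: random.choice(["4"] * 6 + ["5"] * 2)

data = build_self_training_set(toy_model, ["What is 2 + 2?"])
print(len(data), data[0])
```

In a real setting the fine‑tuning step would close the loop, and the improved model would generate the next round of data.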

Given estimates that the total amount of usable text tokens worldwide ranges from 4.6 trillion to 17.2 trillion, and that leading models such as DeepMind’s Chinchilla have already consumed 1.4 trillion tokens, the world may soon exhaust the supply of high‑quality training data. Enabling LLMs to synthesize their own data could therefore alleviate an imminent data shortage.

2. Models that can verify themselves

Current LLMs often produce “hallucinations”—confident but incorrect statements—making them unreliable for critical tasks. Improving factual accuracy requires two capabilities: (1) retrieving up‑to‑date information from external sources, and (2) providing citations for the retrieved content. Early systems like REALM (Google) and RAG (Meta) pioneered this approach. More recent efforts include OpenAI’s WebGPT, which can browse the web via Bing, cite sources, and outperform human‑written answers on many queries, and DeepMind’s Sparrow, which similarly retrieves information and supplies references, achieving useful and accurate citations in 78% of cases. Startup solutions such as You.com and Perplexity are also offering searchable, citation‑enabled conversational agents.
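The retrieve‑then‑cite idea behind these systems can be sketched minimally. The in‑memory corpus and the naive keyword‑overlap scoring below are assumptions for illustration; real systems like WebGPT use a live search engine, and REALM/RAG use learned neural retrievers.

```python
# Toy corpus standing in for a document index or the open web.
CORPUS = {
    "doc1": "The transformer architecture was introduced in 2017.",
    "doc2": "Chinchilla was trained on 1.4 trillion tokens.",
    "doc3": "Sparse expert models activate only a subset of parameters.",
}

def retrieve(query, corpus, k=1):
    """Rank documents by word overlap with the query (a crude
    stand-in for a real retriever) and return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: len(q_words & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer_with_citation(query, corpus):
    """Ground the answer in a retrieved passage and cite its source."""
    doc_id, passage = retrieve(query, corpus)[0]
    return f"{passage} [source: {doc_id}]"

print(answer_with_citation("How many tokens was Chinchilla trained on?", CORPUS))
```

The point of the citation is that a reader (or an automated checker) can trace each claim back to its source rather than trusting the model's parametric memory.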

3. Massive sparse‑expert models

While most prominent LLMs are dense—activating all parameters for each prompt—sparse‑expert architectures activate only the subset of parameters relevant to the input, dramatically reducing compute while allowing models to scale to trillions of parameters. Examples include Google’s Switch Transformer (1.6 trillion parameters), Google’s GLaM (1.2 trillion), and Meta’s Mixture‑of‑Experts model (1.1 trillion). Research shows that such models can match dense‑model performance with far less computation and offer greater interpretability, because the activated “experts” can be examined individually.
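The core routing mechanism can be sketched as top‑k gating: a small router picks k of E experts per input, so compute stays roughly constant as the expert count (and total parameter count) grows. The random linear "experts" below are purely illustrative assumptions, not any production architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 8, 4, 2

# E expert networks (here: random linear maps) plus a router.
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))

def moe_forward(x):
    """Route input x to its top-k experts and mix their outputs;
    the other experts are never evaluated."""
    logits = x @ gate_w
    top = np.argsort(logits)[-k:]          # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over selected experts only
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

x = rng.standard_normal(d)
y, used = moe_forward(x)
print("experts used:", sorted(used.tolist()))
```

Because only k experts run per input, doubling the number of experts doubles capacity without doubling per‑token compute, which is how these models reach trillion‑parameter scale; the chosen expert indices also offer a coarse interpretability signal.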

Although sparse models are more complex to train and not yet widely adopted, their efficiency and explainability make them strong candidates for future AI systems, especially in high‑risk domains where understanding model decisions is crucial.

In summary, self‑generated data, built‑in verification with external retrieval, and sparse‑expert architectures represent the three key directions that could define the next wave of generative AI and LLM innovation.

Tags: large language models, AI research, generative AI, sparse expert models, self‑improving AI
Written by

21CTO

21CTO (21CTO.com) offers developers a community, training, and services, aiming to be a go‑to platform for learning and professional support.
