From Transformers to Agents: A Complete Timeline of Large Language Model Evolution

This article traces the evolution of large language models from the 2017 Transformer breakthrough through successive milestones such as BERT, GPT‑3, RLHF alignment, multimodal extensions, open‑source alternatives, and the rise of retrieval‑augmented generation, AI agents, and emerging protocols that shape modern AI applications.

Tencent Cloud Developer

LLM Development Timeline

2017 – Transformer: Self‑attention replaces RNN/LSTM, enabling long‑range dependencies.

2018‑2020 – Pre‑training boom: BERT (bidirectional encoder) and GPT (autoregressive decoder) demonstrate the power of massive pre‑training and fine‑tuning.

2020 – GPT‑3: 175 B parameters, few‑shot/zero‑shot capabilities across writing, coding and reasoning.

2021‑2022 – Alignment: Supervised Fine‑Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) reduce hallucinations and improve instruction following (GPT‑3.5, ChatGPT).

2023‑2024 – Multimodal & Open‑source: GPT‑4V/‑4o integrate vision, audio and video; open‑source models such as Llama 3.1 405B narrow the gap.

2024‑2025 – Inference‑focused reasoning models: OpenAI o1 (chain‑of‑thought reasoning), o3/o4‑mini, DeepSeek‑V3 and DeepSeek‑R1 (MoE, 671 B parameters) achieve strong reasoning at lower cost.

Retrieval‑Augmented Generation (RAG)

Naïve RAG Pipeline

Indexing: Clean raw documents (PDF, HTML, Markdown), split into chunks, embed each chunk, store vectors in a vector DB.

Retrieval: Encode the user query, perform similarity search, select top‑k chunks.

Generation: Concatenate the query with retrieved chunks and feed them to the LLM.
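The three stages can be sketched end to end in a few lines of Python. This is a minimal illustration only: the bag‑of‑words `embed` function, the sample chunks, and the in‑memory `index` list are stand‑ins for a learned embedding model and a real vector DB.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: L2-normalized bag-of-words counts.
    # A real pipeline would use a learned sentence encoder instead.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {word: v / norm for word, v in counts.items()}

def cosine(a, b):
    # Cosine similarity over sparse word-count vectors.
    return sum(a[w] * b.get(w, 0.0) for w in a)

# Indexing: split documents into chunks and store their vectors.
chunks = [
    "The Transformer replaced RNNs with self-attention in 2017.",
    "GPT-3 has 175B parameters and shows few-shot abilities.",
    "RAG retrieves document chunks to ground LLM answers.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, k=2):
    # Retrieval: embed the query and return the top-k most similar chunks.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query):
    # Generation: concatenate retrieved context with the query for the LLM.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How many parameters does GPT-3 have?"))
```

In production the only structural change is swapping `embed` for a model call and `index` for a vector-store client; the retrieve-then-prompt flow stays the same.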

Advanced RAG Enhancements

Multi‑granular chunking (sliding windows, semantic splits) to avoid loss of context.

Metadata enrichment (title, author, timestamps, entities) for better filtering.

Hybrid BM25 + vector search to balance lexical recall and semantic relevance.

Hypothetical Document Embeddings (HyDE): generate a hypothetical answer to the query, embed that answer, and retrieve with its embedding, since a plausible answer is usually semantically closer to relevant documents than the raw query is.

Re‑ranking of retrieved passages based on LLM scoring.

Prompt compression to keep the final prompt within the model’s context window.

Embedding fine‑tuning or dynamic embeddings that adapt to the current context.

Query transformation: let the LLM decompose a complex query into sub‑queries before retrieval.
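The last enhancement, query transformation, can be sketched as follows. The `call_llm` function is a stub standing in for a real model API, and the canned reply is fabricated for illustration; only the decompose-then-parse flow is the point.

```python
def call_llm(prompt):
    # Stub standing in for a real LLM API call; a production system
    # would send `prompt` to a hosted model and parse its reply.
    return "1. Who founded OpenAI?\n2. When was GPT-3 released?"

def decompose_query(query):
    # Ask the LLM to break a complex query into independent sub-queries.
    prompt = (
        "Break the question into independent sub-queries, one per line:\n"
        f"{query}"
    )
    reply = call_llm(prompt)
    # Strip list numbering like "1. " from each line.
    return [line.split(". ", 1)[-1] for line in reply.splitlines()]

subqueries = decompose_query("Who founded OpenAI and when was GPT-3 released?")
# Each sub-query is then retrieved separately and the results merged
# before the final generation step.
print(subqueries)
```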

Modular RAG

Decompose the pipeline into interchangeable components (retriever, pre‑processor, generator). Each module can be swapped or specialized for a domain, enabling a “plug‑and‑play” architecture.

Graph RAG

During indexing, extract entities and relations to build a knowledge graph. Retrieval can then traverse the graph for multi‑hop reasoning, which is useful in domains such as medical diagnosis or legal research. Limitations include scalability of the graph and the need for high‑quality entity extraction.
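The multi‑hop traversal idea can be shown with a tiny in‑memory graph. The entities and relations below are a made‑up toy example; a real Graph RAG system would extract such triples from documents at indexing time and store them in a graph database.

```python
from collections import deque

# Toy knowledge graph as an adjacency list of (relation, target) edges.
graph = {
    "aspirin": [("inhibits", "COX-1")],
    "COX-1": [("produces", "thromboxane")],
    "thromboxane": [("promotes", "clotting")],
}

def multi_hop(start, goal):
    # Breadth-first search returning the chain of (head, relation, tail)
    # triples linking two entities, or None if no path exists.
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None

print(multi_hop("aspirin", "clotting"))
```

The returned triple chain is what gets serialized into the prompt, which is how graph retrieval supplies reasoning steps that plain chunk similarity would miss.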

Agentic RAG

An AI agent orchestrates retrieval and generation, allowing:

Multiple knowledge sources (private vector DB, web search, calculators, internal APIs).

Iterative retrieval‑generation loops with verification steps.

Hierarchical routing where a central agent delegates sub‑tasks to specialized agents.
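The iterative retrieval‑generation loop with verification can be sketched as below. All three components are stubs (the corpus, the sentinel string, and the query‑broadening strategy are invented for illustration); the structure to note is the verify‑then‑reformulate cycle.

```python
def retrieve(query):
    # Stub retriever; a real agent would query a vector DB, web search,
    # or internal APIs.
    corpus = {"transformer": "The Transformer architecture appeared in 2017."}
    return [text for key, text in corpus.items() if key in query.lower()]

def generate(query, context):
    # Stub generator standing in for an LLM call.
    return context[0] if context else "INSUFFICIENT_CONTEXT"

def verify(answer):
    # Verification step between rounds: reject answers the model flagged.
    return answer != "INSUFFICIENT_CONTEXT"

def agentic_rag(query, max_rounds=3):
    # Iterative retrieval-generation loop with a verification gate.
    answer = "INSUFFICIENT_CONTEXT"
    for _ in range(max_rounds):
        answer = generate(query, retrieve(query))
        if verify(answer):
            return answer
        # Toy reformulation: broaden the query before retrying.
        query = query + " transformer"
    return answer

print(agentic_rag("When did self-attention appear?"))
```

The second round succeeds here only because the reformulation adds the matching keyword; in practice the reformulation itself would be another LLM call.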

Building LLM‑Based Agents

Typical architecture consists of four logical parts:

Planner: Generates a multi‑step plan or task decomposition.

Executor (Worker): Calls external tools/APIs according to the plan.

Solver / Joiner: Merges tool results and produces the final answer.

Memory (optional): Short‑term context for the current session and long‑term storage for learned feedback.
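The planner/executor/solver split can be sketched minimally. The planner here returns a fixed arithmetic plan rather than calling an LLM, and the tool registry holds two invented lambdas; the point is the data flow between the three parts.

```python
def planner(task):
    # Planner: decompose the task into tool-call steps.
    # A real planner would be an LLM prompt; this stub returns a
    # fixed plan for illustration.
    return [("add", 2, 3), ("mul", 5, 4)]

# Tool registry the executor dispatches against.
TOOLS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def executor(plan):
    # Executor: run each step by calling the registered tool.
    return [TOOLS[name](*args) for name, *args in plan]

def solver(task, results):
    # Solver/Joiner: merge tool outputs into a final answer.
    return f"{task} -> {results}"

task = "compute 2+3 and 5*4"
print(solver(task, executor(planner(task))))
```

Memory would slot in as state carried between `planner` calls, which is why it is listed as optional: the three-stage loop runs without it.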

Key Interaction Protocols

Function Call (OpenAI, 2023): LLM emits a JSON‑schema request that the host executes (e.g., web search, calculator).

Model Context Protocol (MCP): A lightweight, vendor‑agnostic protocol that defines discovery, description and invocation of external tools, improving cross‑model compatibility.

Agent‑to‑Agent (A2A): Standardized messages for secure collaboration between agents from different platforms, supporting capability discovery, task assignment and artifact exchange.

AG‑UI: Event‑based streaming protocol (SSE / WebSocket) for real‑time interaction between an agent and a front‑end, enabling human‑in‑the‑loop control.
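As a concrete example of the Function Call pattern above, the host‑side dispatch can be sketched as follows. The schema layout follows the JSON‑Schema style used by function‑calling APIs (field names here follow OpenAI's format; other vendors differ slightly), and `get_weather` with its canned reply is a made‑up tool for illustration.

```python
import json

# Tool schema advertised to the model.
tool_schema = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city):
    # Host-side implementation; a real tool would call a weather API.
    return {"city": city, "temp_c": 21}

REGISTRY = {"get_weather": get_weather}

# The model replies with a tool name plus JSON-encoded arguments; the
# host parses the arguments and executes the matching function.
model_reply = {"name": "get_weather", "arguments": '{"city": "Shenzhen"}'}
args = json.loads(model_reply["arguments"])
result = REGISTRY[model_reply["name"]](**args)
print(result)
```

The tool result is then appended to the conversation and the model is called again to produce the user-facing answer.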

Inference‑Focused Reasoning Models

Models such as OpenAI o1, o3, o4‑mini and DeepSeek‑R1 embed chain‑of‑thought (CoT) or tree‑of‑thought (ToT) reasoning inside the model, exposing only the final answer to the user while performing multi‑step internal reasoning. They can also invoke tools (image generation, web search, Python execution) directly from the same model.

Framework Landscape

Open‑source frameworks that automate the above components:

MetaGPT – https://github.com/FoundationAgents/MetaGPT
Phidata – https://www.phidata.com/
OpenAI Swarm – https://github.com/openai/swarm
Microsoft Autogen – https://www.microsoft.com/en-us/research/project/autogen/
CrewAI – https://www.crewai.com/
Vertex AI – https://cloud.google.com/vertex-ai

These frameworks handle prompt construction, tool registration and orchestration, but they add an abstraction layer that can hide LLM inputs/outputs and increase debugging complexity.

Practical Recommendations

Start with a simple prompt‑plus‑tool call; many use‑cases are solved without a full agent stack.

If additional logic is required, add a planner that emits a concise step list and let a single tool‑enabled agent execute it.

Prefer a single, well‑instrumented agent over a multi‑agent hierarchy unless the problem truly benefits from domain‑specific sub‑agents.

Keep designs transparent: expose planning steps to users, log tool calls, and define quantitative metrics (latency, token cost, correctness) for continuous improvement.

When using a framework, understand the underlying API calls; avoid treating the framework as a black box.

Multi‑Agent Architectural Patterns

Single Agent: One agent handles the entire task.

Network: Every agent can call any other agent.

Supervisor: A central coordinator routes sub‑tasks to specialized agents.

Hierarchical: Multiple supervisors form a tree of control.

Custom: Fixed communication topology tailored to the application.
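The Supervisor pattern can be sketched as a routing function over a registry of specialized agents. Both worker agents and the keyword router below are invented stand‑ins; a real supervisor would use an LLM to classify the sub‑task before delegating.

```python
def research_agent(task):
    # Specialized agent stub for information gathering.
    return f"research: notes on {task}"

def coding_agent(task):
    # Specialized agent stub for implementation work.
    return f"code: stub for {task}"

AGENTS = {"research": research_agent, "code": coding_agent}

def supervisor(task):
    # Central coordinator: route each sub-task to a specialized agent.
    # Keyword routing here; a real supervisor would classify via an LLM.
    route = "code" if "implement" in task else "research"
    return AGENTS[route](task)

print(supervisor("implement quicksort"))
print(supervisor("summarize RAG papers"))
```

The Hierarchical pattern is this same structure one level up: each entry in `AGENTS` is itself a supervisor over its own workers.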

Common failure modes stem from poor protocol design (missing termination conditions), communication breakdowns between agents, and insufficient verification of intermediate results.

Mitigation Strategies

Define explicit termination criteria and validation steps for each sub‑task.

Standardize message formats (e.g., JSON schemas) to reduce ambiguity.

Integrate domain‑specific validators (symbolic checks, unit tests) as part of the feedback loop.

When confidence is low, pause execution and request additional information instead of proceeding blindly.
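The message-standardization and termination points above can be combined in one small validator. The field names, status values, and sample message below are an assumed schema chosen for illustration, not a published standard.

```python
import json

# Assumed inter-agent message schema: required fields plus an explicit
# status so agents can detect termination instead of looping forever.
REQUIRED = {"sender", "task_id", "status", "payload"}
STATUSES = {"in_progress", "done", "failed", "needs_input"}

def validate_message(raw):
    # Parse and check a JSON message against the agreed schema,
    # raising ValueError on any ambiguity rather than guessing.
    msg = json.loads(raw)
    missing = REQUIRED - msg.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if msg["status"] not in STATUSES:
        raise ValueError(f"unknown status: {msg['status']}")
    return msg

msg = validate_message(
    '{"sender": "planner", "task_id": "t1", "status": "done", "payload": {}}'
)
print(msg["status"])
```

A `needs_input` status gives the "pause and ask" mitigation a concrete signal: the receiving agent halts the loop and escalates instead of proceeding blindly.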

Summary of Core Principles

Simplicity First: Use the simplest possible solution; only add complexity when the baseline fails.

Transparency: Make planning steps, tool calls and intermediate outputs visible to users and developers.

Quantitative Evaluation: Track latency, token usage, correctness, and hallucination rates to guide iterative improvements.

Modularity: Build pipelines from interchangeable components so that indexing, retrieval, or generation can be upgraded independently.

Standard Protocols: Adopt Function Call, MCP, A2A, and AG‑UI to ensure cross‑model and cross‑platform interoperability.

Tags: prompt engineering, large language models, RAG, reinforcement learning, open-source models