
Master LLMs: Basics, Prompt Engineering, RAG, Agents & Multimodal AI

This article provides a comprehensive overview of large language models, covering their fundamental concepts, historical milestones, parameter scaling, prompt-engineering techniques, retrieval-augmented generation, autonomous agents, and multimodal models, and shows how these technologies are reshaping AI capabilities across domains.


1. LLM Basics

1.1 What is an LLM?

LLM stands for Large Language Model: a deep-learning-based natural-language-processing model that can understand and generate human-like text (multimodal variants, covered in Section 5, extend this to images and audio). Trained on massive corpora, LLMs excel at translation, writing, dialogue, summarisation, and many other language tasks.

1.2 History

Key milestones include the 2017 introduction of the Transformer architecture by Vaswani et al., followed by models such as GPT and BERT that leveraged self‑attention to achieve parallel computation and superior contextual capture.

1.3 Model Size (B)

Parameters are measured in billions (B). For example, GPT‑3 uses 175 B parameters; larger models generally have stronger representation ability but require more data and compute, and excessive parameters can lead to over‑fitting if training data are insufficient.
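As a rough illustration of what parameter counts imply in practice, the memory needed just to store a model's weights can be estimated by multiplying the parameter count by bytes per parameter. The sketch below assumes fp16/bf16 storage (2 bytes per parameter); real deployments also need room for activations, optimiser state, and the KV cache.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate memory (GB) needed to hold model weights alone.

    Assumes fp16/bf16 storage (2 bytes per parameter); activations,
    optimiser state, and KV cache add substantially more in practice.
    """
    return num_params * bytes_per_param / 1e9

# GPT-3 (175 B parameters) in fp16: roughly 350 GB of weights
print(f"{weight_memory_gb(175e9):.0f} GB")
```

This back-of-the-envelope number explains why models of this size must be sharded across many accelerators rather than run on a single GPU.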

2. Prompt Engineering

2.1 Prompt Concept

A prompt is a carefully designed instruction or sentence that guides the model to produce outputs aligned with user intent.

2.2 Prompt Components

Instruction (required): tells the model what to do.

Context (optional): additional knowledge, often retrieved from a vector database.

Input Data (optional): the user’s query or data to be processed.

Output Indicator (optional): marks the beginning of the desired output.
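The four components above can be assembled mechanically. The helper below is a minimal sketch (the function name and formatting conventions are illustrative, not from any framework): only the instruction is required, and optional parts are appended when present.

```python
from typing import Optional

def build_prompt(instruction: str,
                 context: Optional[str] = None,
                 input_data: Optional[str] = None,
                 output_indicator: Optional[str] = None) -> str:
    """Assemble a prompt from instruction, context, input data,
    and output indicator. Only the instruction is required."""
    parts = [instruction]
    if context:
        parts.append(f"Context: {context}")
    if input_data:
        parts.append(f"Input: {input_data}")
    if output_indicator:
        parts.append(output_indicator)
    return "\n\n".join(parts)

prompt = build_prompt(
    instruction="Classify the sentiment of the review as positive or negative.",
    input_data="The battery lasts all day and the screen is gorgeous.",
    output_indicator="Sentiment:",
)
print(prompt)
```

Here the context slot is left empty; a RAG pipeline (Section 3) would fill it with retrieved chunks.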

2.3 Design Principles

Clear goal: define the task explicitly.

Specific guidance: provide concrete constraints.

Concise language: keep prompts short and clear.

Appropriate cues: use examples or boundary questions.

Iterative optimisation: refine based on model outputs.

2.4 Prompt Types

Zero‑Shot Prompting

Few‑Shot Prompting

Chain‑of‑Thought (CoT)

Self‑Consistency

Tree of Thoughts (ToT)

ReAct framework

<code>prompt = """Answer the question based on the context below. If the question cannot be answered using the information provided, answer with "I don't know".

Context: Large Language Models (LLMs) are the latest models used in NLP. Their superior performance over smaller models has made them incredibly useful for developers building NLP-enabled applications. These models can be accessed via Hugging Face's `transformers` library, via OpenAI using the `openai` library, and via Cohere using the `cohere` library.

Question: Which libraries and model providers offer LLMs?

Answer:"""</code>
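For comparison, a few-shot prompt prepends worked examples so the model can infer the task format before seeing the real query. The reviews and labels below are invented for illustration:

```python
# A few-shot sentiment prompt: two labelled examples, then the real query.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The checkout process was fast and painless.
Sentiment: Positive

Review: The package arrived damaged and support never replied.
Sentiment: Negative

Review: Setup took five minutes and everything just worked.
Sentiment:"""
print(few_shot_prompt)
```

The model completes the final `Sentiment:` line, mimicking the pattern established by the examples.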

3. Retrieval‑Augmented Generation (RAG)

RAG first retrieves relevant documents from a knowledge base and then feeds them into the LLM, improving factual accuracy and mitigating hallucinations.

3.1 Problems Addressed

Hallucination: models may generate plausible but false statements.

Knowledge cutoff: static training data cannot cover real‑time or proprietary information.

Data security: on‑premise retrieval keeps sensitive data within the enterprise.

3.2 Architecture

RAG can be viewed as "retrieval + generation". Retrieval uses vector databases (FAISS, Milvus, etc.) to fetch relevant chunks; generation uses a prompt that combines the user query with the retrieved context.

3.3 Workflow

Data preparation: extraction → text splitting → embedding → indexing.

Application: user query → similarity or full-text search → prompt injection → LLM generation.
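The workflow above can be sketched end to end with a toy bag-of-words retriever standing in for a real embedding model and vector database (FAISS, Milvus); the `embed` and `cosine` functions and the prompt template are illustrative stand-ins, not production components.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Data preparation: split documents into chunks and index their vectors.
chunks = [
    "FAISS is a library for efficient similarity search over vectors.",
    "RAG retrieves relevant chunks and injects them into the prompt.",
    "Transformers use self-attention to process tokens in parallel.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Application: embed the query, retrieve the best chunk, build the prompt.
query = "How does RAG use retrieved chunks?"
best_chunk, _ = max(index, key=lambda item: cosine(embed(query), item[1]))
prompt = f"Context: {best_chunk}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```

A real system swaps `embed` for a learned embedding model and the `max` scan for an approximate-nearest-neighbour index, but the data flow is the same.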

4. AI Agents

4.1 Concept

Agents are AI systems that perceive an environment, plan actions, execute them, and learn from feedback, using an LLM as the reasoning core.

4.2 Core Components

LLM : provides reasoning and language generation.

Tools : external APIs, code execution, search, etc.

Memory : short‑term (context window) and long‑term (vector store) storage of interaction history.

Planning : task decomposition, CoT, ToT, ReAct.
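A minimal sketch of how these four components might be wired together; the class, its methods, and the stub LLM and tool are all hypothetical, not taken from any specific agent framework.

```python
from collections import deque

class Agent:
    """Minimal agent skeleton: an LLM core, a tool registry,
    and a bounded short-term memory of past interactions.

    `llm` is any callable mapping a prompt string to text;
    `tools` maps tool names to callables.
    """
    def __init__(self, llm, tools, memory_size=10):
        self.llm = llm
        self.tools = tools
        self.memory = deque(maxlen=memory_size)  # short-term memory

    def run(self, task: str) -> str:
        history = "\n".join(self.memory)
        plan = self.llm(f"History:\n{history}\nTask: {task}\nNext action:")
        self.memory.append(f"task={task} plan={plan}")
        name, _, arg = plan.partition(" ")
        if name in self.tools:             # execute the chosen tool
            return self.tools[name](arg)
        return plan                        # otherwise treat as final answer

# Stub LLM that always plans a search; a real model would reason here.
agent = Agent(llm=lambda p: "search Apple Remote",
              tools={"search": lambda q: f"results for {q!r}"})
print(agent.run("Find programs that control the Apple Remote"))
```

Long-term memory would replace the `deque` with a vector store queried by similarity, and the planner would emit multi-step plans (CoT, ToT, ReAct) rather than a single action.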

4.3 ReAct Example

The ReAct loop interleaves Thought, Action, and Observation steps, allowing the agent to query external tools and refine its reasoning.

<code>Thought: Need to find programs that can control Apple Remote.
Action: Search["Apple Remote control programs"]
Observation: ...
... (repeated many times)</code>
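A stripped-down version of that loop in code, with a stubbed model and tool standing in for a real LLM and search API; the `Action: Tool[arg]` syntax mirrors the trace above, while the stub functions and the `Finish` convention are assumptions for illustration.

```python
import re

def react_loop(llm, tools, question, max_steps=5):
    """Interleave Thought/Action/Observation until a Finish action.

    `llm` maps the transcript so far to its next Thought + Action lines;
    `tools` maps tool names to callables. Both stubs below are toys.
    """
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)            # model emits Thought + Action
        transcript += step + "\n"
        match = re.search(r"Action: (\w+)\[(.*?)\]", step)
        if match is None:
            break
        tool, arg = match.groups()
        if tool == "Finish":              # agent decides it is done
            return arg
        obs = tools[tool](arg)            # execute tool, feed back result
        transcript += f"Observation: {obs}\n"
    return transcript

# Stub model: search first, then finish with the observed answer.
def stub_llm(transcript):
    if "Observation:" not in transcript:
        return "Thought: I should search.\nAction: Search[Apple Remote]"
    return "Thought: I have the answer.\nAction: Finish[Front Row]"

answer = react_loop(stub_llm, {"Search": lambda q: "Front Row uses it"},
                    "Which program controls the Apple Remote?")
print(answer)
```

Each Observation is appended to the transcript, so the next model call can condition on everything seen so far; that feedback is what lets ReAct correct course mid-task.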

5. Multimodal Models

5.1 Definition

Multimodal models process and understand multiple data types—text, images, audio, video—simultaneously.

5.2 Why Multimodal?

The real world is multimodal; integrating diverse signals yields richer understanding, higher robustness, and better generalisation.

5.3 Characteristics & Applications

Information integration across modalities.

Enhanced expressive power.

Improved robustness when one modality is missing.

Use cases: medical diagnosis, autonomous driving, intelligent customer service.

References

https://arxiv.org/abs/2402.06196

https://arxiv.org/abs/2308.10792

https://arxiv.org/abs/2312.10997

https://lilianweng.github.io/posts/2023-06-23-agent

https://python.langchain.com/docs/modules/agents

https://www.promptingguide.ai/zh/techniques/fewshot

Tags: AI agents, LLM, Prompt Engineering, RAG, multimodal
Written by

Data Thinking Notes

Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
