60 Essential AI Terms Every Programmer Should Master
This article walks programmers through 60 core AI concepts—from the basics of large language models and tokens to advanced topics like prompt engineering, retrieval‑augmented generation, fine‑tuning, and inference optimization—organized into progressive skill levels and illustrated with concrete examples and code snippets.
Overview
The author introduces a roadmap of AI terminology for developers, grouping 60 key words into seven skill levels (Lv 1‑7) that guide you from a vague notion of AI to the research foundations needed to read papers.
Lv 1 – Zero‑Base: What AI Actually Is
AI
In a programmer’s context, AI essentially means large models or deep learning . Tools like Copilot and ChatGPT are built on the same technology.
Large Language Model (LLM)
LLMs such as GPT, Claude, and DeepSeek are “next‑token prediction” engines: given a context, they assign probabilities to each possible next token, sample one, and repeat. Scaling the model to billions of parameters enables code generation, text writing, and translation.
Prompt
A prompt is the natural‑language input to an LLM, equivalent to a function’s arguments. Precise prompts yield useful results; vague prompts produce garbage, much like poorly written SQL.
Multimodal
Multimodal models accept text, images, audio, or video together. For example, Claude can analyse a diagram to decide whether a system is micro‑service‑based or monolithic.
AIGC
AI‑Generated Content (AIGC) covers any AI‑produced artifact—code, documentation, images, video. It is simply the model’s output.
Token
Tokens are the smallest text units LLMs process (roughly 1.3 tokens per English word, 2 tokens per Chinese character). API billing is token‑based: input is cheap, output is expensive.
Context Window
The maximum number of tokens an LLM can attend to at once. GPT‑4’s 128K window can hold an entire novel; exceeding the window causes truncation or “forgetting,” similar to Redis’s maxmemory eviction.
Lv 2 – Getting the Model to Follow Your Instructions (9 terms)
System Prompt
A hidden, always‑active instruction that sets the model’s persona, e.g., “You are a backend engineer; always provide code examples.”
Temperature
Controls output randomness. Low temperature (≈0) yields deterministic code/JSON; high temperature produces creative prose. It is the T parameter in softmax.
Chain‑of‑Thought (CoT)
Adding “think step‑by‑step before answering” to a prompt forces the model to expose its reasoning, improving accuracy on complex tasks.
Structured Output
Ask the model to return valid JSON; OpenAI and Anthropic now support JSON‑Schema constraints, preventing parsing failures.
Function Calling / Tool Use
The model can suggest a function call, e.g., get_weather("Beijing"). The system executes the function, feeds the result back, and the model produces a natural‑language answer. This mechanism powers agents.
Context
All information the model sees for each token: system prompt, conversation history, user input, and tool results. Retrieval‑augmented generation (RAG) dynamically expands this context.
Role Prompting
Explicitly tell the model who it is, e.g., “You are a senior Go developer. Write idiomatic code.” This simple line boosts answer quality.
Streaming
LLMs can emit tokens as they are generated (SSE). The client receives a “typing” effect until the special token [DONE] signals completion.
Few‑Shot Prompting
Provide a few input → output examples in the prompt. The model imitates the pattern, dramatically improving tasks like natural‑language‑to‑SQL conversion.
Lv 3 – How the Model Works (9 terms)
Parameters
Adjustable weights; a 7B model has 7 billion parameters and needs ~14 GB VRAM (FP16), while a 405B model would need ~810 GB.
Pre‑training
Training on massive corpora (e.g., Llama 3 consumed 15 trillion tokens and cost tens of millions of dollars) produces a base model that can continue text but cannot answer questions yet.
Inference
Running a trained model to generate output. The bottleneck is GPU memory bandwidth (HBM), not raw compute speed.
Hallucination
LLMs can confidently produce false statements, such as inventing a plausible‑looking npm version number. This is a universal issue; retrieval‑augmented generation mitigates it.
Alignment
Turning a base model into a helpful assistant via RLHF (reinforcement learning from human feedback). Human annotators rank responses, and the model is fine‑tuned on these preferences.
Emergence
When scaling past a certain size, models exhibit abilities not explicitly taught (e.g., translation learned from pure text‑completion data). The phenomenon is not fully understood.
Mixture‑of‑Experts (MoE)
Only a subset of parameters is activated per inference, reducing cost. DeepSeek‑V3 uses MoE to match GPT‑4’s parameter count while running only ~5 % of them.
Transformer
The 2017 “Attention Is All You Need” architecture that underlies all modern LLMs. It consists of self‑attention layers and feed‑forward networks.
Attention Mechanism
Assigns importance weights to tokens; for example, the word “不” in “我不喜欢这个电影” receives high weight because it flips the sentiment. Complexity is O(n²).
Lv 4 – Using AI as a Module (9 terms)
AI API
HTTP endpoints like /v1/chat/completions (OpenAI) that most serving frameworks (vLLM, Ollama) emulate.
Agent
An LLM equipped with tool calls and a loop: generate an action, execute it, feed the result back, repeat until a goal is reached.
RAG (Retrieval‑Augmented Generation)
Split documents into chunks, embed each chunk, store in a vector DB, retrieve the most relevant chunks for a query, and prepend them to the prompt. No fine‑tuning required, but retrieval quality caps answer quality.
Embedding
Map a text chunk to a high‑dimensional vector. OpenAI’s text-embedding-3-small and the open‑source BGE‑M3 are common choices.
Vector Database
Stores embeddings and performs approximate nearest‑neighbor (ANN) search. Popular options: Pinecone (managed), Milvus (open‑source), pgvector (PostgreSQL extension).
MCP (Model Context Protocol)
Anthropic’s open standard for LLM‑tool interaction, likened to a “USB‑C” for AI services.
Prompt Engineering
Systematic study of how to craft prompts for optimal results, covering few‑shot, CoT, role prompting, structured constraints, and verification.
Vibe Coding
Coined by Andrej Karpathy: describe a requirement in natural language, let the model generate code, and the developer reviews and merges it. Works with tools like Cursor or Claude Code.
AI Workflow
Chain multiple LLM and tool calls into a DAG: intent detection → route to handler → RAG / search / DB → final answer. Implementable with low‑code platforms (Dify, Coze) or custom code.
Lv 5 – Fine‑Tuning and Model Compression (9 terms)
Fine‑tuning
Take an open‑source base model (e.g., Llama 3.1 8B) and train one more epoch on domain‑specific data (e.g., company support dialogs) to create a specialised model.
LoRA / QLoRA
Parameter‑efficient fine‑tuning that adds low‑rank matrices instead of updating all weights. QLoRA quantises these matrices to 4‑bit, enabling fine‑tuning of a 7B model on a single RTX 4090.
Quantization
Compress model weights from FP16/FP32 to INT8/INT4, shrinking size and often speeding inference. The GGUF format (e.g., Q4_K_M) from llama.cpp is a popular quantised checkpoint.
Distillation
Train a smaller “student” model to mimic the outputs of a larger “teacher” model. DeepSeek‑R1’s capabilities stem from distilling a larger model.
Benchmark
Standard evaluation suites: MMLU (general knowledge), HumanEval (code), MATH (math reasoning), GSM8K (grade‑school math). Higher benchmark scores do not guarantee real‑world usefulness; domain‑specific testing is essential.
Inference Optimization
Techniques such as KV‑Cache, continuous batching, and FlashAttention dramatically improve throughput. vLLM bundles all three.
RAG Pipeline
Full engineering flow: document parsing → chunking → embedding → vector store → retrieval → rerank → prompt assembly → LLM generation. Each stage influences answer quality.
LLM Application Framework
Libraries that simplify building LLM apps: LangChain (feature‑rich), LlamaIndex (RAG‑focused), Haystack (flexible pipelines). Simpler projects often just use the OpenAI SDK.
Semantic Search
Vector‑based retrieval that matches meaning rather than keywords, enabling queries like “how to speed up an API” to surface caching and indexing articles.
Lv 6 – Production‑Ready Stack (9 terms)
Hugging Face
“GitHub for AI”: hosts hundreds of thousands of models, datasets, and demos. The transformers Python package is the de‑facto NLP library.
Open‑Source Model Ecosystem
Major LLMs: Meta’s Llama 3.1 405B, Alibaba’s Qwen (Chinese‑centric), DeepSeek (MoE, cost‑effective), Mistral (European). Model selection mirrors framework selection—community size, documentation, and maintenance matter.
Ollama / llama.cpp
Local inference engines. llama.cpp runs on CPU; Ollama wraps it with a simple ollama run llama3.1 command, useful for rapid prototyping without paid APIs.
GPU / CUDA
AI workloads rely on NVIDIA GPUs with CUDA support; AMD’s ROCm lags behind in ecosystem maturity.
Model Serving
Frameworks that expose LLMs as HTTP services: vLLM (high‑performance, PagedAttention), TGI (Hugging Face’s official server), Ollama (single‑machine simplicity).
AI Service Architecture
Typical backend flow: Frontend → API Gateway → Business logic → LLM Service + Vector DB + Redis + Traditional DB. LLM latency (2‑30 s) forces asynchronous calls, streaming responses, and retry logic.
Diffusion Model
Generative models for images/video (Stable Diffusion, Midjourney, Sora) that iteratively denoise a random tensor to produce visual content—distinct from LLMs.
Compute
AI’s “oil”: GPU count × VRAM × interconnect bandwidth. Training GPT‑4‑scale models costs enough to buy dozens of Beijing‑area apartments; inference workloads are measured in PFLOPs.
Data Pipeline
Pre‑training data flow: collection → cleaning → deduplication → quality filtering → formatting. Data quality outweighs model architecture; garbage‑in, garbage‑out.
Lv 7 – Research Foundations (8 terms)
Neural Network
Stacked layers of neurons performing activation(input×weight + bias). A single neuron resembles logistic regression; billions together form deep models.
Backpropagation
Compute loss, propagate gradients backward, and update weights—repeated billions of times on GPUs.
Gradient Descent
Iteratively move parameters along the steepest descent direction to minimise loss. Variants include SGD, Adam, and AdamW (standard for LLM training).
Loss Function
Quantifies prediction error; LLMs use cross‑entropy between the predicted token distribution and the true token.
Tensor
Multi‑dimensional arrays (scalar, vector, matrix, higher‑dimensional). Core operations are tensor algebra; GPUs excel at parallel matrix multiplication.
Activation Function
Introduces non‑linearity. Common LLM activations: ReLU, GELU (used in Transformer FFNs), SwiGLU (newer standard).
Reinforcement Learning (RL)
Agent interacts with an environment, receives reward, and updates policy to maximise cumulative reward. RLHF treats human preference as the reward signal for LLM alignment.
AGI (Artificial General Intelligence)
The hypothetical AI that can perform any intellectual task at human level. Not yet realised, but many believe continued LLM scaling will eventually lead there.
How to Use This Table
The embedded table maps a developer’s current state to the recommended skill levels. For beginners, start with Lv 1‑2 and obtain an API key; for those already integrating LLMs, focus on Lv 3‑4 (RAG, agents, embeddings); for fine‑tuning or research, jump to Lv 5‑7.
Remember: you don’t need to master all 60 terms at once. Register an API key, make the first request, and look up unfamiliar words as they appear—learning happens on the job.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
