Fine‑Tuning vs. Context Learning: Building Apps with the Emerging LLM Tech Stack
This article explores how developers can integrate large language models into applications by comparing fine‑tuning and context learning, detailing each method’s advantages and drawbacks, and presenting a four‑layer LLM tech stack—data, model, orchestration, and operations—with practical tooling examples.
Current-generation LLMs such as OpenAI's GPT-4 and Meta's Llama 2 are increasingly popular among developers because of their usability and their key role in generative AI applications.
Developers can quickly add LLM-supported features using either fine-tuning or context learning; these two customization approaches sit at the core of the emerging LLM tech stack.
1. Customizing Pre‑trained LLMs
Pre-trained models are trained on massive public data sources such as Common Crawl, Wikipedia, and Project Gutenberg. GPT-4 is reported to have roughly 1.76 trillion parameters; the largest Llama 2 model has 70 billion. Two general customization approaches exist: fine-tuning and context learning.
1.1 Fine‑tuning
Fine‑tuning trains the model on a smaller, domain‑specific dataset, altering its parameters to specialize its knowledge.
As of December 2023, OpenAI offers fine-tuning for GPT-3.5 Turbo via its API, while GPT-4 fine-tuning is available through an experimental access program for qualified users. Llama 2 can be fine-tuned on platforms such as Google Colab or Hugging Face.
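As a rough illustration, launching such a job with the OpenAI Python SDK (v1.x) looks like the sketch below; the training file name and its contents are hypothetical.

```python
# A minimal sketch of launching a fine-tuning job with the OpenAI Python SDK (v1.x).
# Assumes OPENAI_API_KEY is set and "refund_examples.jsonl" (hypothetical) holds
# chat-formatted training examples, one JSON object per line:
# {"messages": [{"role": "system", ...}, {"role": "user", ...}, {"role": "assistant", ...}]}
from openai import OpenAI

client = OpenAI()

# Upload the domain-specific training data.
training_file = client.files.create(
    file=open("refund_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job; training runs asynchronously on OpenAI's side.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)  # Poll client.fine_tuning.jobs.retrieve(job.id) until done.
```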
Advantages:
Higher-quality output than prompting alone
Ability to train on more examples than fit in a single prompt
Lower cost and latency after fine-tuning, since prompts can be shorter
Drawbacks:
Requires ML expertise and significant compute resources
Risk of catastrophic forgetting (the model losing previously learned capabilities)
Potential overfitting to the fine-tuning dataset
1.2 Context Learning
Context learning does not modify the underlying model. It guides LLM output through structured prompts and retrieved data, providing the right information at the right time.
Few‑shot prompting supplies example input‑output pairs as part of the prompt, effectively creating a mini‑training set. Since LLMs are trained on static data, they lack knowledge of recent events; retrieval‑augmented generation (RAG) can supplement missing information from vector or SQL databases, APIs, or document stores.
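As an illustration, few-shot examples can be expressed as prior dialogue turns in a chat request; the sketch below (with made-up examples and an illustrative model choice) shows the pattern.

```python
# A minimal few-shot prompting sketch with the OpenAI Python SDK (v1.x):
# the example pairs in the prompt act as a tiny in-context "training set".
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "Classify the sentiment of each review as positive or negative."},
    # Few-shot examples: input-output pairs supplied as prior dialogue turns.
    {"role": "user", "content": "The refund arrived within two days. Great service!"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Support never answered my emails."},
    {"role": "assistant", "content": "negative"},
    # The actual query.
    {"role": "user", "content": "The chatbot solved my issue in one message."},
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)  # Expected: "positive"
```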
As of December 2023, GPT-4 Turbo supports a context length of up to 128K tokens, while Llama 2 supports 4K tokens.
Advantages of context learning:
No ML expertise required; lower resource demand
No risk of damaging the base model
Separate management of proprietary data
Disadvantages:
Generally lower output quality than fine‑tuning
Limited by model’s maximum context length
Higher cost and latency for long prompts
2. Emerging LLM Tech Stack
The stack consists of three main layers plus a supplemental operations layer:
Data layer – extracting, embedding, and storing private data
Model layer – the LLM itself
Orchestration layer – coordinating components, retrieving information, and building prompts
Operations layer (supplemental) – monitoring, caching, and validation tools
2.1 Data Layer
The data layer handles private and supplemental information through extraction, embedding, and storage.
Extraction gathers data from various sources (cloud buckets, PDFs, HTML, CRM APIs, SQL databases, wikis, emails). Optional cleaning and conversion to a standard format (e.g., JSON) may be performed.
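As a minimal sketch of this step, the following code (with a hypothetical source directory and record shape) extracts text from local HTML files and normalizes it to JSON using only the standard library:

```python
# A minimal extraction sketch: pull visible text out of local HTML files and
# normalize each document to a simple JSON record (the record shape is an assumption).
import json
from pathlib import Path
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text content of an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

records = []
for path in Path("docs").glob("*.html"):  # hypothetical source directory
    parser = TextExtractor()
    parser.feed(path.read_text(encoding="utf-8"))
    records.append({"source": str(path), "text": " ".join(parser.chunks)})

Path("corpus.json").write_text(json.dumps(records, indent=2))
```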
Embedding transforms text into vector representations using models such as OpenAI's text-embedding-ada-002 (Ada v2), which accepts strings or token arrays (up to 8,191 tokens per input, and up to 2,048 inputs per request) and returns 1,536-dimensional vectors. Large texts may need to be chunked to fit under the token limit.
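A sketch of the embedding step, assuming the v1.x OpenAI Python SDK and naive character-based chunking (production pipelines usually chunk by tokens and on semantic boundaries):

```python
# A sketch of the embedding step with OpenAI's text-embedding-ada-002,
# using naive fixed-size character chunking.
from openai import OpenAI

client = OpenAI()

def chunk(text: str, size: int = 4000) -> list[str]:
    """Split long text into pieces that stay safely under the model's token limit."""
    return [text[i:i + size] for i in range(0, len(text), size)]

document = open("corpus.txt").read()  # hypothetical input file
pieces = chunk(document)

# The embeddings endpoint accepts a list of inputs and returns one vector per input.
response = client.embeddings.create(model="text-embedding-ada-002", input=pieces)
vectors = [item.embedding for item in response.data]
print(len(vectors), len(vectors[0]))  # e.g., N chunks x 1536 dimensions
```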
Storage saves embeddings together with raw data in a vector database or a traditional database with vector‑search extensions. Vector databases are optimized for similarity search, while adding vector extensions to existing SQL/NoSQL databases can be a quicker transition for simple use cases.
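To make the similarity-search idea concrete, the toy sketch below ranks stored vectors by cosine similarity in NumPy; a real vector database performs the same nearest-neighbor search, but with indexing that scales to millions of vectors:

```python
# A toy stand-in for a vector store: keep embeddings in a NumPy matrix and
# rank documents by cosine similarity against a query embedding.
import numpy as np

def cosine_top_k(query_vec, matrix, k=3):
    """Return indices of the k rows of `matrix` most similar to `query_vec`."""
    q = query_vec / np.linalg.norm(query_vec)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q
    return np.argsort(scores)[::-1][:k]

# `vectors` would come from the embedding step; placeholders are used here.
vectors = np.random.rand(100, 1536)  # placeholder document embeddings
query = np.random.rand(1536)         # placeholder query embedding
for idx in cosine_top_k(query, vectors):
    print("match:", idx)
```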
2.2 Model Layer
The model layer provides ready‑made LLMs (e.g., GPT‑4, Llama 2) chosen based on purpose, cost, performance, and complexity. Access is typically via an API endpoint that receives input and returns generated output.
2.3 Orchestration Layer
The orchestration layer acts as the controller, assembling prompts, retrieving relevant data from vector stores or APIs, and invoking the LLM. Example workflow for a customer‑service chatbot handling a refund query:
A prompt template with instructions and example dialogues is prepared.
Relevant data, such as the refund policy, is fetched from a vector database.
The full prompt, including the retrieved context, is sent to the chosen LLM (here, GPT-4).
The LLM returns an answer, which is relayed to the user.
Frameworks such as LangChain (JavaScript/Python) or visual tools like Flowise enable building such pipelines.
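Without a framework, the workflow above can be sketched in plain Python as follows; search_policies is a hypothetical stand-in for a vector-store lookup, and the prompt template is illustrative:

```python
# A minimal plain-Python sketch of the orchestration flow for the refund chatbot.
from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATE = (
    "You are a customer-service assistant.\n"
    "Answer using only the company policy below.\n\n"
    "Policy:\n{context}\n\nCustomer question: {question}"
)

def search_policies(question: str) -> str:
    """Placeholder for a vector-database similarity search over policy documents."""
    return "Refunds are issued within 14 days of purchase with a valid receipt."

def answer(question: str) -> str:
    context = search_policies(question)                                   # step 2: retrieve data
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)   # steps 1 and 3: build the prompt
    response = client.chat.completions.create(                            # step 3: call the LLM
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content                            # step 4: relay the answer

print(answer("Can I get a refund for an order from last week?"))
```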
2.4 Operations Layer
When LLM‑enabled applications move to production, an operations layer (LLMOps) improves performance and reliability. Typical LLMOps tools cover monitoring, semantic caching, and validation against prompt‑injection attacks.
For the chatbot, logging requests and responses enables later accuracy assessment; caching frequent answers reduces API calls; and validating user input before it reaches the model mitigates malicious queries such as prompt injection.
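The sketch below illustrates all three concerns around a single LLM call; the blocklist check is a deliberately naive stand-in for real prompt-injection defenses, the cache keys on the exact question (semantic caches key on embeddings instead), and all names are illustrative:

```python
# A sketch of three LLMOps concerns wrapped around one LLM call:
# request/response logging, simple caching, and naive input validation.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llmops")

cache: dict[str, str] = {}
BLOCKLIST = ("ignore previous instructions", "system prompt")

def guarded_answer(question: str, llm_call) -> str:
    # Validation: reject obvious prompt-injection patterns before calling the model.
    if any(phrase in question.lower() for phrase in BLOCKLIST):
        return "Sorry, I can't help with that request."
    # Caching: serve repeated questions without another API call.
    if question in cache:
        log.info("cache hit: %s", question)
        return cache[question]
    result = llm_call(question)  # e.g., the orchestration function sketched above
    log.info("request=%r response=%r", question, result)  # log for later review
    cache[question] = result
    return result
```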
3. Conclusion
By understanding fine‑tuning and context learning, developers can choose the appropriate method for adapting LLMs, and the four‑layer tech stack provides a roadmap for building robust LLM‑powered applications.
Author: 金宝
Reference: https://www.codesmith.io/blog/introducing-the-emerging-llm-tech-stack
