Fine‑Tuning vs. Context Learning: Building Apps with the Emerging LLM Tech Stack
This article explores how developers can integrate large language models into applications by comparing fine‑tuning and context learning, detailing each method’s advantages and drawbacks, and presenting a four‑layer LLM tech stack—data, model, orchestration, and operations—with practical tooling examples.
Current-generation LLMs such as OpenAI's GPT-4 and Meta's Llama 2 are increasingly popular among developers because of their usability and their key role in generative AI applications.
Developers can quickly add LLM-supported features using either fine-tuning or context learning; these two customization approaches sit at the core of the emerging LLM tech stack.
1. Customizing Pre‑trained LLMs
Pre-trained models are trained on massive public data sources such as Common Crawl, Wikipedia, and Project Gutenberg. GPT-4 is reported to have roughly 1.76 trillion parameters; the largest Llama 2 model has 70 billion. Two general customization approaches exist: fine-tuning and context learning.
1.1 Fine‑tuning
Fine‑tuning trains the model on a smaller, domain‑specific dataset, altering its parameters to specialize its knowledge.
As of December 2023, OpenAI offers fine-tuning for GPT-3.5 Turbo via its API, while GPT-4 fine-tuning is available through an experimental access program for qualified users. Llama 2 can be fine-tuned on platforms such as Google Colab or Hugging Face.
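As a rough illustration, launching such a job with the OpenAI Python SDK (v1.x) looks like the sketch below; the training file name and its contents are hypothetical.

```python
# A minimal sketch of launching a fine-tuning job with the OpenAI Python SDK (v1.x).
# Assumes OPENAI_API_KEY is set and "refund_examples.jsonl" (hypothetical) holds
# chat-formatted training examples, one JSON object per line:
# {"messages": [{"role": "system", ...}, {"role": "user", ...}, {"role": "assistant", ...}]}
from openai import OpenAI

client = OpenAI()

# Upload the domain-specific training data.
training_file = client.files.create(
    file=open("refund_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job; training runs asynchronously on OpenAI's side.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)  # Poll client.fine_tuning.jobs.retrieve(job.id) until done.
```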
Advantages:
Higher-quality output than prompting alone
Ability to train on more examples than fit in a single prompt
Lower cost and latency after fine-tuning, since prompts can be shorter
Drawbacks:
Requires ML expertise and significant compute resources
Risk of catastrophic forgetting (the model losing previously learned capabilities)
Potential overfitting to the fine-tuning dataset
1.2 Context Learning
Context learning does not modify the underlying model. It guides LLM output through structured prompts and retrieved data, providing the right information at the right time.
Few‑shot prompting supplies example input‑output pairs as part of the prompt, effectively creating a mini‑training set. Since LLMs are trained on static data, they lack knowledge of recent events; retrieval‑augmented generation (RAG) can supplement missing information from vector or SQL databases, APIs, or document stores.
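As an illustration, few-shot examples can be expressed as prior dialogue turns in a chat request; the sketch below (with made-up examples and an illustrative model choice) shows the pattern.

```python
# A minimal few-shot prompting sketch with the OpenAI Python SDK (v1.x):
# the example pairs in the prompt act as a tiny in-context "training set".
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "Classify the sentiment of each review as positive or negative."},
    # Few-shot examples: input-output pairs supplied as prior dialogue turns.
    {"role": "user", "content": "The refund arrived within two days. Great service!"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Support never answered my emails."},
    {"role": "assistant", "content": "negative"},
    # The actual query.
    {"role": "user", "content": "The chatbot solved my issue in one message."},
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)  # Expected: "positive"
```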
As of December 2023, GPT-4 Turbo supports a context length of up to 128K tokens, while Llama 2 supports 4K tokens.
Advantages of context learning:
No ML expertise required; lower resource demand
No risk of damaging the base model
Separate management of proprietary data
Disadvantages:
Generally lower output quality than fine‑tuning
Limited by model’s maximum context length
Higher cost and latency for long prompts
2. Emerging LLM Tech Stack
The stack consists of three main layers plus a supplemental operations layer:
Data layer – extracting, embedding, and storing private data
Model layer – the LLM itself
Orchestration layer – coordinating components, retrieving information, and building prompts
Operations layer (supplemental) – monitoring, caching, and validation tools
2.1 Data Layer
The data layer handles private and supplemental information through extraction, embedding, and storage.
Extraction gathers data from various sources (cloud buckets, PDFs, HTML, CRM APIs, SQL databases, wikis, emails). Optional cleaning and conversion to a standard format (e.g., JSON) may be performed.
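As a minimal sketch of this step, the following code (with a hypothetical source directory and record shape) extracts text from local HTML files and normalizes it to JSON using only the standard library:

```python
# A minimal extraction sketch: pull visible text out of local HTML files and
# normalize each document to a simple JSON record (the record shape is an assumption).
import json
from pathlib import Path
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text content of an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

records = []
for path in Path("docs").glob("*.html"):  # hypothetical source directory
    parser = TextExtractor()
    parser.feed(path.read_text(encoding="utf-8"))
    records.append({"source": str(path), "text": " ".join(parser.chunks)})

Path("corpus.json").write_text(json.dumps(records, indent=2))
```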
Embedding transforms text into vector representations using models such as OpenAI's text-embedding-ada-002 (Ada v2), which accepts strings or token arrays (up to 8,191 tokens per input, and up to 2,048 inputs per request) and returns 1,536-dimensional vectors. Large texts may need to be chunked to fit under the token limit.
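A sketch of the embedding step, assuming the v1.x OpenAI Python SDK and naive character-based chunking (production pipelines usually chunk by tokens and on semantic boundaries):

```python
# A sketch of the embedding step with OpenAI's text-embedding-ada-002,
# using naive fixed-size character chunking.
from openai import OpenAI

client = OpenAI()

def chunk(text: str, size: int = 4000) -> list[str]:
    """Split long text into pieces that stay safely under the model's token limit."""
    return [text[i:i + size] for i in range(0, len(text), size)]

document = open("corpus.txt").read()  # hypothetical input file
pieces = chunk(document)

# The embeddings endpoint accepts a list of inputs and returns one vector per input.
response = client.embeddings.create(model="text-embedding-ada-002", input=pieces)
vectors = [item.embedding for item in response.data]
print(len(vectors), len(vectors[0]))  # e.g., N chunks x 1536 dimensions
```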
Storage saves embeddings together with raw data in a vector database or a traditional database with vector‑search extensions. Vector databases are optimized for similarity search, while adding vector extensions to existing SQL/NoSQL databases can be a quicker transition for simple use cases.
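To make the similarity-search idea concrete, the toy sketch below ranks stored vectors by cosine similarity in NumPy; a real vector database performs the same nearest-neighbor search, but with indexing that scales to millions of vectors:

```python
# A toy stand-in for a vector store: keep embeddings in a NumPy matrix and
# rank documents by cosine similarity against a query embedding.
import numpy as np

def cosine_top_k(query_vec, matrix, k=3):
    """Return indices of the k rows of `matrix` most similar to `query_vec`."""
    q = query_vec / np.linalg.norm(query_vec)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q
    return np.argsort(scores)[::-1][:k]

# `vectors` would come from the embedding step; placeholders are used here.
vectors = np.random.rand(100, 1536)  # placeholder document embeddings
query = np.random.rand(1536)         # placeholder query embedding
for idx in cosine_top_k(query, vectors):
    print("match:", idx)
```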
2.2 Model Layer
The model layer provides ready‑made LLMs (e.g., GPT‑4, Llama 2) chosen based on purpose, cost, performance, and complexity. Access is typically via an API endpoint that receives input and returns generated output.
2.3 Orchestration Layer
The orchestration layer acts as the controller, assembling prompts, retrieving relevant data from vector stores or APIs, and invoking the LLM. Example workflow for a customer‑service chatbot handling a refund query:
A prompt template with instructions and example dialogues is prepared.
Relevant data, such as the refund policy, is fetched from a vector database.
The full prompt, including the retrieved context, is sent to the chosen LLM (here, GPT-4).
The LLM returns an answer, which is relayed to the user.
Frameworks such as LangChain (JavaScript/Python) or visual tools like Flowise enable building such pipelines.
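Without a framework, the workflow above can be sketched in plain Python as follows; search_policies is a hypothetical stand-in for a vector-store lookup, and the prompt template is illustrative:

```python
# A minimal plain-Python sketch of the orchestration flow for the refund chatbot.
from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATE = (
    "You are a customer-service assistant.\n"
    "Answer using only the company policy below.\n\n"
    "Policy:\n{context}\n\nCustomer question: {question}"
)

def search_policies(question: str) -> str:
    """Placeholder for a vector-database similarity search over policy documents."""
    return "Refunds are issued within 14 days of purchase with a valid receipt."

def answer(question: str) -> str:
    context = search_policies(question)                                   # step 2: retrieve data
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)   # steps 1 and 3: build the prompt
    response = client.chat.completions.create(                            # step 3: call the LLM
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content                            # step 4: relay the answer

print(answer("Can I get a refund for an order from last week?"))
```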
2.4 Operations Layer
When LLM‑enabled applications move to production, an operations layer (LLMOps) improves performance and reliability. Typical LLMOps tools cover monitoring, semantic caching, and validation against prompt‑injection attacks.
For the chatbot, logging requests and responses enables later accuracy assessment; caching frequent answers reduces API calls; and validating user input before it reaches the model mitigates malicious queries such as prompt injection.
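The sketch below illustrates all three concerns around a single LLM call; the blocklist check is a deliberately naive stand-in for real prompt-injection defenses, the cache keys on the exact question (semantic caches key on embeddings instead), and all names are illustrative:

```python
# A sketch of three LLMOps concerns wrapped around one LLM call:
# request/response logging, simple caching, and naive input validation.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llmops")

cache: dict[str, str] = {}
BLOCKLIST = ("ignore previous instructions", "system prompt")

def guarded_answer(question: str, llm_call) -> str:
    # Validation: reject obvious prompt-injection patterns before calling the model.
    if any(phrase in question.lower() for phrase in BLOCKLIST):
        return "Sorry, I can't help with that request."
    # Caching: serve repeated questions without another API call.
    if question in cache:
        log.info("cache hit: %s", question)
        return cache[question]
    result = llm_call(question)  # e.g., the orchestration function sketched above
    log.info("request=%r response=%r", question, result)  # log for later review
    cache[question] = result
    return result
```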
3. Conclusion
By understanding fine‑tuning and context learning, developers can choose the appropriate method for adapting LLMs, and the four‑layer tech stack provides a roadmap for building robust LLM‑powered applications.
Author: 金宝
Reference: https://www.codesmith.io/blog/introducing-the-emerging-llm-tech-stack
