Mastering AI Agents: Building Knowledge Bases, Workflows, and Prompt Engineering

This article explains how to design a high‑performing AI Agent by constructing a robust knowledge base, orchestrating efficient workflows, and crafting precise prompts, covering vector storage, graph databases, retrieval strategies, and practical prompt‑engineering techniques.

Instant Consumer Technology Team
Instant Consumer Technology Team
Instant Consumer Technology Team
Mastering AI Agents: Building Knowledge Bases, Workflows, and Prompt Engineering

A good AI Agent typically consists of five components: Large Language Model (LLM) as the core compute engine, Tools for capability extension, a Knowledge Base (RAG) to provide depth and reduce hallucinations, a Workflow to define processing logic, and Prompt engineering to control output precision. Designing these components together with business scenarios is the main challenge.

1. Knowledge Base

Collect raw documents (PDF, Word, PPT) and convert them into text using tools such as minerU. Split the text into small chunks (by chapter, topic, or fixed length) and tag each chunk with metadata (e.g., date, keywords) for fast retrieval.

标题:海外知识产权纠纷应对策略
时间:2025年9月
关键词:专利、法律反制、风险预警

Store the chunks as vectors so the system can understand meaning rather than just matching keywords. Three retrieval modes are supported:

Semantic search – matches based on vector similarity.

Keyword search – matches exact terms.

Graph (entity) search – matches relationships between entities.

A hybrid architecture using Milvus (vector database) and Neo4j (graph database) is recommended.

Data model design: Build a "person relationship" graph in Neo4j and a "feature archive" in Milvus. The graph stores nodes and edges (e.g., a person’s attributes and relationships), while Milvus stores numeric fingerprints (vectors) for similarity search.

Vectorization pipeline:

Pre‑process: split documents into tokens, remove stop words.

Encode: use a model such as BERT to convert text to vectors.

Store: save vectors in Milvus and metadata (title, author) in Neo4j.

Index optimization:

Milvus – build HNSW indexes for fast vector lookup.

Neo4j – create indexes on frequently queried properties (e.g., name).

Knowledge retrieval: Combine semantic and graph search to answer queries like “how to treat hypertension” by first finding relevant documents semantically and then exploring related entities such as side effects.

Ranking strategies:

Multi‑stage re‑ranking: coarse BM25 ranking followed by a cross‑encoder.

Contextual ranking: adjust scores based on user history and current dialogue.

Rule‑based intervention: apply business rules (e.g., security, freshness) for final filtering.

Knowledge update: Implement automatic change detection, incremental updates, and version conflict handling. Preserve historical versions and use metadata (effective time, status) to filter outdated information.

2. Workflow

A workflow defines the step‑by‑step execution plan for an Agent. Example for a weather query:

1、Check knowledge‑base cache (if updated yesterday);
2、If no cache, call a weather API (e.g., Gaode);
3、Use LLM to format the result into natural language.

Workflows may include loops of reflection and re‑action, such as retrieving data, generating code, handling errors, and retrying until a satisfactory answer is produced. The article mentions Coze as a mature commercial workflow platform.

3. Prompt Engineering

Prompt engineering designs the “mindset” of the AI. Effective system prompts should define a clear role (e.g., "you are an e‑commerce product copy generator") and provide concise instructions without unnecessary explanations.

错的角色设定:"你是资深电商运营专家,擅长写产品文案"
对的角色设定:"你是电商产品文案生成器,只负责根据参数输出100字产品描述,不添加额外解释"

Provide relevant context (e.g., order number, payment status) so the model can answer accurately. Use well‑crafted examples to teach the model the desired output format, prioritizing quality, random order, and comprehensive coverage.

正常查询:"手机没电了怎么办?" → "充电5分钟,通话2小时"
异常情况:"手机进水了,还能用吗?" → "请立即关机,勿充电"

When requiring structured output (e.g., JSON), explicitly state the constraint, give examples, and repeat the instruction at both the beginning and end of the prompt.

你是JSON生成器,只输出JSON,不要任何其他文字。
示例:输入:iPhone 15价格 → 输出:{"name":"iPhone 15","price":"5999"}
请严格按JSON格式输出。
LLMprompt engineeringworkflowRAGvector databaseKnowledge BaseAI Agent
Instant Consumer Technology Team
Written by

Instant Consumer Technology Team

Instant Consumer Technology Team

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.