Scalable Enterprise AI Assistant: Intent Planning, Context Engineering, Data Iteration
This article details the end‑to‑end design of an enterprise AI office assistant: a three‑layer framework of intent planning, context engineering, and data self‑iteration; the key pain points of intent understanding, knowledge integration, and quality control; and practical architecture and implementation choices for scalable deployment.
Three‑Layer Framework for Enterprise AI Assistants
The solution is organized into three tightly coupled layers that together achieve >99% intent‑recognition accuracy, >93% task‑delivery accuracy, and a closed‑loop data‑self‑iteration process.
Intent Planning – builds a complete user profile, rewrites queries, and selects the appropriate handling path.
Context Engineering – injects knowledge (RAG), data (warehouse), and agents (MCP/A2A) into a unified execution context.
Data Self‑Iteration – continuously collects logs, extracts weak samples, retrains models, and redeploys them.
Product Architecture
The system is divided into four functional layers:
Entry – web front‑end with responsive design and optional IM embedding.
Intent Processing – user‑profile construction and intent classification.
Delivery – context engineering that combines knowledge retrieval, data lookup, and agent orchestration.
Evaluation – batch alignment with human judgments and consistency improvement.
Intent Understanding & Query Rewriting
Accurate intent detection requires the input query to be complete and unambiguous. Three rewrite techniques are applied:
Multi‑turn completion – appends recent dialogue turns to enrich a single‑turn query.
Coreference resolution – replaces pronouns with explicit entity references.
Long‑text summarization – extracts the core problem from lengthy inputs.
The prompt template includes the last three dialogue turns (truncated if necessary), the current query, and basic user metadata (e.g., location). The model is forced to output a JSON structure, e.g.:
```json
{
  "rewritten_query": "How many days of parental leave are granted in Chongqing?",
  "metadata": {"location": "Chongqing"}
}
```
As a safety measure, the pipeline falls back to the original query if JSON parsing fails.
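The fallback behavior can be sketched as follows; `parse_rewrite` is an illustrative name, not part of the described system:

```python
import json

def parse_rewrite(raw_output: str, original_query: str) -> dict:
    """Parse the rewriting model's JSON output; fall back to the
    original query if parsing fails or the payload is malformed."""
    try:
        parsed = json.loads(raw_output)
        if isinstance(parsed, dict) and str(parsed.get("rewritten_query", "")).strip():
            return parsed
    except json.JSONDecodeError:
        pass
    # Safety fallback: serve the raw user query unchanged.
    return {"rewritten_query": original_query, "metadata": {}}
```

Validating the payload shape (not just parse success) matters: a model that emits valid JSON with an empty rewrite should also trigger the fallback.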
Intent Recognition Models
Fine‑tuned NLU models (BERT, RoBERTa) outperform generic LLMs for closed‑domain intent sets. Typical performance:
Accuracy > 99% on a 50‑intent taxonomy.
Inference latency < 50 ms on a single CPU core.
Cost ≈ 0.01 USD per 1,000 queries.
A continuous data‑iteration loop is implemented:
Business teams label high‑conflict samples.
System logs capture raw queries and model predictions.
Weak‑sample extraction identifies low‑confidence cases.
Retraining incorporates new labels.
Dynamic deployment updates the serving endpoint without downtime.
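The weak‑sample extraction step in this loop can be sketched as a simple confidence filter over the prediction logs (the log‑record fields shown are assumptions for illustration):

```python
def extract_weak_samples(logs, threshold=0.7):
    """Pick logged predictions whose confidence falls below the threshold;
    these are routed to business teams for relabeling and retraining."""
    return [rec for rec in logs if rec["confidence"] < threshold]
```

In practice the threshold is tuned per intent class, since head intents tend to score systematically higher than tail intents.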
Context Engineering
Four major challenges are addressed:
Dynamic document updates – real‑time sync with cloud document platforms, version control, and incremental indexing.
Multimodal knowledge – ASR converts audio/video to timestamped transcripts; a downstream model structures them into agenda slices (time, title, summary) for vector storage.
Customer‑service knowledge – resolved tickets are automatically harvested, normalized, and added to the knowledge base.
Conflicting sources – a rule engine applies version‑first → time‑first → channel‑priority → user‑context ordering.
Implementation highlights:
Permission‑based retrieval maps raw file paths to vector embeddings.
Entity‑vector mapping enables fast lookup of structured entities.
Data Injection & Query Handling
Two pragmatic ingestion patterns are used to achieve >98% answer accuracy while keeping API coupling low:
Subject‑wide wide tables – a denormalized employee‑centric table aggregates HR, attendance, training, and certification data. Queries only need a user identifier.
FAQ conversion – dynamic data (e.g., cooperation status) is transformed into standardized Q&A pairs with templated answers.
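A minimal sketch of the subject‑wide wide‑table pattern, with illustrative field names; in production this would be a denormalized warehouse table keyed by employee, not an in‑memory dict:

```python
# One denormalized row per employee: HR, attendance, training, and
# certification fields live side by side, so any query resolves with
# a single key lookup instead of multiple API joins.
WIDE_TABLE = {
    "emp_001": {
        "department": "Finance",
        "annual_leave_days": 12,
        "attendance_anomalies": 0,
        "certifications": ["CPA"],
    },
}

def answer_employee_query(employee_id: str, field: str):
    """Resolve a personal-data question from the wide table by user id."""
    row = WIDE_TABLE.get(employee_id)
    return None if row is None else row.get(field)
```

The design trade‑off is storage redundancy in exchange for removing per‑query coupling to upstream HR and attendance APIs.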
Agent Collaboration
Two integration modes enable flexible workflow execution:
Intelligent Planning – the LLM schedules and invokes modular tools (MCP) autonomously, suitable for open‑ended processes such as leave applications.
Precise Routing – intent detection routes directly to specialized agents (e.g., compliance, finance) with fine‑grained permission checks and rich‑text rendering.
All actions that write data or trigger real‑world effects require explicit user consent before execution.
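The consent requirement can be sketched as a guard placed in front of every tool invocation (function and field names are hypothetical):

```python
def execute_action(action: str, has_side_effects: bool, user_confirmed: bool) -> dict:
    """Gate any write or real-world action behind explicit user consent;
    read-only actions pass through without confirmation."""
    if has_side_effects and not user_confirmed:
        # Surface a confirmation prompt instead of executing.
        return {"status": "pending_confirmation", "action": action}
    return {"status": "executed", "action": action}
```

Placing the guard in the orchestration layer, rather than inside each agent, keeps the consent policy uniform across Intelligent Planning and Precise Routing modes.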
Answer Quality Inspection
A lightweight quality‑inspection model built on a generic LLM evaluates assistant responses against human‑annotated references. Iterative prompting improves consistency to ~89% (benchmark ≥ 80%). Confusion matrices are generated for each evaluation batch to monitor drift.
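The batch evaluation step can be sketched as computing the agreement rate against human references plus a confusion counter for drift monitoring (names and label values are illustrative):

```python
from collections import Counter

def consistency_and_confusion(model_labels, human_labels):
    """Return the agreement rate between the LLM judge and human
    annotators, plus a (human, model) confusion counter per batch."""
    assert len(model_labels) == len(human_labels)
    agree = sum(m == h for m, h in zip(model_labels, human_labels))
    confusion = Counter(zip(human_labels, model_labels))
    return agree / len(model_labels), confusion
```

Tracking the off‑diagonal cells of the confusion counter across batches is what reveals drift, e.g. the judge becoming systematically more lenient than the human reference.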
Implementation Details
Real‑time Document Sync
Change detection on cloud document services triggers incremental vector updates. Version tags ensure the latest policy is always served.
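Version‑tagged incremental updates can be sketched as follows, assuming each document change event carries a monotonically increasing `version` field:

```python
def sync_changes(index: dict, changed_docs: list) -> list:
    """Apply incremental updates: re-index only documents whose version
    tag is newer than the stored one, so stale events are ignored."""
    updated = []
    for doc in changed_docs:
        stored = index.get(doc["doc_id"])
        if stored is None or doc["version"] > stored["version"]:
            index[doc["doc_id"]] = doc  # in production: re-embed and upsert
            updated.append(doc["doc_id"])
    return updated
```

The version comparison is what guarantees that out‑of‑order change notifications from the cloud document platform never overwrite a newer policy with an older one.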
Permission Control
Raw document permissions are mapped to two internal levels – manage and view. Retrieval respects these levels, preventing leakage of restricted content.
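With the two‑level scheme above, permission‑aware retrieval reduces to a post‑filter over retrieved chunks (field names are assumptions for illustration):

```python
def filter_by_permission(chunks: list, user_doc_levels: dict) -> list:
    """Keep only retrieved chunks from documents the user can at least
    view. Levels: 'manage' > 'view'; a missing entry means no access."""
    allowed = {"manage", "view"}
    return [c for c in chunks if user_doc_levels.get(c["doc_id"]) in allowed]
```

Filtering after vector retrieval keeps the index shared across users while still guaranteeing that restricted content never reaches the generation step.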
Multimodal Ingestion Pipeline
Audio is transcribed via ASR; timestamps and speaker tags are stored. A downstream model extracts agenda items, which are indexed as {"time":…, "title":…, "summary":…} vectors.
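Grouping timestamped ASR segments into agenda slices can be sketched with a sorted‑boundary search (the segment and marker shapes are assumptions, not the described pipeline's actual schema):

```python
import bisect

def assign_segments(agenda_starts: list, segments: list) -> dict:
    """Bucket each timestamped transcript segment into the agenda slice
    whose start time precedes it; agenda_starts must be sorted ascending."""
    buckets = {i: [] for i in range(len(agenda_starts))}
    for seg in segments:
        idx = bisect.bisect_right(agenda_starts, seg["t"]) - 1
        if idx >= 0:
            buckets[idx].append(seg["text"])
    return buckets
```

Each bucket's text is then summarized by the downstream model into the {"time":…, "title":…, "summary":…} record that gets embedded.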
Conflict Resolution Rules
Version‑first: always prefer the newest document version.
Time‑first: if versions are equal, use the most recently indexed record.
Channel‑priority: internal FAQ > OA policy > external search > LLM world knowledge.
User‑context: tailor answers based on the user’s location or department.
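The first three rules can be encoded as a single sort key; the user‑context rule is omitted here because it depends on runtime profile data, and the channel names are illustrative:

```python
# Lower number = higher trust, per the channel-priority rule.
CHANNEL_PRIORITY = {"internal_faq": 0, "oa_policy": 1, "external_search": 2, "llm_knowledge": 3}

def pick_answer_source(candidates: list) -> dict:
    """Resolve conflicting records: newest version first, then most
    recently indexed, then channel priority."""
    return min(
        candidates,
        key=lambda c: (-c["version"], -c["indexed_at"], CHANNEL_PRIORITY[c["channel"]]),
    )
```

Because the key is lexicographic, the time‑first and channel‑priority rules only break ties left by the rules before them, matching the stated ordering.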
Model Selection for Query Rewriting
Small to medium‑size models (3B–30B parameters) are sufficient; fine‑tuning is optional. Enforcing structured JSON output keeps downstream parsing stable.
Data‑Self‑Iteration Loop
Business owners label ambiguous or high‑similarity intents.
System logs capture raw queries and model predictions.
Weak‑sample mining extracts low‑confidence cases.
Retraining incorporates new labels.
Continuous deployment updates the serving endpoint.
Typical convergence is observed within three months, after which manual review effort drops dramatically.
Conclusion
The three‑layer architecture—Intent Planning, Context Engineering, and Data Self‑Iteration—provides a reproducible foundation for turning prototype AI assistants into production‑grade digital employees. By combining fine‑tuned NLU, robust knowledge pipelines, and a data‑driven quality‑inspection loop, enterprises can achieve high‑accuracy intent handling, reliable multimodal knowledge access, and safe agent orchestration at scale.