Zhihu Direct Answer: Product Overview and Technical Practices
This article summarizes the key technical insights from Zhihu Direct Answer, an AI-powered search product, covering its product overview, RAG framework, query understanding, retrieval strategies, chunking, reranking, generation techniques, evaluation methods, and engineering optimizations for cost and performance.
In the first Zhihu Tech Salon, Zhihu AI algorithm lead Wang Jiewu shared practical experiences from building the AI search product Zhihu Direct Answer. The talk is organized into three parts: product introduction, practical experience sharing, and professional edition overview.
Zhihu Direct Answer is a community‑driven general AI search product that emphasizes professional, reliable, and trustworthy answers by leveraging verified Zhihu content and external academic sources.
The system adopts a Retrieval‑Augmented Generation (RAG) framework, where a query first retrieves relevant knowledge base entries and then feeds them as context to a large language model, reducing hallucinations and improving explainability.
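The retrieve-then-generate flow can be sketched in a few lines. This is a minimal illustration, not Zhihu's implementation: the term-overlap `retrieve` stands in for a real embedding-based retriever, and the prompt template is an assumption.

```python
import re

def retrieve(query, knowledge_base, top_k=3):
    """Rank knowledge-base entries by naive term overlap with the query
    (a placeholder for real semantic recall)."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    scored = [(len(q_terms & set(re.findall(r"\w+", doc.lower()))), doc)
              for doc in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(query, contexts):
    """Feed retrieved passages to the LLM as grounding context."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using ONLY the context below; say so if it is insufficient.\n"
        f"Context:\n{context_block}\n\nQuestion: {query}"
    )

kb = ["RAG combines retrieval with generation.",
      "Zhihu hosts community Q&A content."]
prompt = build_prompt("What is RAG?", retrieve("What is RAG?", kb))
```

Grounding the model in retrieved text is what reduces hallucination, and the retrieved passages double as citable sources, which is where the explainability comes from.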
Key query‑understanding practices include semantic completion for incomplete queries, context‑aware rewriting for mixed intents, and multi‑turn expansion for short or ambiguous queries, often implemented via fine‑tuned models.
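Context-aware rewriting for multi-turn conversations typically means making the final query self-contained before retrieval. A hedged sketch, where the prompt template and `rewrite_query` helper are illustrative (the talk describes fine-tuned models, not this exact prompt):

```python
# Illustrative multi-turn query rewriting; prompt wording is an assumption.
REWRITE_PROMPT = """Given the conversation history, rewrite the final user
query so it is self-contained (resolve pronouns, fill in elided entities).
History:
{history}
Final query: {query}
Rewritten query:"""

def rewrite_query(history, query, llm):
    """llm is any callable that maps a prompt string to a completion string."""
    prompt = REWRITE_PROMPT.format(history="\n".join(history), query=query)
    return llm(prompt).strip()

# e.g. history = ["User: Who founded Zhihu?"], query = "When was it launched?"
# should be rewritten to something like "When was Zhihu launched?"
```

The rewritten query then flows into retrieval in place of the raw user input, so short or ambiguous follow-ups still recall the right documents.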
Retrieval employs a multi‑strategy approach: semantic recall using a fine‑tuned BGE embedding model, tag‑based recall, and alignment of query and document vector spaces, with techniques such as Matryoshka representation learning, dense‑sparse hybrid retrieval (BGE‑M3), and ColBERT‑style late interaction for high‑precision scenarios.
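One common way to combine dense and sparse recall lists is Reciprocal Rank Fusion (RRF). The talk mentions BGE‑M3 hybrid retrieval but not this exact fusion rule, so treat this as one plausible implementation:

```python
# Reciprocal Rank Fusion: merge several ranked doc-id lists; documents
# ranked highly in any list accumulate more score. k=60 is the
# conventional default from the RRF literature.
def rrf_fuse(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d2", "d1", "d3"]   # semantic (embedding) recall order
sparse = ["d1", "d4", "d2"]  # lexical / sparse recall order
fused = rrf_fuse([dense, sparse])  # d1 leads: strong in both lists
```

Rank-based fusion sidesteps the problem that dense similarity scores and sparse (e.g. BM25-style) scores live on incomparable scales.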
Chunking strategies aim to reduce latency and improve information utilization. Simple fixed‑length chunking is fast but can harm semantic coherence; a more robust approach merges relevance‑sorted chunks, expands boundaries, and uses a rank‑merge‑expand pipeline to produce a single high‑quality chunk per document.
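The rank‑merge‑expand idea can be sketched as follows; the thresholds, parameter names, and the choice to keep only the first contiguous region are assumptions, not the production pipeline:

```python
# Rank-merge-expand sketch: per document, take relevance-ranked chunk
# indices, widen each by a margin of neighbouring chunks, and merge the
# overlapping spans into one high-quality covering chunk.
def rank_merge_expand(chunk_scores, top_k=3, expand=1, n_chunks=None):
    """chunk_scores: {chunk_index: relevance}; returns one (start, end) span."""
    n_chunks = n_chunks or (max(chunk_scores) + 1)
    # Rank: keep the top_k most relevant chunk indices.
    top = sorted(chunk_scores, key=chunk_scores.get, reverse=True)[:top_k]
    # Expand: widen each selected index by `expand` neighbours.
    spans = sorted((max(0, i - expand), min(n_chunks - 1, i + expand))
                   for i in top)
    # Merge: collapse overlapping/adjacent spans into one covering span.
    start, end = spans[0]
    for s, e in spans[1:]:
        if s <= end + 1:
            end = max(end, e)
        else:
            break  # keep only the first contiguous high-relevance region
    return start, end

# Chunks 2 and 3 score highest; with one chunk of context on each side
# this yields a single span covering chunks 1 through 4.
span = rank_merge_expand({0: 0.1, 2: 0.9, 3: 0.8, 5: 0.2},
                         top_k=2, n_chunks=6)
```

Returning one contiguous span per document preserves local coherence that fixed-length chunking breaks, while still bounding how much context is passed to the model.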
Reranking focuses on key‑information perception, diversity control, and authority weighting by incorporating Zhihu community voting signals.
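A greedy rerank loop combining those three signals might look like the following. The weights, the log-damped vote prior, and the topic-repeat penalty are illustrative assumptions, not Zhihu's tuned values:

```python
import math

def rerank(candidates, w_rel=0.7, w_auth=0.3, div_penalty=0.2):
    """candidates: dicts with 'relevance', 'votes', 'topic'.
    Greedily pick the best-scoring item, penalising repeated topics."""
    selected, seen_topics = [], set()
    pool = list(candidates)
    while pool:
        def score(c):
            authority = math.log1p(c["votes"]) / 10.0  # dampen raw vote counts
            s = w_rel * c["relevance"] + w_auth * authority
            return s - (div_penalty if c["topic"] in seen_topics else 0.0)
        best = max(pool, key=score)
        pool.remove(best)
        selected.append(best)
        seen_topics.add(best["topic"])
    return selected

docs = [
    {"id": "a", "relevance": 0.90, "votes": 5,    "topic": "ml"},
    {"id": "b", "relevance": 0.85, "votes": 2000, "topic": "ml"},
    {"id": "c", "relevance": 0.60, "votes": 300,  "topic": "systems"},
]
# "b" wins on community authority despite slightly lower relevance, and
# the diversity penalty then promotes "c" over the second "ml" result.
order = [d["id"] for d in rerank(docs)]
```

The log damping matters: without it, heavily upvoted evergreen answers would dominate relevance entirely.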
Generation enhancements include metadata‑enriched context, planning capabilities for multi‑step reasoning, and continuous model alignment using DPO, PPO, and other reinforcement‑learning methods.
Evaluation combines automated scoring (LLM judges, preference models, bad‑case sets) with multi‑dimensional human assessment (blind reviews, Good/Same/Bad (GSB) side‑by‑side comparisons) and final A/B testing to ensure reliability.
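A GSB comparison reduces to a simple tally over side-by-side human judgments. A minimal sketch, where the labels and the win-rate-over-decided-cases convention are illustrative:

```python
from collections import Counter

def gsb_summary(judgments):
    """judgments: iterable of 'G' (new system wins), 'S' (tie),
    'B' (baseline wins). Win rate is computed over decided cases only."""
    counts = Counter(judgments)
    decided = counts["G"] + counts["B"]
    win_rate = counts["G"] / decided if decided else 0.5
    return {"G": counts["G"], "S": counts["S"], "B": counts["B"],
            "win_rate": win_rate}

summary = gsb_summary(["G", "G", "S", "B", "G", "S"])
```

Excluding ties from the denominator keeps the metric sensitive when most pairs are judged "Same", which is common between two already-strong systems.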
Engineering optimizations feature a DAG‑based modular architecture, full‑stack monitoring, and cost reductions through model quantization (≈50% savings) and domain‑specific model distillation, with the distilled models retaining over 95% of the original models' quality.
The professional edition adds high‑quality data sources (academic papers, curated Zhihu content), supports PDF upload and intelligent parsing, enables personalized knowledge bases, and offers deep‑reading modes for document‑level Q&A.
Future plans include tighter integration with the Zhihu community, multimodal interaction, stronger reasoning capabilities, and continued professionalization to serve research and expert users.