Build a Real‑Time AI Search‑Enabled Q&A System with Higress and DeepSeek
This guide shows how open‑source LLMs like DeepSeek can power cost‑effective intelligent Q&A services, and how the cloud‑native Higress API gateway adds real‑time web search, routing, security, and observability to create a production‑grade solution in just a few steps.
Why Open‑Source LLMs Matter
With the emergence of high‑quality open‑source large language models such as DeepSeek, the cost of building an in‑house intelligent Q&A system has dropped by more than 90%. Models with 7B‑13B parameters can deliver commercial‑grade responses on ordinary GPU servers.
Higress: A Zero‑Code Swiss‑Army Knife for LLMs
Higress is a cloud‑native API gateway that provides out‑of‑the‑box AI‑enhancement capabilities via WebAssembly plugins. Its core feature matrix includes:
Online Search: Real‑time access to the latest internet information.
Intelligent Routing: Multi‑model load balancing and automatic fallback.
Security: Sensitive‑word filtering and injection‑attack protection.
Performance Optimisation: Request caching and token‑quota management.
Observability: End‑to‑end monitoring and audit logs.
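The "intelligent routing" entry above can be illustrated with a minimal client-side sketch of what the gateway automates: trying upstream models in priority order and falling back on failure. The model names and endpoint URLs here are hypothetical, and the real gateway performs this failover transparently, so clients never implement it themselves.

```python
import json
import urllib.request

# Hypothetical upstream endpoints in priority order; Higress handles this
# failover inside the gateway, so this is only a conceptual illustration.
UPSTREAMS = [
    ("deepseek-r1", "http://primary-llm:8000/v1/chat/completions"),
    ("deepseek-chat", "http://fallback-llm:8000/v1/chat/completions"),
]

def chat_with_fallback(prompt: str) -> str:
    """Try each upstream in order, moving to the next one on any failure."""
    for model, url in UPSTREAMS:
        payload = json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode()
        req = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"}
        )
        try:
            with urllib.request.urlopen(req, timeout=3) as resp:
                body = json.load(resp)
                return body["choices"][0]["message"]["content"]
        except OSError:
            continue  # unreachable or errored upstream: try the next one
    raise RuntimeError("all upstreams failed")
```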
Technical Implementation of Online Search
The AI‑search plugin for Higress is open‑source. It supports multiple search engines and scenarios:
Multi‑engine intelligent routing: public search (Google/Bing/Quark), academic search (Arxiv), private search (Elasticsearch).
Search‑enhancement workflow: LLM rewrites the user query, extracts keywords, identifies domain, splits long queries, and retrieves high‑quality data (full‑text from Alibaba Cloud’s Quark).
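The rewrite workflow above can be sketched in a few lines. This is an illustrative prompt builder and engine router, not the plugin's actual implementation: the prompt wording, the `DOMAIN_HINTS` table, and the routing rule are all assumptions for demonstration.

```python
# Hypothetical keyword hints for routing a query to a search engine;
# the real ai-search plugin delegates this decision to the rewrite LLM.
DOMAIN_HINTS = {
    "arxiv": ["paper", "preprint", "theorem"],
    "quark": ["news", "price", "stock"],
}

def build_rewrite_prompt(user_query: str) -> str:
    """Build the instruction sent to the rewrite LLM (illustrative wording)."""
    return (
        "Rewrite the user question into search queries.\n"
        "1. Extract 3-5 keywords.\n"
        "2. Identify the domain (general/academic/private).\n"
        "3. Split multi-part questions into separate queries.\n"
        f"Question: {user_query}\n"
        'Answer as JSON: {"keywords": [...], "domain": "...", "queries": [...]}'
    )

def route_engine(keywords: list[str]) -> str:
    """Pick a search engine from keyword hints; default to general web search."""
    for engine, hints in DOMAIN_HINTS.items():
        if any(k.lower() in hints for k in keywords):
            return engine
    return "google"
```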
Typical Application Scenarios
Examples include financial news Q&A, cutting‑edge technology exploration, and medical question answering, each demonstrated with screenshots of the system’s responses.
Three‑Step Deployment Guide
1. Basic Deployment
```shell
# Install the Higress AI gateway
curl -sS https://higress.cn/ai-gateway/install.sh | bash

# Serve DeepSeek with vLLM's OpenAI-compatible API server
python3 -m vllm.entrypoints.openai.api_server \
  --model=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
  --dtype=half \
  --tensor-parallel-size=4 \
  --enforce-eager
```
2. Plugin Configuration
After starting the Higress console at http://127.0.0.1:8001, add the following YAML to enable the ai-search plugin:
```yaml
plugins:
  searchFrom:
  - type: quark
    apiKey: "your-aliyun-ak"
    keySecret: "your-aliyun-sk"
    serviceName: "aliyun-svc.dns"
    servicePort: 443
  - type: google
    apiKey: "your-google-api-key"
    cx: "search-engine-id"
    serviceName: "google-svc.dns"
    servicePort: 443
  - type: bing
    apiKey: "bing-key"
    serviceName: "bing-svc.dns"
    servicePort: 443
  - type: arxiv
    serviceName: "arxiv-svc.dns"
    servicePort: 443
  searchRewrite:
    llmServiceName: "llm-svc.dns"
    llmServicePort: 443
    llmApiKey: "your-llm-api-key"
    llmUrl: "https://api.example.com/v1/chat/completions"
    llmModelName: "deepseek-chat"
    timeoutMillisecond: 15000
```
3. Connect SDK or Front‑End
Expose the OpenAI‑compatible endpoint at http://127.0.0.1:8080/v1 and use any OpenAI‑compatible client (e.g., ChatBox, LobeChat) to interact with the system.
```python
from openai import OpenAI

# Point the client at the gateway's OpenAI-compatible endpoint
client = OpenAI(
    api_key="none",
    base_url="http://localhost:8080/v1",
)

completion = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Analyze the trend of international gold prices"}],
    stream=False,
)
print(completion.choices[0].message.content)
```
Result
Using the Higress + DeepSeek open‑source stack, enterprises can go from zero to a production‑grade intelligent Q&A system within 24 hours, turning LLMs into a real business growth engine.
Alibaba Cloud Native
