
Build a Real‑Time AI Search‑Enabled Q&A System with Higress and DeepSeek

This guide shows how open‑source LLMs like DeepSeek can power cost‑effective intelligent Q&A services, and how the cloud‑native Higress API gateway adds real‑time web search, routing, security, and observability to create a production‑grade solution in just a few steps.

Alibaba Cloud Native

Why Open‑Source LLMs Matter

With the emergence of high‑quality open‑source large language models such as DeepSeek, the cost of building an in‑house intelligent Q&A system has dropped by more than 90%. Models with 7B‑13B parameters can deliver commercial‑grade responses on ordinary GPU servers.

Higress: A Zero‑Code Swiss‑Army Knife for LLMs

Higress is a cloud‑native API gateway that provides out‑of‑the‑box AI‑enhancement capabilities via WebAssembly plugins. Its core feature matrix includes:

Online Search: real‑time access to the latest internet information.

Intelligent Routing: multi‑model load balancing and automatic fallback.

Security: sensitive‑word filtering and injection‑attack protection.

Performance Optimisation: request caching and token‑quota management.

Observability: end‑to‑end monitoring and audit logs.
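To make the "intelligent routing" item concrete, here is a minimal Python sketch of multi-model automatic fallback. The model names and the `call_model` stub are hypothetical, and in Higress this logic runs inside the gateway rather than in client code; the sketch only illustrates the behavior a caller observes.

```python
# Hypothetical sketch of automatic fallback across an ordered list of models.
# In Higress this happens inside the gateway; shown here only to illustrate the idea.

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real LLM backend call; raises when a backend is down."""
    if model == "deepseek-r1":          # pretend the primary backend is down
        raise ConnectionError(f"{model} unavailable")
    return f"[{model}] answer to: {prompt}"

def ask_with_fallback(prompt: str, models: list[str]) -> str:
    last_error = None
    for model in models:                # try backends in priority order
        try:
            return call_model(model, prompt)
        except ConnectionError as err:
            last_error = err            # remember the failure, try the next model
    raise RuntimeError("all models failed") from last_error

print(ask_with_fallback("hello", ["deepseek-r1", "qwen-7b"]))
# → [qwen-7b] answer to: hello
```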

Technical Implementation of Online Search

The AI‑search plugin for Higress is open‑source. It supports multiple search engines and scenarios:

Multi‑engine intelligent routing: public search (Google/Bing/Quark), academic search (Arxiv), private search (Elasticsearch).

Search‑enhancement workflow: LLM rewrites the user query, extracts keywords, identifies domain, splits long queries, and retrieves high‑quality data (full‑text from Alibaba Cloud’s Quark).
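The workflow above can be sketched end to end in a few lines. Everything here (the engine table, the naive keyword splitting and domain classification) is a simplified stand-in for what the plugin's LLM rewrite step actually does, not the plugin's real implementation.

```python
# Simplified sketch of the search-enhancement workflow described above.
# The engine table and the naive split/classify logic are illustrative only.

ENGINES = {
    "academic": "arxiv",         # papers and preprints
    "private":  "elasticsearch", # internal documents
    "public":   "quark",         # default: general web search
}

def rewrite_query(query: str) -> list[str]:
    """Stand-in for the LLM rewrite: split a long query into sub-queries."""
    return [part.strip() for part in query.split(" and ")]

def classify_domain(query: str) -> str:
    """Stand-in for LLM domain identification."""
    if "paper" in query or "arxiv" in query:
        return "academic"
    return "public"

def search_augment(query: str) -> str:
    """Route each sub-query to an engine and assemble a grounded prompt."""
    context_lines = []
    for sub in rewrite_query(query):
        engine = ENGINES[classify_domain(sub)]
        context_lines.append(f"[{engine}] results for: {sub}")
    context = "\n".join(context_lines)
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

print(search_augment("latest gold price and recent arxiv paper on LLM routing"))
```

The augmented prompt, rather than the raw user query, is what finally reaches the model, which is how the gateway grounds answers in fresh search results.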

Typical Application Scenarios

Examples include financial news Q&A, cutting‑edge technology exploration, and medical question answering, each demonstrated with screenshots of the system’s responses.

Three‑Step Deployment Guide

1. Basic Deployment

# Install the Higress AI gateway (inspect the script before piping it to bash)
curl -sS https://higress.cn/ai-gateway/install.sh | bash

# Serve the model with vLLM's OpenAI-compatible server:
# --dtype=half loads FP16 weights; --tensor-parallel-size=4 shards them across 4 GPUs
python3 -m vllm.entrypoints.openai.api_server \
  --model=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
  --dtype=half \
  --tensor-parallel-size=4 \
  --enforce-eager
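A rough back-of-the-envelope estimate (my own, not from the original article) of why `--dtype=half` with `--tensor-parallel-size=4` fits comfortably on ordinary GPUs; it counts weight memory only and ignores KV cache and activations:

```python
# Rough estimate of per-GPU weight memory for a 7B model served in FP16.
# Illustrative arithmetic only; KV cache and activations add further overhead.
params = 7e9            # ~7 billion parameters
bytes_per_param = 2     # FP16 (--dtype=half)
tensor_parallel = 4     # --tensor-parallel-size=4

total_gb = params * bytes_per_param / 1024**3
per_gpu_gb = total_gb / tensor_parallel
print(f"total weights: {total_gb:.1f} GiB, per GPU: {per_gpu_gb:.1f} GiB")
# → total weights: 13.0 GiB, per GPU: 3.3 GiB
```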

2. Plugin Configuration

After starting the Higress console at http://127.0.0.1:8001, add the following YAML to enable the ai-search plugin:

plugins:
  searchFrom:
    - type: quark
      apiKey: "your-aliyun-ak"
      keySecret: "your-aliyun-sk"
      serviceName: "aliyun-svc.dns"
      servicePort: 443
    - type: google
      apiKey: "your-google-api-key"
      cx: "search-engine-id"
      serviceName: "google-svc.dns"
      servicePort: 443
    - type: bing
      apiKey: "bing-key"
      serviceName: "bing-svc.dns"
      servicePort: 443
    - type: arxiv
      serviceName: "arxiv-svc.dns"
      servicePort: 443
  searchRewrite:
    llmServiceName: "llm-svc.dns"
    llmServicePort: 443
    llmApiKey: "your-llm-api-key"
    llmUrl: "https://api.example.com/v1/chat/completions"
    llmModelName: "deepseek-chat"
    timeoutMillisecond: 15000

3. Connect SDK or Front‑End

Expose the OpenAI‑compatible endpoint at http://127.0.0.1:8080/v1 and use any OpenAI‑compatible client (e.g., ChatBox, LobeChat) to interact with the system.

from openai import OpenAI

# The gateway exposes an OpenAI-compatible API, so the official SDK works as-is.
client = OpenAI(
    api_key="none",                       # the gateway handles auth; no real key needed
    base_url="http://localhost:8080/v1",  # Higress's OpenAI-compatible endpoint
)

completion = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Analyze the trend of international gold prices"}],
    stream=False,
)
print(completion.choices[0].message.content)

Result

Using the Higress + DeepSeek open‑source stack, enterprises can go from zero to a production‑grade intelligent Q&A system within 24 hours, turning LLMs into a real business growth engine.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: LLM · DeepSeek · Higress · Search Enhancement
Written by Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.