Build a Cloud‑Native Playground to Compare GPT‑4o and Qwen‑2.5 with NextChat and Higress
This article walks through setting up a cloud‑native test environment using the open‑source NextChat UI and Higress API gateway to let Qwen‑2.5 masquerade as GPT‑4o, enabling a side‑by‑side comparison of their responses while showcasing Higress’s streaming, hot‑update, and security features for AI workloads.
Introduction
OpenAI released GPT‑4o and Alibaba released Qwen‑2.5 (Tongyi Qianwen). The article demonstrates a test setup where Qwen‑2.5 is accessed through an OpenAI‑compatible API and compared side‑by‑side with GPT‑4o.
Test Scenario
Two open‑source components are used:
NextChat – a self‑hosted ChatGPT‑like web UI.
Higress – Alibaba’s cloud‑native API gateway that rewrites Qwen‑2.5’s native API to the OpenAI format.
NextChat provides the front‑end, while Higress’s AI‑Proxy plugin maps the requested gpt‑4o model name to qwen‑max, so the UI calls Qwen‑2.5 transparently.
Deployment Steps
Clone the Higress repository and start the Docker Compose stack. Replace the environment variable YOUR_DASHSCOPE_API_KEY with your own DashScope API key for Qwen‑2.5.
docker compose -p higress-ai up -d
Open a browser at http://localhost:3000/ to load the NextChat UI.
In the chat toolbar, click the model‑switch button to choose a model. Because the AI‑Proxy plugin maps gpt‑4o to qwen‑max, selecting gpt‑4o actually invokes Qwen‑2.5.
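The same request the UI makes could also be built by hand, which can help when checking the mapping outside the browser. The Go sketch below rests on assumptions: the gateway address `http://localhost:8080` and the `/v1/chat/completions` path follow the usual OpenAI‑compatible convention and are not confirmed by this article, and `chatPayload` and `newChatRequest` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest model the usual OpenAI-style chat payload.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// chatPayload encodes a single-turn user prompt for the given model name.
func chatPayload(model, prompt string, stream bool) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
		Stream:   stream,
	})
}

// newChatRequest builds (but does not send) a POST request to the gateway.
// The path is the conventional OpenAI-compatible one; treat it as an assumption.
func newChatRequest(baseURL string, payload []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	payload, err := chatPayload("gpt-4o", "Which model are you?", true)
	if err != nil {
		panic(err)
	}
	// Assumed gateway address; adjust to wherever the gateway listens.
	req, err := newChatRequest("http://localhost:8080", payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println(string(payload))
	// To send it for real: resp, err := http.DefaultClient.Do(req)
}
```

The request names gpt‑4o, exactly as NextChat would; any substitution with qwen‑max happens at the gateway, not in the client.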
Higress AI Gateway Overview
Higress is built on Envoy. It supports hot configuration updates that do not break long‑lived connections, protection against CC (HTTP flood) attacks, and low‑overhead streaming. These capabilities address the typical characteristics of AI web traffic: long‑lived WebSocket/SSE connections, high‑latency inference, and heavy bandwidth usage.
Key Code Example
// Streaming response handler for the AI‑Proxy plugin.
// It passes each response chunk to the active provider, if that provider
// supports streaming transformation, and otherwise returns the chunk as is.
func onStreamingResponseBody(ctx wrapper.HttpContext, pluginConfig config.PluginConfig, chunk []byte, isLastChunk bool, log wrapper.Log) []byte {
	activeProvider := pluginConfig.GetProvider()
	if activeProvider == nil {
		// No provider configured: pass the chunk through untouched.
		log.Debugf("[onStreamingResponseBody] no active provider, skip processing")
		return chunk
	}

	log.Debugf("[onStreamingResponseBody] provider=%s", activeProvider.GetProviderType())
	log.Debugf("isLastChunk=%v chunk: %s", isLastChunk, string(chunk))

	// Only providers that implement StreamingResponseBodyHandler rewrite chunks.
	if handler, ok := activeProvider.(provider.StreamingResponseBodyHandler); ok {
		apiName := ctx.GetContext(ctxKeyApiName).(provider.ApiName)
		modifiedChunk, err := handler.OnStreamingResponseBody(ctx, apiName, chunk, isLastChunk, log)
		if err == nil && modifiedChunk != nil {
			return modifiedChunk
		}
		// On error or an empty result, fall back to the original chunk.
		return chunk
	}
	return chunk
}
Repository and Issue Links
AI‑Proxy plugin source:
https://github.com/alibaba/higress/tree/main/plugins/wasm-go/extensions/ai-proxy
Open issues for contribution:
https://github.com/alibaba/higress/issues/940
Conclusion
The setup enables direct comparison of GPT‑4o and Qwen‑2.5 responses and illustrates how a cloud‑native API gateway can handle streaming LLM traffic with minimal memory overhead.
Alibaba Cloud Native
