Build a Cloud‑Native Playground to Compare GPT‑4o and Qwen‑2.5 with NextChat and Higress

This article walks through setting up a cloud‑native test environment using the open‑source NextChat UI and Higress API gateway to let Qwen‑2.5 masquerade as GPT‑4o, enabling a side‑by‑side comparison of their responses while showcasing Higress’s streaming, hot‑update, and security features for AI workloads.

Alibaba Cloud Native

May 15, 2024

Introduction

OpenAI has released GPT-4o, and Alibaba has released Qwen-2.5 (Tongyi Qianwen). This article demonstrates a test setup in which Qwen-2.5 is accessed through an OpenAI-compatible API and compared side by side with GPT-4o.

Test Scenario

Two open‑source components are used:

NextChat – a self‑hosted ChatGPT‑like web UI.

Higress – Alibaba's open-source, cloud-native API gateway, whose AI-Proxy plugin translates between the OpenAI API format and Qwen-2.5's native API.

NextChat provides the front end, while Higress's AI-Proxy plugin maps the requested model name gpt-4o to qwen-max, so the UI calls Qwen-2.5 transparently.
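Conceptually, the mapping is a lookup from the model name the client asks for to the model name sent upstream. The Go sketch below illustrates only that idea and is not the AI-Proxy plugin's code; the helper mapModel and the "*" wildcard fallback entry are hypothetical choices made for this example.

```go
package main

// mapModel returns the upstream model name for a requested model.
// Hypothetical sketch: an exact match wins, then an optional "*" wildcard
// entry; otherwise the requested name is passed through unchanged.
func mapModel(mapping map[string]string, requested string) string {
	if target, ok := mapping[requested]; ok {
		return target
	}
	if fallback, ok := mapping["*"]; ok {
		return fallback
	}
	return requested
}
```

For example, mapModel(map[string]string{"gpt-4o": "qwen-max"}, "gpt-4o") returns "qwen-max", while an unmapped name such as "gpt-3.5-turbo" passes through unchanged unless a "*" entry is present.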

Deployment Steps

1. Clone the Higress repository.

2. Replace the placeholder YOUR_DASHSCOPE_API_KEY in the stack's environment variables with your own DashScope API key, which Higress uses to call Qwen-2.5.

3. Start the Docker Compose stack:

docker compose -p higress-ai up -d

4. Open a browser at http://localhost:3000/ to load the NextChat UI.

5. In the chat toolbar, click the model-switch button and select gpt-4o. The AI-Proxy plugin maps gpt-4o to qwen-max, so the model actually invoked is Qwen-2.5.

Higress AI Gateway Overview

Higress is built on Envoy and supports hot configuration updates that do not break long-lived connections, protection against CC (HTTP flood) attacks, and low-overhead streaming. These capabilities match the typical traffic profile of AI web applications: long-lived WebSocket/SSE connections, high-latency inference, and heavy bandwidth usage.

Key Code Example

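The AI-Proxy plugin's streaming callback hands each response chunk to the active provider, which may rewrite it before it is forwarded to the client:

// Streaming response handler for the AI-Proxy plugin (excerpt).
// wrapper, config, provider and ctxKeyApiName are defined by the plugin and its SDK.
func onStreamingResponseBody(ctx wrapper.HttpContext, pluginConfig config.PluginConfig, chunk []byte, isLastChunk bool, log wrapper.Log) []byte {
    activeProvider := pluginConfig.GetProvider()
    if activeProvider == nil {
        // No provider configured: forward the chunk untouched.
        log.Debugf("[onStreamingResponseBody] no active provider, skip processing")
        return chunk
    }
    log.Debugf("[onStreamingResponseBody] provider=%s", activeProvider.GetProviderType())
    log.Debugf("isLastChunk=%v chunk: %s", isLastChunk, string(chunk))
    // Only providers that implement StreamingResponseBodyHandler rewrite streamed chunks.
    if handler, ok := activeProvider.(provider.StreamingResponseBodyHandler); ok {
        // The API name is expected to have been stored in the context earlier in the request.
        // The comma-ok form skips processing instead of panicking if it is missing.
        apiName, found := ctx.GetContext(ctxKeyApiName).(provider.ApiName)
        if !found {
            log.Debugf("[onStreamingResponseBody] api name not found in context, skip processing")
            return chunk
        }
        modifiedChunk, err := handler.OnStreamingResponseBody(ctx, apiName, chunk, isLastChunk, log)
        if err == nil && modifiedChunk != nil {
            return modifiedChunk
        }
        // On error or a nil result, fall back to the original chunk.
        return chunk
    }
    return chunk
}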
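The handler relies on a common Go idiom: a provider opts in to chunk rewriting by implementing an extra interface, and the caller discovers that capability with a type assertion, falling back to the unmodified chunk otherwise. The self-contained sketch below reproduces only that pattern; Provider, StreamingHandler, processChunk, upperCaser and passthrough are hypothetical stand-ins, not the plugin's real types.

```go
package main

import "bytes"

// Provider is the minimal interface every (hypothetical) provider implements.
type Provider interface {
	Name() string
}

// StreamingHandler is an optional capability: providers that implement it
// may rewrite each streamed chunk.
type StreamingHandler interface {
	OnChunk(chunk []byte, isLast bool) ([]byte, error)
}

// processChunk mirrors the dispatch in onStreamingResponseBody: rewrite the
// chunk if the provider supports it; otherwise, or on an error or nil
// result, pass the original bytes through unchanged.
func processChunk(p Provider, chunk []byte, isLast bool) []byte {
	if p == nil {
		return chunk
	}
	if h, ok := p.(StreamingHandler); ok {
		modified, err := h.OnChunk(chunk, isLast)
		if err == nil && modified != nil {
			return modified
		}
	}
	return chunk
}

// upperCaser is a toy provider that rewrites chunks by upper-casing them.
type upperCaser struct{}

func (upperCaser) Name() string { return "upper" }

func (upperCaser) OnChunk(chunk []byte, _ bool) ([]byte, error) {
	return bytes.ToUpper(chunk), nil
}

// passthrough is a toy provider without the optional capability.
type passthrough struct{}

func (passthrough) Name() string { return "plain" }
```

Keeping the rewrite hook optional means a new provider only has to implement StreamingHandler when its streaming format differs from the one the client expects.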

Repository and Issue Links

AI‑Proxy plugin source:

https://github.com/alibaba/higress/tree/main/plugins/wasm-go/extensions/ai-proxy

Open issue inviting community contributions:

https://github.com/alibaba/higress/issues/940

Conclusion

The setup enables direct comparison of GPT‑4o and Qwen‑2.5 responses and illustrates how a cloud‑native API gateway can handle streaming LLM traffic with minimal memory overhead.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
