How Higress Cuts AI Large‑Model Costs with Unified API Management

This guide explains how the cloud‑native Higress gateway can lower the expense of using AI large models by centralising API keys, applying rate‑limiting, request filtering, and usage monitoring, and demonstrates a complete OpenAI integration using a WASM plugin with code samples and configuration details.

Alibaba Cloud Native

Background

AI-generated content (AIGC) technologies such as ChatGPT are reshaping enterprise workflows, but the pay-per-token pricing of large AI models (e.g., OpenAI's) can quickly become costly for organisations that must manage many users and a range of models.

Cost challenges of AI large models

OpenAI bills per token, and the per-token rate varies by model: the more capable the model, the higher the price. Issuing and managing a separate API key for every employee is impractical, which makes it hard to track usage, enforce policies, and control spending.
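To make per-token pricing concrete, the Go sketch below estimates a bill from a token count. The model names and per-1,000-token prices are hypothetical placeholders, not OpenAI's actual rates, and the sketch applies a single rate to all tokens even though many models price input and output tokens separately.

```go
package main

import "fmt"

// pricePer1K maps a model name to a price in USD per 1,000 tokens.
// Hypothetical placeholder values, not real OpenAI prices.
var pricePer1K = map[string]float64{
	"small-model": 0.002,
	"large-model": 0.02,
}

// estimateCost returns the cost of processing tokens with model,
// and false if the model has no known price.
func estimateCost(model string, tokens int) (float64, bool) {
	price, ok := pricePer1K[model]
	if !ok {
		return 0, false
	}
	return float64(tokens) / 1000 * price, true
}

func main() {
	// The same workload costs ten times more on the larger model.
	for _, m := range []string{"small-model", "large-model"} {
		c, _ := estimateCost(m, 500_000)
		fmt.Printf("%s: $%.2f\n", m, c)
	}
}
```

With real rates from the provider's price list, the same arithmetic shows how much a given workload saves by routing it to a cheaper model.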

Higress capabilities for cost reduction

Higress, a cloud‑native gateway, offers authentication, request filtering, traffic control, usage monitoring, and security features that enable organisations to:

Use a single API key for unified usage accounting and billing.

Apply per‑model or per‑user rate limits to curb excessive calls.

Block requests containing sensitive data before they reach the AI service.

Leverage built‑in metrics and logs (commercial edition) to analyse consumption and optimise model selection.

Practical example: OpenAI integration via WASM plugin

The following example shows how to build an AI‑proxy plugin for OpenAI using Higress and a Go‑based WASM extension.

Step 1: Install Higress

Follow the official Higress installation guide.

Step 2: Prepare Go WASM development environment

Set up Go and the Higress WASM SDK as described in the "Develop WASM plugins with Go" documentation.

Plugin implementation

The plugin parses its configuration, reads the prompt from the request's query string, builds an OpenAI completions request body, sends it to api.openai.com (or to a configured chatgptUri), and relays OpenAI's response to the caller.

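func parseConfig(json gjson.Result, config *MyConfig, log wrapper.Log) error {
    chatgptUri := json.Get("chatgptUri").String()
    var chatgptHost string
    if chatgptUri == "" {
        // Default to the public OpenAI completions endpoint
        config.ChatgptPath = "/v1/completions"
        chatgptHost = "api.openai.com"
    } else {
        // Otherwise split the configured URI (e.g. https://host/path) into host and path
        u, err := url.Parse(chatgptUri)
        if err != nil || u.Host == "" {
            return fmt.Errorf("invalid chatgptUri %q", chatgptUri)
        }
        config.ChatgptPath, chatgptHost = u.Path, u.Host
    }
    // ...
    config.client = wrapper.NewClusterClient(wrapper.RouteCluster{Host: chatgptHost})
    // ...
    return nil
}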

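// Request body for the completions API. The prompt is left empty and set
// with sjson, which JSON-escapes quotes and newlines in user input.
const bodyTemplate string = `{
    "model":"%s",
    "prompt":"",
    "temperature":0.9,
    "max_tokens":150,
    "top_p":1,
    "frequency_penalty":0.0,
    "presence_penalty":0.6,
    "stop":["%s","%s"]
}`

func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig, log wrapper.Log) types.Action {
    // Read the prompt from the query parameter named by promptParam
    // (config.PromptParam is an assumed field name)
    path, _ := proxywasm.GetHttpRequestHeader(":path")
    var prompt string
    if u, err := url.Parse(path); err == nil {
        prompt = u.Query().Get(config.PromptParam)
    }
    if prompt == "" {
        proxywasm.SendHttpResponse(http.StatusBadRequest, nil, []byte("missing prompt"), -1)
        return types.ActionPause
    }
    // Build request body
    body, err := sjson.Set(fmt.Sprintf(bodyTemplate, config.Model, config.HumainId, config.AIId), "prompt", prompt)
    if err != nil {
        proxywasm.SendHttpResponse(http.StatusInternalServerError, nil, []byte(err.Error()), -1)
        return types.ActionPause
    }
    // Forward request asynchronously (10 s timeout); the callback relays the reply
    err = config.client.Post(config.ChatgptPath, [][2]string{{"Content-Type", "application/json"}, {"Authorization", "Bearer " + config.ApiKey}}, []byte(body), func(statusCode int, responseHeaders http.Header, responseBody []byte) {
        var headers [][2]string
        for k, v := range responseHeaders {
            headers = append(headers, [2]string{k, v[0]})
        }
        proxywasm.SendHttpResponse(uint32(statusCode), headers, responseBody, -1)
    }, 10000)
    if err != nil {
        proxywasm.SendHttpResponse(http.StatusBadGateway, nil, []byte(err.Error()), -1)
        return types.ActionPause
    }
    // Pause the original request until the callback has sent the response
    return types.ActionPause
}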

Plugin configuration

The plugin can be applied globally, per domain, or per route. A typical route‑level configuration looks like:

apiKey: "xxxxxxxxxxxxxxxxxx"
model: "curie"
promptParam: "text"

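After the plugin is deployed, a request such as curl "http://{GatewayIP}/?text=Say,hello" is turned into a completions call to the OpenAI curie model, and the model's response is returned to the caller. (OpenAI has since retired curie; set model to a completions model that is still available.)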
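Programs can call the gateway the same way curl does. The Go sketch below builds that request; 192.0.2.10 is a placeholder for {GatewayIP}, and newPromptRequest is a hypothetical helper name. Using url.Values means prompts with spaces, commas, or other special characters are percent-encoded correctly.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newPromptRequest builds a GET request to the gateway with the prompt in the
// query parameter configured as promptParam ("text" in the example above).
func newPromptRequest(gateway, promptParam, prompt string) (*http.Request, error) {
	u := url.URL{
		Scheme:   "http",
		Host:     gateway,
		Path:     "/",
		RawQuery: url.Values{promptParam: {prompt}}.Encode(), // escapes the prompt
	}
	return http.NewRequest(http.MethodGet, u.String(), nil)
}

func main() {
	req, err := newPromptRequest("192.0.2.10", "text", "Say, hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String()) // http://192.0.2.10/?text=Say%2C+hello
	// Sending it is then: resp, err := http.DefaultClient.Do(req)
}
```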

Multi‑tenant authentication with Key Auth

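Instead of handing each user an OpenAI API key, organisations can use Higress's Key Auth plugin to issue internal credentials, while only the gateway holds the real OpenAI key. A request must carry a configured consumer's credential (here in the apikey header), and on a route with an allow list only the listed consumers get through. With global_auth: false, authentication applies only to routes that have such a configuration.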

# Global Key Auth configuration
consumers:
  - credential: "xxxxxx"
    name: "consumer1"
  - credential: "yyyyyy"
    name: "consumer2"
global_auth: false
in_header: true
keys:
  - "apikey"
# Route‑level allow list
allow: [consumer1]

Requests to the route then behave as follows:

curl "http://{GatewayIP}/?text=Say,hello" -H "apikey:xxxxxx"   # succeeds
curl "http://{GatewayIP}/?text=Say,hello" -H "apikey:zzzzzz"   # 401 Unauthorized
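The two checks involved, credential lookup and then the route's allow list, can be sketched in Go as below. This only illustrates the decision logic and is not the Key Auth plugin's source. The 403 status for a known consumer missing from the allow list is an assumption; the configuration above only shows the 401 case for an unknown key.

```go
package main

import (
	"fmt"
	"net/http"
)

// keyAuth mirrors the configuration above: credentials mapped to consumer
// names, plus the allow list attached to one route.
type keyAuth struct {
	consumers map[string]string // credential -> consumer name
	allow     map[string]bool   // consumers permitted on this route
}

// check returns the status code for a request whose "apikey" header holds key.
func (k keyAuth) check(key string) int {
	name, ok := k.consumers[key]
	if !ok {
		return http.StatusUnauthorized // unknown or missing credential
	}
	if !k.allow[name] {
		return http.StatusForbidden // known consumer, not on the allow list (assumed code)
	}
	return http.StatusOK // forwarded to the AI service
}

func main() {
	auth := keyAuth{
		consumers: map[string]string{"xxxxxx": "consumer1", "yyyyyy": "consumer2"},
		allow:     map[string]bool{"consumer1": true},
	}
	for _, key := range []string{"xxxxxx", "yyyyyy", "zzzzzz"} {
		fmt.Println(key, auth.check(key))
	}
}
```

Because every request resolves to a named consumer, that name is also what per-user accounting and rate limits can be keyed on.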

Request filtering with Request Block

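The Request Block plugin can reject requests that contain sensitive keywords (e.g., passwords) before they reach the AI service. block_urls lists strings matched against the request URL, which in this setup carries the prompt; case_sensitive: false makes the match ignore case, and blocked_code sets the status code returned.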

blocked_code: 404
block_urls:
  - password
  - pw
case_sensitive: false

A prompt containing a blocked keyword is rejected:

curl "http://{GatewayIP}/?text=Mypassword=xxxx" -H "apikey:xxxxxx"   # 404 blocked
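The matching rule can be sketched as a case-insensitive substring search over the URL. The Go code below is an illustration under that assumption, not the plugin's implementation; blockRule is a hypothetical type whose fields mirror blocked_code, block_urls, and case_sensitive.

```go
package main

import (
	"fmt"
	"strings"
)

// blockRule mirrors the Request Block settings above.
type blockRule struct {
	code          int      // blocked_code
	keywords      []string // block_urls
	caseSensitive bool     // case_sensitive
}

// check returns the configured status code if the URL contains any keyword
// as a substring, or 0 if the request may pass.
func (r blockRule) check(rawURL string) int {
	target := rawURL
	if !r.caseSensitive {
		target = strings.ToLower(target)
	}
	for _, kw := range r.keywords {
		if !r.caseSensitive {
			kw = strings.ToLower(kw)
		}
		if strings.Contains(target, kw) {
			return r.code
		}
	}
	return 0
}

func main() {
	rule := blockRule{code: 404, keywords: []string{"password", "pw"}}
	fmt.Println(rule.check("/?text=Mypassword=xxxx")) // 404
	fmt.Println(rule.check("/?text=Say,hello"))       // 0
}
```

Substring matching is coarse: with this sketch a short keyword such as pw also blocks an innocent prompt like "upward", so short keywords deserve care.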

Usage monitoring in commercial Higress

The commercial edition integrates with metric and log systems, providing out‑of‑the‑box dashboards that show per‑user, per‑model consumption (e.g., pie charts for OpenAI‑Curie usage by different consumers). This visibility helps organisations optimise model selection and control costs.
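The dashboards themselves are not shown here, but the aggregation behind a per-consumer, per-model breakdown can be sketched in Go. The usageRecord log format is hypothetical; a real deployment would read token counts from its own logs or metrics.

```go
package main

import (
	"fmt"
	"sort"
)

// usageRecord is a hypothetical per-request log entry.
type usageRecord struct {
	Consumer string
	Model    string
	Tokens   int
}

// key identifies one (consumer, model) slice of the breakdown.
type key struct{ Consumer, Model string }

// aggregate sums tokens for every (consumer, model) pair.
func aggregate(records []usageRecord) map[key]int {
	totals := make(map[key]int)
	for _, r := range records {
		totals[key{r.Consumer, r.Model}] += r.Tokens
	}
	return totals
}

// shares returns each consumer's fraction of the tokens spent on model,
// the figure a per-model pie chart would display.
func shares(totals map[key]int, model string) map[string]float64 {
	sum := 0
	for k, t := range totals {
		if k.Model == model {
			sum += t
		}
	}
	out := make(map[string]float64)
	if sum == 0 {
		return out
	}
	for k, t := range totals {
		if k.Model == model {
			out[k.Consumer] = float64(t) / float64(sum)
		}
	}
	return out
}

func main() {
	logs := []usageRecord{
		{"consumer1", "curie", 300},
		{"consumer2", "curie", 100},
		{"consumer1", "curie", 200},
	}
	s := shares(aggregate(logs), "curie")
	names := make([]string, 0, len(s))
	for n := range s {
		names = append(names, n)
	}
	sort.Strings(names) // stable output order
	for _, n := range names {
		fmt.Printf("%s: %.0f%%\n", n, s[n]*100) // consumer1: 83%, consumer2: 17%
	}
}
```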

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
