
How to Build an AI Cache WASM Plugin for Higress Gateway

This guide explains how to set up a Higress gateway, compile a WebAssembly AI cache plugin, and integrate Redis and DashVector for semantic caching of large‑model requests. It includes the configuration and code examples needed for end‑to‑end deployment.

Alibaba Cloud Native

Aug 13, 2024

Background

The Higress AI Gateway Challenge requires a WebAssembly (Wasm) plugin that intercepts the four HTTP phases (requestHeader, requestBody, responseHeader, responseBody) and implements a semantic cache for large‑model (OpenAI‑style) queries to reduce token usage.

Gateway Environment Setup

Online enterprise Higress setup

Apply for a free trial and create the required resources: Qwen model endpoint, Redis service, and DashVector service.

Create service entries for the model, Redis, and DashVector.

Configure routing to forward requests to the Qwen model.

Enable the ai-proxy plugin from the Higress plugin marketplace and provide the model API key and token.

Local testing environment

Run Higress together with LobeChat using Docker Compose. Example docker‑compose.yml:

version: '3.9'
networks:
  higress-net:
    external: false
services:
  higress:
    image: registry.cn-hangzhou.aliyuncs.com/ztygw/aio-redis:1.4.1-rc.1
    environment:
      - GATEWAY_COMPONENT_LOG_LEVEL=misc:error,wasm:debug
      - CONFIG_TEMPLATE=ai-proxy
      - DEFAULT_AI_SERVICE=qwen
      - DASHSCOPE_API_KEY=[YOUR_KEY]
    ports:
      - "9080:8080"
      - "9001:8001"
    volumes:
      - ./data:/data
      - ./log:/var/log/higress/
    restart: always
  lobechat:
    image: lobehub/lobe-chat
    environment:
      - CODE=123456ed
      - OPENAI_API_KEY=unused
      - OPENAI_PROXY_URL=http://higress:8080/v1
    ports:
      - "3210:3210"
    restart: always
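
Once both containers are running, a quick smoke test is to send an OpenAI-style chat request to the gateway's mapped port. The following is a minimal sketch, not part of the original setup: the /chat/completions path is inferred from the OPENAI_PROXY_URL value above, and the model name qwen-turbo and the helper buildChatRequest are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// buildChatRequest prepares (but does not send) an OpenAI-style chat request.
// Hypothetical helper for this sketch.
func buildChatRequest(baseURL, model, question string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: question}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Port 9080 is the host side of the "9080:8080" mapping in docker-compose.yml.
	req, err := buildChatRequest("http://localhost:9080/v1", "qwen-turbo", "What is Higress?")
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("gateway not reachable:", err)
		return
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```

Sending the same question twice is a simple way to check whether the cache plugin is active: with caching in place, the second response would be expected to return without a new model call.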

Compiling and Uploading the AI Cache Plugin

Clone the Higress repository and build the Wasm plugin:

git clone https://github.com/alibaba/higress.git
cd higress/plugins/wasm-go
PLUGIN_NAME=ai-cache EXTRA_TAGS=proxy_wasm_version_0_2_100 make build

Push the resulting plugin image to a container registry and update the local test configuration to reference the new image version.

Text Vector Request and Cache Logic

The plugin processes each query with the following workflow:

Match the query string against keys stored in Redis (redisSearchHandler). If an exact match is found, return the cached response.

If not matched, call the text‑embedding API to obtain a query_embedding.

Search the embedding in DashVector (ANN search). If a similar key with a distance lower than 0.1 is found, retrieve its result from Redis.

If no suitable match, store the new query_embedding in DashVector for future use.

During the response phase, insert the query and LLM result into Redis for caching.
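
The five steps above can be sketched without any external services. In this illustration, in-memory maps stand in for Redis and DashVector, a brute-force scan stands in for the ANN search, and letterEmbed is a toy replacement for a real embedding model. All type and function names are hypothetical, and cosine distance is an assumption: the workflow only specifies "a distance lower than 0.1".

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// distanceThreshold mirrors the 0.1 cut-off from the workflow.
const distanceThreshold = 0.1

// semanticCache uses in-memory maps as stand-ins for the external services.
type semanticCache struct {
	answers map[string]string    // plays the role of Redis: query -> LLM answer
	vectors map[string][]float64 // plays the role of DashVector: query -> embedding
	embed   func(string) []float64
}

func newSemanticCache(embed func(string) []float64) *semanticCache {
	return &semanticCache{
		answers: map[string]string{},
		vectors: map[string][]float64{},
		embed:   embed,
	}
}

// cosineDistance returns 1 - cosine similarity; 0 means identical direction.
func cosineDistance(a, b []float64) float64 {
	if len(a) != len(b) {
		return 1
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB))
}

// lookup runs steps 1 to 4 and reports whether a cached answer was found.
func (c *semanticCache) lookup(query string) (string, bool) {
	// Step 1: exact match on the query string.
	if answer, ok := c.answers[query]; ok {
		return answer, true
	}
	// Step 2: embed the query.
	queryVec := c.embed(query)
	// Step 3: nearest stored query (brute force here, ANN in a vector database).
	bestKey, bestDist := "", math.Inf(1)
	for key, vec := range c.vectors {
		if d := cosineDistance(queryVec, vec); d < bestDist {
			bestKey, bestDist = key, d
		}
	}
	if bestDist < distanceThreshold {
		if answer, ok := c.answers[bestKey]; ok {
			return answer, true
		}
	}
	// Step 4: remember the embedding so later queries can match it.
	c.vectors[query] = queryVec
	return "", false
}

// store is step 5: cache the answer once the model has responded.
func (c *semanticCache) store(query, answer string) {
	c.answers[query] = answer
}

// letterEmbed is a toy embedding: a 26-dimensional letter histogram.
func letterEmbed(text string) []float64 {
	vec := make([]float64, 26)
	for _, r := range strings.ToLower(text) {
		if r >= 'a' && r <= 'z' {
			vec[r-'a']++
		}
	}
	return vec
}

func main() {
	cache := newSemanticCache(letterEmbed)
	_, hit := cache.lookup("what is higress")
	fmt.Println("first lookup hit:", hit)
	cache.store("what is higress", "Higress is a cloud-native gateway.")
	answer, hit := cache.lookup("What is Higress?")
	fmt.Println("rephrased lookup hit:", hit, answer)
	_, hit = cache.lookup("how do I bake bread")
	fmt.Println("unrelated lookup hit:", hit)
}
```

The rephrased question misses the exact-match step but lands within the threshold of the stored embedding, so it is served from the cache. The unrelated question is too far away and would go on to the model.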

Key code snippets:

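// Redis lookup: exact-match check on the prefixed query string.
// ifUseEmbedding selects the miss behaviour: fall back to the embedding
// search, or resume the request so that it reaches the LLM.
func redisSearchHandler(key string, ctx wrapper.HttpContext, config PluginConfig, log wrapper.Log, stream bool, ifUseEmbedding bool) error {
    err := config.redisClient.Get(config.CacheKeyPrefix+key, func(response resp.Value) {
        if err := response.Error(); err == nil && !response.IsNull() {
            log.Infof("cache hit, key:%s", key)
            handleCacheHit(key, response, stream, ctx, config, log)
        } else {
            // A Redis error is handled the same way as a miss.
            log.Infof("cache miss, key:%s", key)
            if ifUseEmbedding {
                handleCacheMiss(key, err, response, ctx, config, log, key, stream)
            } else {
                proxywasm.ResumeHttpRequest()
            }
        }
    })
    // A non-nil return means the lookup was not dispatched; the caller decides how to proceed.
    return err
}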

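// Fetch embeddings: request a vector for the query text from the embedding API.
func fetchAndProcessEmbeddings(key string, ctx wrapper.HttpContext, config PluginConfig, log wrapper.Log, queryString string, stream bool) {
    embURL, embRequestBody, embHeaders := ConstructTextEmbeddingParameters(&config, log, []string{queryString})
    err := config.DashVectorInfo.DashScopeClient.Post(embURL, embHeaders, embRequestBody,
        func(statusCode int, responseHeaders http.Header, responseBody []byte) {
            if statusCode != 200 {
                // Embedding failed: skip the semantic lookup and let the request reach the LLM.
                log.Errorf("embedding request failed, status:%d", statusCode)
                ctx.SetContext(QueryEmbeddingKey, nil)
                proxywasm.ResumeHttpRequest()
                return
            }
            processFetchedEmbeddings(key, responseBody, ctx, config, log, stream)
        }, 10000) // timeout in milliseconds
    if err != nil {
        // The call was not dispatched, so the callback will not fire; resume here instead.
        log.Errorf("failed to dispatch embedding request: %v", err)
        proxywasm.ResumeHttpRequest()
    }
}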
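
The body of ConstructTextEmbeddingParameters is not shown in this article. As a rough sketch of what the request and response handling might look like, the following builds a request body and extracts the first vector from a response. The JSON shapes and the model name text-embedding-v1 are assumptions based on the publicly documented DashScope text-embedding API, and buildEmbeddingBody and parseFirstEmbedding are hypothetical helpers, not functions from the plugin.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// embeddingRequest is the assumed request shape for the embedding endpoint.
type embeddingRequest struct {
	Model string `json:"model"`
	Input struct {
		Texts []string `json:"texts"`
	} `json:"input"`
	Parameters struct {
		TextType string `json:"text_type"`
	} `json:"parameters"`
}

// embeddingResponse keeps only the fields this sketch needs.
type embeddingResponse struct {
	Output struct {
		Embeddings []struct {
			TextIndex int       `json:"text_index"`
			Embedding []float64 `json:"embedding"`
		} `json:"embeddings"`
	} `json:"output"`
}

// buildEmbeddingBody serializes a request for one or more query texts.
func buildEmbeddingBody(model string, texts []string) ([]byte, error) {
	var req embeddingRequest
	req.Model = model
	req.Input.Texts = texts
	req.Parameters.TextType = "query"
	return json.Marshal(req)
}

// parseFirstEmbedding returns the vector for the first input text.
func parseFirstEmbedding(body []byte) ([]float64, error) {
	var resp embeddingResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if len(resp.Output.Embeddings) == 0 {
		return nil, errors.New("embedding response contains no vectors")
	}
	return resp.Output.Embeddings[0].Embedding, nil
}

func main() {
	body, _ := buildEmbeddingBody("text-embedding-v1", []string{"What is Higress?"})
	fmt.Println(string(body))

	sample := []byte(`{"output":{"embeddings":[{"text_index":0,"embedding":[0.1,0.2,0.3]}]}}`)
	vec, err := parseFirstEmbedding(sample)
	fmt.Println(vec, err)
}
```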

// Vector search and result handling: look up the nearest stored query in DashVector.
func performQueryAndRespond(key string, textEmbedding []float64, ctx wrapper.HttpContext, config PluginConfig, log wrapper.Log, stream bool) {
    vectorURL, vectorRequest, vectorHeaders, err := ConstructEmbeddingQueryParameters(config, textEmbedding)
    if err != nil {
        log.Errorf("Failed to perform query: %v", err)
        proxywasm.ResumeHttpRequest()
        return
    }
    config.DashVectorInfo.DashVectorClient.Post(vectorURL, vectorHeaders, vectorRequest,
        func(statusCode int, responseHeaders http.Header, responseBody []byte) {
            queryResp, err := ParseQueryResponse(responseBody)
            if err != nil || len(queryResp.Output) == 0 {
                // Nothing comparable is stored yet: record this query's embedding.
                uploadQueryEmbedding(ctx, config, log, key, textEmbedding)
                return
            }
            // Comma-ok assertion: a missing or non-string "query" field must not panic the plugin.
            mostSimilarKey, ok := queryResp.Output[0].Fields["query"].(string)
            mostSimilarScore := queryResp.Output[0].Score
            if ok && mostSimilarScore < 0.1 {
                // Close enough: fetch the neighbour's cached answer, with no second embedding round.
                redisSearchHandler(mostSimilarKey, ctx, config, log, stream, false)
            } else {
                uploadQueryEmbedding(ctx, config, log, key, textEmbedding)
                proxywasm.ResumeHttpRequest()
            }
        }, 100000) // timeout in milliseconds
}
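
The decision made inside the vector-search callback can be isolated and tested on its own. The sketch below parses a query response and applies the threshold. The field names follow the accesses in the snippet above (Output, Fields["query"], Score), while the exact JSON keys and the helper nearestCachedQuery are assumptions for illustration. The comparison direction assumes the score is a distance, as the `< 0.1` check implies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type queryDoc struct {
	Fields map[string]interface{} `json:"fields"`
	Score  float64                `json:"score"`
}

type queryResponse struct {
	Output []queryDoc `json:"output"`
}

// nearestCachedQuery returns the stored query text of the top match when its
// score is below the threshold. It returns false for malformed responses,
// empty results, and matches that are too far away.
func nearestCachedQuery(body []byte, threshold float64) (string, bool) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil || len(resp.Output) == 0 {
		return "", false
	}
	top := resp.Output[0]
	// Comma-ok assertion: a missing or non-string field yields ok == false.
	key, ok := top.Fields["query"].(string)
	if !ok || top.Score >= threshold {
		return "", false
	}
	return key, true
}

func main() {
	near := []byte(`{"output":[{"fields":{"query":"what is higress"},"score":0.04}]}`)
	far := []byte(`{"output":[{"fields":{"query":"what is higress"},"score":0.37}]}`)
	fmt.Println(nearestCachedQuery(near, 0.1))
	fmt.Println(nearestCachedQuery(far, 0.1))
}
```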

Configuration Files

YAML configuration required by the plugin (replace placeholder values with actual credentials):

Dash:
  dashScopeKey: "YOUR_DASHSCOPE_KEY"
  dashScopeServiceName: "qwen"
  dashVectorCollection: "YOUR_CLUSTER_NAME"
  dashVectorEnd: "YOUR_VECTOR_END"
  dashVectorKey: "YOUR_DASHVECTOR_KEY"
  dashVectorServiceName: "DashVector.dns"
  sessionID: "XXX"
redis:
  serviceName: "redis.static"
  timeout: 2000
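
Validating the configuration at load time catches missing credentials before the first request arrives. The sketch below assumes the gateway hands the plugin its configuration as JSON with the same keys as the YAML above. It uses the standard library rather than the plugin SDK's own parsing, and the type names, the parseSettings helper, and the 1000 ms fallback timeout are all choices made for this illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type dashSettings struct {
	DashScopeKey          string `json:"dashScopeKey"`
	DashScopeServiceName  string `json:"dashScopeServiceName"`
	DashVectorCollection  string `json:"dashVectorCollection"`
	DashVectorEnd         string `json:"dashVectorEnd"`
	DashVectorKey         string `json:"dashVectorKey"`
	DashVectorServiceName string `json:"dashVectorServiceName"`
	SessionID             string `json:"sessionID"`
}

type redisSettings struct {
	ServiceName string `json:"serviceName"`
	Timeout     int    `json:"timeout"` // assumed to be milliseconds
}

type settings struct {
	Dash  dashSettings  `json:"Dash"`
	Redis redisSettings `json:"redis"`
}

// parseSettings decodes the configuration and rejects missing required values.
func parseSettings(raw []byte) (*settings, error) {
	var s settings
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, fmt.Errorf("invalid plugin config: %w", err)
	}
	switch {
	case s.Dash.DashScopeKey == "":
		return nil, errors.New("Dash.dashScopeKey is required")
	case s.Dash.DashVectorKey == "":
		return nil, errors.New("Dash.dashVectorKey is required")
	case s.Dash.DashVectorEnd == "":
		return nil, errors.New("Dash.dashVectorEnd is required")
	case s.Dash.DashVectorCollection == "":
		return nil, errors.New("Dash.dashVectorCollection is required")
	case s.Redis.ServiceName == "":
		return nil, errors.New("redis.serviceName is required")
	}
	if s.Redis.Timeout <= 0 {
		s.Redis.Timeout = 1000 // fallback chosen for this sketch
	}
	return &s, nil
}

func main() {
	raw := []byte(`{
		"Dash": {
			"dashScopeKey": "k1", "dashScopeServiceName": "qwen",
			"dashVectorCollection": "c1", "dashVectorEnd": "example.invalid",
			"dashVectorKey": "k2", "dashVectorServiceName": "DashVector.dns"
		},
		"redis": {"serviceName": "redis.static", "timeout": 2000}
	}`)
	s, err := parseSettings(raw)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(s.Redis.ServiceName, s.Redis.Timeout)
}
```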

External Service Registration

Declare external services in the plugin configuration and instantiate them in ParseConfig:

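// Abridged to the fields these snippets use. The clients sit under
// DashVectorInfo, matching how the snippets above access them.
type DashVectorInfo struct {
    DashScopeServiceName  string
    DashVectorServiceName string
    DashVectorAuthApiEnd  string
    // Client handles are built at load time, so they are never (de)serialized.
    DashVectorClient wrapper.HttpClient `yaml:"-" json:"-"`
    DashScopeClient  wrapper.HttpClient `yaml:"-" json:"-"`
}

type PluginConfig struct {
    DashVectorInfo DashVectorInfo
    redisClient    wrapper.RedisClient `yaml:"-" json:"-"`
}

func ParseConfig(c *PluginConfig) {
    c.DashVectorInfo.DashVectorClient = wrapper.NewClusterClient(wrapper.DnsCluster{ServiceName: c.DashVectorInfo.DashVectorServiceName, Port: 443, Domain: c.DashVectorInfo.DashVectorAuthApiEnd})
    c.DashVectorInfo.DashScopeClient = wrapper.NewClusterClient(wrapper.DnsCluster{ServiceName: c.DashVectorInfo.DashScopeServiceName, Port: 443, Domain: "dashscope.aliyuncs.com"})
}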

Limitations

The cache logic must run in callbacks that return types.Action. Streaming response handlers (e.g., onHttpResponseBody) cannot block, so additional signalling is required for full asynchronous support.
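
The pause-then-resume pattern behind this limitation can be illustrated without a gateway. In the sketch below, fakeHost is a hypothetical stand-in for a single-threaded gateway worker, not the proxy-wasm API: asynchronous calls are queued, and their callbacks run only after the phase handler has returned. The handler therefore cannot wait for the cache result. It returns a pause action, and the callback later either replies from the cache or resumes the request.

```go
package main

import "fmt"

type action int

const (
	actionContinue action = iota
	actionPause
)

// fakeHost mimics a single-threaded worker: async calls are queued and their
// callbacks run only when drain is called, after the handler has returned.
type fakeHost struct {
	store   map[string]string
	pending []func()
	log     []string
}

// asyncGet queues a lookup; cb is invoked later, never inline.
func (h *fakeHost) asyncGet(key string, cb func(val string, ok bool)) {
	h.pending = append(h.pending, func() {
		val, ok := h.store[key]
		cb(val, ok)
	})
}

func (h *fakeHost) resumeRequest() {
	h.log = append(h.log, "resume: forward to LLM")
}

func (h *fakeHost) sendLocalReply(body string) {
	h.log = append(h.log, "reply from cache: "+body)
}

// onRequestBody starts the lookup and pauses; the callback finishes the work.
func onRequestBody(h *fakeHost, query string) action {
	h.asyncGet(query, func(val string, ok bool) {
		if ok {
			h.sendLocalReply(val)
			return
		}
		h.resumeRequest()
	})
	h.log = append(h.log, "handler returned, request paused")
	return actionPause
}

// drain runs queued callbacks in order, as the host's event loop would.
func (h *fakeHost) drain() {
	for len(h.pending) > 0 {
		cb := h.pending[0]
		h.pending = h.pending[1:]
		cb()
	}
}

func main() {
	h := &fakeHost{store: map[string]string{"what is higress": "cached answer"}}
	onRequestBody(h, "what is higress")
	onRequestBody(h, "something new")
	h.drain()
	for _, line := range h.log {
		fmt.Println(line)
	}
}
```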

Repository

Full source code: https://github.com/Suchun-sv/ai-cache-Demo
