Backend Development · 13 min read

Optimizing the Qingzhou Business Gateway: Performance Boosts, FFI Integration, and Routing Enhancements

This article details the architecture of the Qingzhou Business Gateway, identifies its granular-control, data-loss, and performance issues, and walks through a series of optimizations—FFI usage, table-pool reuse, coroutine-level caching, radix-tree routing, and connection-pool tuning—that raise single-node QPS to 80 k while preserving full functionality.

TAL Education Technology
What is the Qingzhou Business Gateway? It is the entry point for all API services of the Qingzhou student project team, built with OpenResty and Lua, handling traffic control, decryption, authentication, anti-tampering, routing, caching, mocking, and documentation.

Current Issues

1. Fine-grained control: the gateway uses method + path + api_version as its unique control key, allowing per-API settings for signing, authentication, internal/external access, backend path, and so on. This flexibility is valuable but comes at a cost.

2. Unexplained dict data loss: data stored in an nginx dict on the master process is accessed by workers via IPC under a lock; entries are occasionally lost, and the root cause was never identified.

3. Poor performance: the extremely fine granularity and regex-based route matching drive CPU usage up, keeping QPS low.

Optimization Journey

After refactoring, a 4‑core server can reach 80 k QPS (limited by backend services and NIC). The improvements leveraged many components from the open‑source API7/APISIX ecosystem.

FFI Usage

LuaJIT's Foreign Function Interface (FFI) lets Lua code call C functions directly, with far less overhead than the classic Lua/C API. Example:

-- A small module that reads system information via the C uname() call
local ffi = require "ffi"
local C = ffi.C

local _M = {}

-- The field sizes of the utsname structure differ per platform,
-- so declare the layout matching the current OS
local os = ffi.os
if os == "OSX" then
    ffi.cdef [[
        struct uts {
            char os[256];
            char hostname[256];
            char release[256];
            char version[256];
            char machine[256];
            char domain[256];
        };
    ]]
elseif os == "Linux" then
    ffi.cdef [[
        struct uts {
            char os[65];
            char hostname[65];
            char release[65];
            char version[65];
            char machine[65];
            char domain[65];
        };
    ]]
end

-- Declare the C function we want to call
ffi.cdef [[
    int uname(struct uts *buf);
]]

_M.os = os

function _M:getSystemInfo()
    local res = {}
    if self.os == "Windows" then
        -- no uname() on Windows; return placeholders
        res["os"] = "Windows"
        res["hostname"] = "unknown"
        res["release"] = "unknown"
        res["version"] = "unknown"
        res["machine"] = "unknown"
        return res
    end
    local uts = ffi.new("struct uts[1]")
    C.uname(uts)
    res["os"] = ffi.string(uts[0].os)
    res["hostname"] = ffi.string(uts[0].hostname)
    res["release"] = ffi.string(uts[0].release)
    res["version"] = ffi.string(uts[0].version)
    res["machine"] = ffi.string(uts[0].machine)
    return res
end

return _M

This shows how declaring C interfaces with ffi.cdef lets Lua invoke system calls directly, without spawning a subprocess or crossing the Lua/C API boundary.

Table Reuse

Creating and discarding Lua tables on every request puts pressure on the garbage collector; OpenResty provides the tablepool module to recycle them:

access_by_lua_block {
  local tablepool = require "tablepool"
  -- fetch a table from the "ngx_ctx" pool (0 array slots, 10 hash slots)
  ngx.ctx.api_ctx = tablepool.fetch("ngx_ctx", 0, 10)
}
log_by_lua_block {
  local tablepool = require "tablepool"
  -- return the table to the pool for reuse by later requests
  tablepool.release("ngx_ctx", ngx.ctx.api_ctx)
}

Tables fetched in the access phase are released in the log phase, the last phase of the request; released tables are cleared before reuse, so requests never see each other's data.

Coroutine‑level Cache

Headers are frequently accessed via ngx.req.get_headers() , which internally uses FFI but still incurs overhead. A lightweight cache wrapper stores headers in the request context:

local req_header = ngx.req.get_headers

local function get_header(ctx, key, default)
    -- fetch and parse the headers only once per request
    if ctx.header == nil then
        ctx.header = req_header()
    end
    return ctx.header[key] or default
end

This caches the parsed headers for the lifetime of the request, avoiding repeated FFI calls.

Routing Optimization: Traversal+Regex vs. radixtree

The original routing traversed all routes and applied a regex to each, i.e. O(n) per request. Switching to lua-resty-radixtree, which combines a hash-table lookup for exact paths (O(1)) with a radix tree for prefix matches (O(k), where k is the path length), sped up route matching by 100-200×.
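A minimal sketch of this approach using the lua-resty-radixtree API (the route paths, methods, and upstream names here are hypothetical):

```lua
local radix = require("resty.radixtree")

-- Build the tree once (e.g. in init_worker) from the route table
local rx = radix.new({
    {
        paths = {"/api/v1/user/*"},
        methods = {"GET", "POST"},
        metadata = {upstream = "user-service"},
    },
    {
        paths = {"/api/v1/order"},
        metadata = {upstream = "order-service"},
    },
})

-- Per request: O(1) hash lookup for exact paths,
-- O(k) radix-tree walk for prefix/wildcard paths
local meta = rx:match("/api/v1/user/info", {method = "GET"})
-- meta is the matched route's metadata table, or nil
```

Because the tree is built once and only walked per request, adding more routes no longer degrades per-request matching the way a linear regex scan does.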

Connection Pool

Enabling Nginx's upstream keepalive connection pool roughly doubles throughput in proxy scenarios.
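A typical configuration for upstream keepalive looks like the following (the server addresses and pool size are illustrative):

```nginx
upstream api_upstream {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    # keep up to 64 idle connections to the upstream per worker
    keepalive 64;
}

server {
    location / {
        # keepalive requires HTTP/1.1 and a cleared Connection header
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://api_upstream;
    }
}
```

Without this, every proxied request pays for a fresh TCP handshake to the backend.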

From dict to In‑memory Cache

Legacy dict storage caused IPC overhead and occasional loss. The refactor moves data into worker-level memory, using LRU caches for hot data and lua-resty-worker-events to propagate updates between workers, paving the way for a future etcd-based configuration center.
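A sketch of the worker-level cache using lua-resty-lrucache (the cache size, TTL, and the load_conf_from_source loader are hypothetical):

```lua
local lrucache = require "resty.lrucache"

-- one cache instance per worker, created at module load time
local cache, err = lrucache.new(1000) -- hold up to 1000 items
if not cache then
    error("failed to create LRU cache: " .. (err or "unknown"))
end

local function get_route_conf(key)
    local conf = cache:get(key)
    if conf == nil then
        -- miss: load from the authoritative source and cache for 60 s
        conf = load_conf_from_source(key)
        cache:set(key, conf, 60)
    end
    return conf
end
```

Lookups stay inside the worker's own Lua VM, so no shared-dict lock or IPC is involved on the hot path.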

ngx.req.set_uri vs. ngx.var

Rewriting the upstream URI through an Nginx variable (ngx.var) avoids the heavy validation and memory copies performed by ngx.req.set_uri. Example:

# nginx.conf
set $upstream_uri "";
location / {
    proxy_pass http://api_upstream$upstream_uri;
}
-- Lua code
ngx.var.upstream_uri = "/api/v1/user/info"

Similarly, request headers can be injected through proxy_set_header with an Nginx variable assigned from Lua, avoiding the overhead of ngx.req.set_header when the value is computed once per request.
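Following the same pattern as the URI rewrite above, a header can be set via a variable (the $backend_sign variable and compute_sign helper are hypothetical):

```nginx
# nginx.conf
set $backend_sign "";
location / {
    proxy_set_header X-Sign $backend_sign;
    proxy_pass http://api_upstream;
}

-- Lua code (access phase)
ngx.var.backend_sign = compute_sign(ctx)
```

The variable is declared empty in the config and filled in per request from Lua, so the header is attached by Nginx itself with no extra work in the Lua layer.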

Final Thoughts

The Qingzhou Business Gateway optimization demonstrates that systematic profiling, leveraging high‑performance OpenResty components, and careful code‑level tweaks can dramatically improve latency and throughput while maintaining functional richness.

Tags: Backend, performance optimization, FFI, API Gateway, Routing, Lua, OpenResty