Design and Engineering Practices of a Billion‑Scale Node.js Gateway
Wang Weijia’s talk outlines the architecture and engineering of Tencent CloudBase’s billion‑scale Node.js gateway. Built with Nest.js, it uses layered controllers and services, asynchronous processing, request streaming, keep‑alive connections, and a two‑level cache with a refresh‑ahead strategy, backed by high‑availability measures such as horizontal scaling, rate limiting, multi‑AZ deployment, and disaster‑recovery caching. The result is a 99.98% cache hit rate with 99% of requests served within 14 ms, demonstrating that Node.js can power latency‑sensitive services and encouraging front‑end engineers to adopt backend engineering practices.
This article is a transcript of Wang Weijia’s talk at the GMTC Global Front‑End Technology Conference (Shenzhen, 2021), where he presented the architecture design and engineering practice of a billion‑scale Node.js gateway used in Tencent CloudBase.
The gateway serves as a public‑cloud entry point, routing external HTTP requests to various backend resources such as cloud functions, container services, and static hosting. Its core responsibilities include public entry handling, backend resource integration, and identity authentication.
A minimal HTTP gateway can be implemented in just a few lines of code:
import express from 'express';
import { requestUpstream, resolveUpstream } from './upstream';

const app = express();

app.all('*', async (req, res) => {
  console.log(req.method, req.path, req.headers);
  // Decide which backend resource should receive this request.
  const upstream = await resolveUpstream(req.method, req.path, req.headers);
  // Forward the request; passing the request stream avoids parsing the body here.
  const response = await requestUpstream(upstream, req);
  console.log(response.statusCode, response.headers);
  // Relay the upstream status, headers, and body to the client.
  res.status(response.statusCode).set(response.headers).send(response.body);
});

const port = 3000;
app.listen(port, () => {
  console.log(`App listening at ${port}`);
});

This snippet shows the basic flow: receive a request, resolve the upstream service, forward the request, and return the response.
The full system is built on Nest.js for its IoC container and design‑pattern support. The internal architecture is split into two layers: Controllers (handling different resource types) and Services (containing business logic, cluster management, and I/O clients). Services are further divided into logical modules, functional modules, and auxiliary components such as logging, configuration, and DNS management.
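The Nest.js wiring itself is beyond the scope of the talk transcript, but the controller/service split can be sketched framework‑free; the class and method names below are illustrative, not from the actual gateway:

```javascript
// Service layer: business logic and I/O, injected into controllers.
class FunctionService {
  resolveTarget(path) {
    // Illustrative routing rule: map /fn/<name> to a cloud function name.
    const name = path.replace(/^\/fn\//, '');
    return { type: 'cloud-function', name };
  }
}

// Controller layer: one controller per backend resource type.
class FunctionController {
  constructor(functionService) {
    this.functionService = functionService; // injected dependency
  }
  handle(req) {
    return this.functionService.resolveTarget(req.path);
  }
}

// A trivial stand-in for the IoC container wiring that Nest.js provides.
const controller = new FunctionController(new FunctionService());
console.log(controller.handle({ path: '/fn/hello' }));
```

Because the controller only receives its service through the constructor, each layer can be tested and replaced independently, which is the main benefit the IoC container buys at scale.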
Performance optimization focuses on reducing I/O latency and shortening the core path. Key techniques include:
Asynchronous processing of non‑blocking tasks.
Streaming request bodies directly to upstream services without parsing them.
Using long‑living TCP connections (keep‑alive) to avoid the overhead of short‑lived connections.
Designing a cache layer that stores frequently accessed routing and configuration data, with a two‑level cache (local memory + Redis) and a refresh‑ahead strategy to minimize cache misses.
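The streaming and keep‑alive techniques above can be sketched with Node's built‑in http module; the upstream host and port here are placeholders, not the gateway's real configuration:

```javascript
import http from 'node:http';

// A shared agent reuses TCP connections to upstreams (keep-alive),
// avoiding a connection handshake per request.
const agent = new http.Agent({ keepAlive: true, maxSockets: 256 });

const gateway = http.createServer((req, res) => {
  const upstreamReq = http.request(
    {
      host: '127.0.0.1', // placeholder upstream address
      port: 8080,
      method: req.method,
      path: req.url,
      headers: req.headers,
      agent,
    },
    (upstreamRes) => {
      res.writeHead(upstreamRes.statusCode, upstreamRes.headers);
      upstreamRes.pipe(res); // stream the response back to the client
    }
  );
  // Stream the request body through without buffering or parsing it.
  req.pipe(upstreamReq);
});

gateway.listen(3000);
```

Piping the request and response streams means the gateway never holds a full body in memory, which keeps the core path short for large payloads.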
Cache design leverages the fact that only a small fraction of data is hot, request patterns are relatively stable, and real‑time freshness is not critical. The system employs TTL + LRU eviction and a refresh‑ahead mechanism that proactively updates near‑expiry entries.
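The TTL plus refresh‑ahead idea can be sketched with a single in‑process cache level; the Redis level and LRU eviction of the real gateway are omitted, and the thresholds are illustrative:

```javascript
// Minimal in-process cache with TTL and refresh-ahead: entries past a
// refresh threshold are updated in the background before they expire.
class RefreshAheadCache {
  constructor(loader, { ttlMs = 60_000, refreshRatio = 0.8 } = {}) {
    this.loader = loader;                      // async function fetching fresh data
    this.ttlMs = ttlMs;
    this.refreshAtMs = ttlMs * refreshRatio;   // start refreshing at 80% of TTL
    this.entries = new Map();
  }

  async get(key) {
    const now = Date.now();
    const entry = this.entries.get(key);
    if (entry && now - entry.storedAt < this.ttlMs) {
      // Near expiry: refresh in the background so subsequent callers
      // keep hitting a warm entry instead of taking a cache miss.
      if (now - entry.storedAt > this.refreshAtMs && !entry.refreshing) {
        entry.refreshing = true;
        this.loader(key)
          .then((value) => this.entries.set(key, { value, storedAt: Date.now() }))
          .catch(() => { entry.refreshing = false; }); // keep the stale value on failure
      }
      return entry.value; // hit: serve from memory
    }
    // Miss or expired: load synchronously and store.
    const value = await this.loader(key);
    this.entries.set(key, { value, storedAt: now });
    return value;
  }
}
```

Used for routing and configuration lookups, a cache like this turns almost every request into a memory read, which is how a hit rate on the order of 99.98% becomes possible for stable traffic patterns.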
High‑availability measures cover both the core service and the surrounding infrastructure:
Horizontal scaling of containerized gateway instances to handle sudden traffic spikes.
Rate limiting per instance to prevent overload.
Read‑through caching in front of the database to prevent cache stampedes (the thundering‑herd effect when many requests miss at once).
Graceful service degradation (e.g., skipping authentication when the auth service fails).
Multi‑AZ and multi‑region deployment with SET‑style (stateless) services, enabling traffic shift to any remaining healthy zone.
Disaster‑recovery cache that retains stale data for fallback when backend services are unavailable.
Comprehensive monitoring via Elasticsearch logs and an alarm system that detects abnormal traffic patterns, error spikes, and latency regressions.
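The disaster‑recovery cache from the list above can be sketched as a read path that falls back to the last known value when the backend is unavailable; the function names and error handling here are illustrative:

```javascript
// Serve fresh data when the backend is healthy; fall back to the last
// known value when it is not (a sketch of the disaster-recovery cache).
const lastKnown = new Map();

async function readWithFallback(key, fetchFromBackend) {
  try {
    const value = await fetchFromBackend(key);
    lastKnown.set(key, value); // remember the fresh value for later outages
    return value;
  } catch (err) {
    if (lastKnown.has(key)) {
      // Backend unavailable: degrade gracefully with stale data.
      return lastKnown.get(key);
    }
    throw err; // nothing cached; surface the failure
  }
}
```

Serving stale data is usually preferable to failing the request outright for routing and configuration reads, which is the same trade‑off behind the graceful‑degradation measures above.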
Overall, the gateway achieves a 99.98% cache hit rate, processes 99% of requests within 14 ms, and demonstrates that Node.js can reliably power large‑scale, latency‑sensitive services when combined with proper architectural patterns, caching, and operational practices.
The talk concludes with three takeaways: (1) Node.js services follow the same backend engineering principles as other languages; (2) Node.js is capable of handling massive workloads without being a “toy” language; (3) Front‑end engineers should view themselves at the crossroads of many technologies, not limited to traditional web development.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.