Evolution of Rainbow Bridge Architecture: Building a Self‑Managed Metadata Center and SDK Enhancements
The new Rainbow Bridge architecture replaces the SLB‑based load‑balancing model with a self‑managed, multi‑AZ metadata center and enhanced SDK that aggregates node health, provides zone‑aware weighted routing, supports rapid failover and manual overrides, and delivers faster recovery and scalable traffic handling.
Rainbow Bridge is a distributed middleware platform that previously relied on SLB for load balancing and node discovery. As traffic grew, SLB bandwidth limits and lack of zone‑aware routing became bottlenecks.
The new architecture removes the hard dependency on SLB by introducing a self‑built metadata center and enhancing the SDK. The metadata center consists of a multi‑AZ Metadata database and a MetaCenter service that aggregates node health information and provides an API for the SDK.
Key components:
Metadata database stores node_info (beat_version, config_version, enabled) and is deployed in multiple availability zones.
MetaCenter periodically queries all metadata databases, merges node lists, and keeps a healthy node pool in memory.
SDK fetches the node list from MetaCenter through a Layer 7 SLB, caches it locally, and refreshes the cache every 5 seconds or on change events.
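The SDK-side caching described above can be sketched roughly as follows. This is a minimal illustration, not the actual SDK: the `NodeListCache` class and the `fetch` callable (standing in for the MetaCenter API call) are hypothetical names, and the rule of keeping the old list when a fetch fails or returns nothing is an assumption. Only the 5-second interval comes from the text.

```python
import threading
from typing import Callable, List, Optional


class NodeListCache:
    """Holds the last successfully fetched node list and refreshes it periodically."""

    def __init__(self, fetch: Callable[[], List[str]], interval_s: float = 5.0):
        self._fetch = fetch            # hypothetical: wraps the MetaCenter API call
        self._interval_s = interval_s  # 5 seconds, per the article
        self._nodes: List[str] = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def refresh(self) -> bool:
        """Fetch once; keep the cached list if the fetch fails or returns nothing."""
        try:
            nodes = self._fetch()
        except Exception:
            return False  # MetaCenter unreachable: keep serving the cached list
        if not nodes:
            return False  # assumption: never overwrite the cache with an empty list
        with self._lock:
            self._nodes = list(nodes)
        return True

    def nodes(self) -> List[str]:
        """Return a copy of the cached node list."""
        with self._lock:
            return list(self._nodes)

    def start(self) -> None:
        """Load the list once, then refresh it in a background thread."""
        self.refresh()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()

    def _loop(self) -> None:
        # Event.wait returns False on timeout and True once stop() has been called.
        while not self._stop.wait(self._interval_s):
            self.refresh()
```

A change event, however it is delivered, could simply call `refresh()` directly instead of waiting for the next tick.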
Node lifecycle:
On startup a node registers itself in every metadata database; if the record does not exist it is inserted, otherwise its weight is reset.
During operation the node updates its beat_version periodically.
When a node is taken offline, an admin API updates the enabled flag.
update node_info set weight = 1, config_version = #{config_version} where cluster_name = ? and address = ?
update node_info set beat_version = beat_version + 1 where cluster_name = ? and address = ?
update node_info set enabled = 0, config_version = #{config_version} where ip = ? and port = ?
MetaCenter (also called Heimdall) merges node information from all databases, removes stale nodes based on heartbeat loss thresholds, and exposes an OpenAPI for the SDK. It also supports manual override of the node list.
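One way to picture the merge and stale-node removal is the sketch below. It is an illustration under stated assumptions, not Heimdall's actual logic: `NodeRow`, `HealthyPool`, and `max_missed_polls` are hypothetical names, a node counts as stale once its `beat_version` has not advanced for a configurable number of consecutive polls, and a node is treated as disabled if any database marks it disabled. The real service may resolve conflicts differently, for example by using `config_version`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

NodeKey = Tuple[str, str]  # (cluster_name, address)


@dataclass(frozen=True)
class NodeRow:
    cluster_name: str
    address: str
    beat_version: int
    enabled: bool


class HealthyPool:
    def __init__(self, max_missed_polls: int = 3):
        self._max_missed = max_missed_polls  # hypothetical heartbeat-loss threshold
        self._last_beat: Dict[NodeKey, int] = {}
        self._missed: Dict[NodeKey, int] = {}

    def poll(self, per_db_rows: Iterable[Iterable[NodeRow]]) -> List[NodeKey]:
        """Merge rows from every metadata database and return the healthy nodes."""
        merged: Dict[NodeKey, NodeRow] = {}
        for rows in per_db_rows:
            for row in rows:
                key = (row.cluster_name, row.address)
                prev = merged.get(key)
                if prev is None:
                    merged[key] = row
                else:
                    # Keep the freshest heartbeat; disabled anywhere means disabled.
                    merged[key] = NodeRow(
                        row.cluster_name,
                        row.address,
                        max(prev.beat_version, row.beat_version),
                        prev.enabled and row.enabled,
                    )

        healthy: List[NodeKey] = []
        for key, row in merged.items():
            last = self._last_beat.get(key)
            if last is None or row.beat_version > last:
                self._missed[key] = 0  # heartbeat advanced since the last poll
                self._last_beat[key] = row.beat_version
            else:
                self._missed[key] = self._missed.get(key, 0) + 1
            if row.enabled and self._missed[key] < self._max_missed:
                healthy.append(key)
        return sorted(healthy)
```

Taking the highest `beat_version` across databases means that a single unreachable or lagging database does not by itself make a node look stale, which matches the claim that a metadata database failure in one zone should not affect routing.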
The SDK first performs weighted round-robin among nodes in its own AZ and falls back to cross-AZ routing when the AZ-specific weight ratios drop below configurable thresholds (X%, Y%, Z). It also provides a switch to revert to the legacy Layer 4 SLB architecture.
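The routing rule can be sketched as below. This is a simplified illustration, not the SDK's implementation: `Node`, `candidates`, `pick`, and the single `min_local_ratio` parameter are hypothetical and do not reproduce the article's X%, Y%, and Z thresholds, whose exact meaning is not given here. The selection step uses smooth weighted round-robin, one common way to implement weighted round-robin.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    address: str
    zone: str
    weight: int
    current: int = 0  # running counter used by smooth weighted round-robin


def candidates(nodes: List[Node], local_zone: str, min_local_ratio: float) -> List[Node]:
    """Prefer same-AZ nodes; use every node if the local share of weight is too low."""
    total = sum(n.weight for n in nodes)
    local = [n for n in nodes if n.zone == local_zone and n.weight > 0]
    local_weight = sum(n.weight for n in local)
    if local and local_weight / total >= min_local_ratio:
        return local
    return [n for n in nodes if n.weight > 0]  # cross-AZ fallback


def pick(pool: List[Node]) -> Node:
    """Smooth weighted round-robin: add each weight to its counter, take the largest."""
    if not pool:
        raise ValueError("no routable nodes")
    total = sum(n.weight for n in pool)
    for n in pool:
        n.current += n.weight
    best = max(pool, key=lambda n: n.current)
    best.current -= total  # push the chosen node back so others get their turn
    return best
```

With two local nodes of weight 2 and 1, repeated calls to `pick` spread requests in a 2:1 ratio without sending consecutive bursts to the heavier node.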
The management UI adds pages for node management and fallback node configuration, and exposes one-click enable/disable APIs for deployment pipelines.
Disaster‑recovery tests show that a full AZ outage can be recovered within ~30 seconds, and metadata database failures in a single zone have no impact on SDK‑Proxy communication.
Overall, the self‑built metadata center eliminates SLB bandwidth constraints, simplifies scaling, enables automatic failover, and improves zone‑aware routing for the Rainbow Bridge service.
DeWu Technology