Evolution of Zhuanzhuan's Test Environment Governance: From Physical Isolation to Tag‑Based Traffic Routing
This article details Zhuanzhuan's three generations of test environment governance: physical isolation, then automatic IP‑label traffic routing, and finally manual tag‑based routing. It covers the architectural changes, deployment processes, advantages and drawbacks of each generation, and supporting tools such as distributed tracing and debugging utilities.
The article begins by describing the background and requirements of Zhuanzhuan's testing environment, explaining how the monolithic architecture evolved into microservices, making it increasingly complex to route a request precisely to a specific test node.
Traditional solutions relied on physical isolation, provisioning a completely separate environment for each test need, which worked for a small number of services but caused severe resource waste as the service count grew.
Version 1 (V1) introduced an improved physical isolation approach: a stable environment containing all services and dynamic environments (single KVM VMs) that host only the services under test. Host file mappings such as 192.168.1.1 A.zhuaninc.com and 127.0.0.1 A.zhuaninc.com are used to direct traffic, and MQ topics are prefixed with IP tags to keep messages isolated.
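The V1 hosts overrides might look like the following fragment. The IPs and hostname come from the article's example; which entry applies depends on the machine the hosts file lives on, and the split between stable and dynamic machines shown here is an assumption about how the override was used:

```
# On a machine that should reach service A in the stable environment:
192.168.1.1  A.zhuaninc.com

# On a dynamic KVM VM that deploys its own copy of service A,
# the loopback override keeps A's traffic inside the VM:
127.0.0.1    A.zhuaninc.com
```

Because every tester's machine and every VM needs its own copy of these mappings, the article counts "complex host management" among V1's drawbacks.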
V1’s pros include strong isolation and simple link topology, while cons involve unnecessary deployment of unchanged services, low deployment efficiency, complex host management, and limited memory on a single machine.
Version 2 (V2) replaces manual host mapping with an automatic IP‑label traffic routing mechanism. By automatically tagging services with their VM IP, the environment setup time drops from hours/days to 30 minutes–1 hour, and the number of services per environment falls from 30‑60 to single‑digit counts.
Version 3 (V3) further refines routing by moving to Docker containers and manual label routing. Services receive JVM parameters like -Dtag=xxx and HTTP headers tag=xxx to indicate the target environment. This enables fine‑grained deployment of only the modified services, reducing average services per environment to 3‑4 and setup time to 2‑5 minutes.
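A minimal sketch of how a V3 service could pick up its environment tag at startup. The property name `tag` matches the article's `-Dtag=xxx` example; the `resolveTag` helper and the `stable` fallback value are hypothetical names, not the article's actual implementation:

```java
public class TagResolver {
    // Reads the environment tag supplied via the -Dtag=... JVM argument.
    // When no tag is set, the request is assumed to belong to the stable
    // environment ("stable" is an assumed sentinel, not from the article).
    static String resolveTag() {
        String tag = System.getProperty("tag");
        return (tag == null || tag.isEmpty()) ? "stable" : tag;
    }

    public static void main(String[] args) {
        System.setProperty("tag", "xxx"); // simulate launching with -Dtag=xxx
        System.out.println(resolveTag());
    }
}
```

The same tag value would also be attached to outgoing HTTP requests as a `tag=xxx` header, so downstream routing decisions can be made per request rather than per deployment.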
For RPC calls, service registration includes the tag (e.g., yyy or xxx), allowing callers to select the appropriate instance based on the tag. If no matching dynamic instance exists, the stable instance is used.
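The match-then-fall-back selection described above can be sketched as follows. The `Instance` record, the `stable` tag value, and the `select` helper are illustrative; the article does not show the real registry schema:

```java
import java.util.List;
import java.util.Optional;

public class TagRouter {
    // A registered instance carries the tag it was deployed with
    // (hypothetical shape; the real registry schema is not shown).
    record Instance(String address, String tag) {}

    static final String STABLE_TAG = "stable"; // assumed name for the stable pool

    // Prefer an instance whose tag matches the caller's tag; otherwise
    // fall back to the stable environment, as the article describes.
    static Instance select(List<Instance> instances, String callerTag) {
        Optional<Instance> tagged = instances.stream()
                .filter(i -> i.tag().equals(callerTag))
                .findFirst();
        return tagged.orElseGet(() -> instances.stream()
                .filter(i -> i.tag().equals(STABLE_TAG))
                .findFirst()
                .orElseThrow());
    }

    public static void main(String[] args) {
        List<Instance> pool = List.of(
                new Instance("10.0.0.1:8080", STABLE_TAG),
                new Instance("10.0.0.2:8080", "xxx"));
        // A caller tagged "xxx" reaches its dynamic instance; a caller
        // tagged "yyy" finds no match and falls back to stable.
        System.out.println(select(pool, "xxx").address());
        System.out.println(select(pool, "yyy").address());
    }
}
```

This fallback is what lets a dynamic environment deploy only its 3–4 modified services: every untagged dependency is transparently served by the stable pool.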
Message queue routing uses tag‑prefixed consumer groups (e.g., a ${tag} prefix for dynamic environments and test_ for the stable one) to achieve logical isolation while sharing the same topic.
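Deriving the consumer group name might look like the sketch below. The `test_` prefix and ${tag} prefix follow the article's example; the underscore separator and the base group name are assumptions:

```java
public class ConsumerGroupNaming {
    // Dynamic environments prefix the consumer group with their tag so each
    // environment consumes only its own messages on the shared topic; an
    // empty tag means the stable environment, which uses the test_ prefix.
    static String groupFor(String baseGroup, String tag) {
        return (tag == null || tag.isEmpty())
                ? "test_" + baseGroup
                : tag + "_" + baseGroup;
    }

    public static void main(String[] args) {
        System.out.println(groupFor("order_consumer", "xxx"));
        System.out.println(groupFor("order_consumer", null));
    }
}
```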
Tag propagation inside processes is handled via Alibaba's TransmittableThreadLocal agent, which transparently carries tags across threads and thread pools without code changes.
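The demo below shows the problem TransmittableThreadLocal exists to solve, using only the JDK. InheritableThreadLocal copies the parent's value only when a child thread is created, so a pooled thread that already exists never sees later tag updates; TransmittableThreadLocal re-captures the context at task submission time, which is why the agent can carry tags across thread pools without code changes. The scenario itself is illustrative, not from the article:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TagPropagationDemo {
    static final InheritableThreadLocal<String> TAG = new InheritableThreadLocal<>();

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.submit(() -> {}).get(); // force the pool thread to be created first
        TAG.set("xxx");              // the tag arrives after thread creation
        String seen = pool.submit(TAG::get).get();
        System.out.println(seen);    // null: the pooled thread missed the tag
        pool.shutdown();
    }
}
```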
Supporting tools include wildcard domain resolution (e.g., app-${tag}.test.zhuanzhuan.com), a web‑shell for container access, and a debug plugin that automatically retrieves the correct IP and debug port based on the environment tag.
A distributed tracing system (Radar + Zipkin) records TraceId and SpanId at entry and exit points, stores them in Kafka, and visualizes call chains, helping to pinpoint routing issues such as a missing call from service D to E'.
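The TraceId/SpanId bookkeeping can be sketched as below: the entry point mints a TraceId, and each outgoing hop gets a fresh SpanId whose parent is the current span, so the call chain can be reassembled later. Field names and ID format here are illustrative, not Radar's or Zipkin's wire format:

```java
import java.util.UUID;

public class TraceContext {
    final String traceId;      // constant across the whole call chain
    final String spanId;       // unique per hop
    final String parentSpanId; // links this hop back to its caller (null at entry)

    TraceContext(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    // Called at the system entry point; there is no parent span yet.
    static TraceContext newTrace() {
        return new TraceContext(UUID.randomUUID().toString(),
                UUID.randomUUID().toString(), null);
    }

    // Called before each outgoing RPC/MQ hop: same trace, new span.
    TraceContext childSpan() {
        return new TraceContext(traceId, UUID.randomUUID().toString(), spanId);
    }

    public static void main(String[] args) {
        TraceContext root = newTrace();
        TraceContext hop = root.childSpan();
        System.out.println(hop.traceId.equals(root.traceId));     // same trace
        System.out.println(hop.parentSpanId.equals(root.spanId)); // linked to caller
    }
}
```

With these records stored per hop, a gap in the reassembled chain (such as the missing D → E' call) points directly at the hop where tag routing went wrong.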
In summary, the three versions—physical isolation, automatic IP‑label routing, and manual tag routing—progressively reduce resource consumption, deployment time, and operational overhead while introducing new challenges like increased link complexity and IP volatility, mitigated by the auxiliary facilities described.
转转QA