
How Youzan Implements Traffic Control, Gray and Blue‑Green Deployments with Istio

This article details Youzan's design and implementation of a traffic‑control system built on Istio/Envoy, describing the protocols, architecture, and concrete JSON routing rules for gray releases and blue‑green deployments, along with observability features and future multi‑service release plans.

Youzan Coder

Background

With the rapid growth of Youzan's users and services, developers are under increasing pressure to keep services stable while iterating quickly. As the number of micro‑service interfaces grows and call chains lengthen, regression testing becomes harder, and testing alone can no longer guarantee stability.

To balance stability and speed, Youzan introduced a gray‑release strategy for new versions: only a few instances of the new version are deployed at first, traffic is shifted to them gradually, and the full rollout happens only after verification.

Traffic Control System

Protocol Selection

The team set several goals for the routing protocol: completeness, with support for service‑mesh features such as circuit breaking, rate limiting, and A/B testing; readability; and reuse of mature industry designs. They chose the Istio service‑mesh framework, which uses Envoy as its data plane and supports a JSON‑encoded routing protocol (the Envoy v1 API, which is migrating to the v2 gRPC API).

{
  "name": "java-demo-rule",
  "domains": ["java-demo"],
  "routes": [{
    "headers": [{"name": "userid", "value_match": "123"}],
    "cluster": "java-demo|version=v2"
  }, {
    "weighted_clusters": {
      "clusters": [
        {"name": "java-demo|version=v2", "weight": 10},
        {"name": "java-demo|version=v1", "weight": 90}
      ]
    }
  }]
}

This rule routes requests whose userid header equals 123 to the v2 cluster; all other traffic is split, with 10% going to v2 and 90% to v1.
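To make the rule's behavior concrete, here is a minimal Python sketch of how such a rule could be evaluated. It is not Envoy's or Youzan's implementation. It assumes that routes are tried in order and the first match wins, that value_match is an exact string comparison, and that a route without header conditions matches every request. The names headers_match and pick_cluster are hypothetical.

```python
import random

# The rule from the article, written as a Python dict.
RULE = {
    "name": "java-demo-rule",
    "domains": ["java-demo"],
    "routes": [
        {
            "headers": [{"name": "userid", "value_match": "123"}],
            "cluster": "java-demo|version=v2",
        },
        {
            "weighted_clusters": {
                "clusters": [
                    {"name": "java-demo|version=v2", "weight": 10},
                    {"name": "java-demo|version=v1", "weight": 90},
                ]
            }
        },
    ],
}


def headers_match(route, request_headers):
    """Assumed semantics: every listed header condition must match exactly."""
    for cond in route.get("headers", []):
        if request_headers.get(cond["name"]) != cond["value_match"]:
            return False
    return True


def pick_cluster(rule, request_headers, rng=random):
    """Return the cluster of the first matching route, or None if no route matches."""
    for route in rule["routes"]:
        if not headers_match(route, request_headers):
            continue
        if "cluster" in route:
            return route["cluster"]
        # Weighted split: draw one cluster in proportion to its weight.
        clusters = route["weighted_clusters"]["clusters"]
        names = [c["name"] for c in clusters]
        weights = [c["weight"] for c in clusters]
        return rng.choices(names, weights=weights, k=1)[0]
    return None
```

Calling pick_cluster(RULE, {"userid": "123"}) always returns the v2 cluster, while any other request falls through to the weighted route and lands on v2 roughly one time in ten.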

Architecture

The traffic‑control ecosystem consists of an HTTP gateway (Nginx), a service‑mesh sidecar proxy (Tether), the Dubbo RPC framework, Istio Pilot for rule distribution, and the Ops management system that translates product‑level controls into low‑level routing rules stored as CRDs in Kubernetes.

Traffic control architecture diagram

Gray Release

What Is Gray Release

Gray release deploys a small “canary” cluster alongside the stable cluster and routes a fraction of traffic to it for pre‑production validation. If issues appear, traffic is instantly switched back; otherwise the new version is fully rolled out.

Gray release flow diagram

Release Process

Start: User selects “Gray Release” in Ops.

Initialize: Deploy canary instances (10 % of stable capacity) with label canary=true. No traffic is sent until a rule is created.

Validate: Push routing rules. Two rule types are supported:

Shop‑list rule – routes requests from specific shop IDs to the canary.

Percentage rule – routes a configurable percentage (max 10 %) of traffic.

Cancel: If validation fails, delete the rule and take down the canary instantly.

Full Rollout: If validation succeeds, promote the new version to all instances and remove the canary.

End.

For example, the following shop‑list rule routes requests whose shopid header is 123 or 456 to the canary instances and sends all other traffic to the stable instances:

{
  "name": "java-demo-rule",
  "domains": ["java-demo"],
  "routes": [{
    "headers": [{"name": "shopid", "list_match": ["123", "456"]}],
    "cluster": "java-demo|canary=true"
  }, {
    "cluster": "java-demo|canary=false"
  }]
}
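
The two rule types from the Validate step could be generated along the following lines. This is a sketch, not Youzan's Ops code: the function names are hypothetical, the "<app>-rule" naming simply mirrors the example, and the percentage rule's shape is an assumption that reuses the weighted_clusters form shown earlier with the canary=true and canary=false labels.

```python
MAX_CANARY_PERCENT = 10  # the article caps percentage rules at 10%


def shop_list_rule(app, shop_ids):
    """Route the listed shop IDs to the canary, everything else to stable."""
    return {
        "name": f"{app}-rule",
        "domains": [app],
        "routes": [
            {
                "headers": [
                    {"name": "shopid", "list_match": [str(s) for s in shop_ids]}
                ],
                "cluster": f"{app}|canary=true",
            },
            {"cluster": f"{app}|canary=false"},
        ],
    }


def percentage_rule(app, percent):
    """Send `percent` of traffic to the canary (assumed rule shape)."""
    if not 0 <= percent <= MAX_CANARY_PERCENT:
        raise ValueError(
            f"canary percentage must be between 0 and {MAX_CANARY_PERCENT}"
        )
    return {
        "name": f"{app}-rule",
        "domains": [app],
        "routes": [
            {
                "weighted_clusters": {
                    "clusters": [
                        {"name": f"{app}|canary=true", "weight": percent},
                        {"name": f"{app}|canary=false", "weight": 100 - percent},
                    ]
                }
            }
        ],
    }
```

Enforcing the cap where the rule is built keeps an operator from sending more traffic to the canary than its 10% capacity can absorb.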

Blue‑Green Release

What Is Blue‑Green Release

Blue‑green release creates a full‑size new cluster (green) in parallel with the existing stable cluster (blue). Traffic is gradually shifted to the green cluster; if problems arise, traffic can be switched back instantly, enabling rapid rollback.

Blue‑green release diagram

Why Blue‑Green Is Needed

Full‑traffic rollback is faster than incremental rollbacks of gray releases.

Blue‑green can handle sudden traffic spikes because both clusters have full capacity.

It exposes issues that only appear under 100 % load (e.g., database deadlocks).

After successful verification, the new cluster becomes the stable one without further changes.

Release Process

Start: User selects “Blue‑Green Release” in Ops.

Initialize: Deploy the green cluster with label BlueGreenVersion=green while routing all traffic to the blue cluster.

Validate: Push routing rules (shop‑list or percentage) to shift part or all traffic to the green cluster.

Cancel: If validation fails, route all traffic back to blue and take down green.

Complete: When all traffic runs on green, decommission the blue cluster.

End.

For example, the following shop‑list rule routes requests whose shopid header is 123 or 456 to the green cluster and keeps all other traffic on the blue cluster:

{
  "name": "java-demo-rule",
  "domains": ["java-demo"],
  "routes": [{
    "headers": [{"name": "shopid", "list_match": ["123", "456"]}],
    "cluster": "java-demo|BlueGreenVersion=green"
  }, {
    "cluster": "java-demo|BlueGreenVersion=blue"
  }]
}
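
The blue‑green release process above can be read as a small state machine. The sketch below models it under stated assumptions: the Phase and BlueGreenRelease names are hypothetical, the transition table is inferred from the step list, repeated rule pushes during validation are modeled as calls to shift, and the percentage rule reuses the weighted_clusters form from the earlier example.

```python
from enum import Enum


class Phase(Enum):
    START = "start"
    INITIALIZE = "initialize"
    VALIDATE = "validate"
    CANCEL = "cancel"
    COMPLETE = "complete"
    END = "end"


# Allowed moves, inferred from the release steps.
TRANSITIONS = {
    Phase.START: {Phase.INITIALIZE},
    Phase.INITIALIZE: {Phase.VALIDATE},
    Phase.VALIDATE: {Phase.CANCEL, Phase.COMPLETE},
    Phase.CANCEL: {Phase.END},
    Phase.COMPLETE: {Phase.END},
    Phase.END: set(),
}


class BlueGreenRelease:
    def __init__(self, app):
        self.app = app
        self.phase = Phase.START
        self.green_percent = 0  # share of traffic routed to green

    def _move(self, target):
        if target not in TRANSITIONS[self.phase]:
            raise RuntimeError(
                f"cannot move from {self.phase.name} to {target.name}"
            )
        self.phase = target

    def initialize(self):
        # Green is deployed, but all traffic still goes to blue.
        self._move(Phase.INITIALIZE)

    def shift(self, percent):
        # Each call stands for pushing a new percentage rule during validation.
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        if self.phase is not Phase.VALIDATE:
            self._move(Phase.VALIDATE)
        self.green_percent = percent

    def cancel(self):
        # Validation failed: all traffic returns to blue and green is taken down.
        self._move(Phase.CANCEL)
        self.green_percent = 0
        self._move(Phase.END)

    def complete(self):
        # Only allowed once green carries all traffic; blue is then decommissioned.
        if self.green_percent != 100:
            raise RuntimeError("green must carry 100% of traffic before completing")
        self._move(Phase.COMPLETE)
        self._move(Phase.END)

    def rule(self):
        """Build the percentage rule for the current split (assumed shape)."""
        def cluster(color):
            return f"{self.app}|BlueGreenVersion={color}"

        return {
            "name": f"{self.app}-rule",
            "domains": [self.app],
            "routes": [{
                "weighted_clusters": {
                    "clusters": [
                        {"name": cluster("green"), "weight": self.green_percent},
                        {"name": cluster("blue"), "weight": 100 - self.green_percent},
                    ]
                }
            }],
        }
```

A typical run would be initialize(), then shift(20) and shift(100), then complete(). Calling cancel() at any point during validation resets the split to 0% green, which corresponds to the rollback described in the Cancel step.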

Observability & Operability

Beyond routing, Youzan built monitoring and alerting for release processes: real‑time QPS, latency, and error‑rate dashboards for both old and new clusters; event notifications via enterprise IM for key milestones; a global release status view; and periodic statistical reports (weekly, monthly, quarterly).
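
As an illustration of the per‑cluster comparison such dashboards rely on, the sketch below aggregates request samples into QPS, average latency, and error rate for each cluster. The Sample record, the summarize function, and the cluster labels are hypothetical; the article does not describe how Youzan collects or stores these metrics.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    cluster: str        # hypothetical label, e.g. "stable" or "canary"
    latency_ms: float   # request latency in milliseconds
    ok: bool            # False if the request failed


def summarize(samples, window_seconds):
    """Aggregate samples from one time window into per-cluster metrics."""
    totals = {}
    for s in samples:
        entry = totals.setdefault(
            s.cluster, {"count": 0, "errors": 0, "latency_sum": 0.0}
        )
        entry["count"] += 1
        entry["errors"] += 0 if s.ok else 1
        entry["latency_sum"] += s.latency_ms
    return {
        cluster: {
            "qps": e["count"] / window_seconds,
            "avg_latency_ms": e["latency_sum"] / e["count"],
            "error_rate": e["errors"] / e["count"],
        }
        for cluster, e in totals.items()
    }
```

Putting the old and new clusters' numbers side by side for the same window is what lets an operator decide whether to continue shifting traffic or cancel the release.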

Metrics dashboard

Future Plans

Upcoming work includes coordinated multi‑application releases where a single rule controls traffic across several services, and extending traffic control to message‑queue consumption paths, which currently lack fine‑grained routing.

Future roadmap diagram