How Baidu Zhidao Achieved Seamless Cloud Migration and Architecture Evolution

This article details Baidu Zhidao's migration from a legacy, high‑traffic monolithic system to a cloud‑native architecture, covering background challenges, solution selection, traffic switching, scaling practices, gateway overhaul, and the resulting stability, cost, and performance benefits.

Architecture & Thinking

Background and Challenges

Baidu Zhidao, a ten‑year‑old knowledge Q&A product line, carries legacy technical debt, a fragmented architecture, and inconsistent code style. It also iterates quickly, serves massive traffic (over 100 M PV per day), and must meet strict stability requirements (four nines, i.e. 99.99% availability). Together, these factors make a full‑scale cloud migration and architectural evolution challenging.

Business Overview

Zhidao generates knowledge‑type content through user questions and answers, aggregates massive Q&A resources, and drives commercial revenue via advertising. The product line has built up an extensive content ecosystem and strong brand recognition.

Key challenges:

Legacy architecture and high refactoring cost.

Fast‑changing business requiring seamless migration.

Huge traffic and revenue demanding >99.99% stability.

Need for rational architectural evolution during migration.
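To make the stability and traffic figures above concrete, here is a small back‑of‑the‑envelope sketch. It assumes availability is measured over wall‑clock time and that page views are spread evenly over the day, which real traffic is not, so the per‑second number is only an average and says nothing about peaks.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_minutes(availability: float, days: int) -> float:
    """Minutes of unavailability allowed over `days` at the given availability."""
    return (1.0 - availability) * days * MINUTES_PER_DAY


def average_per_second(daily_total: int) -> float:
    """Average rate per second, assuming an even spread over the day."""
    return daily_total / SECONDS_PER_DAY


yearly = downtime_budget_minutes(0.9999, 365)   # four nines over one year
monthly = downtime_budget_minutes(0.9999, 30)   # four nines over 30 days
avg_pv = average_per_second(100_000_000)        # 100 M PV per day

print(f"{yearly:.2f} min/year, {monthly:.2f} min/30 days, {avg_pv:.0f} PV/s average")
```

Four nines leaves roughly 52.6 minutes of downtime per year, or a little over 4 minutes per 30 days, which is why the migration could not rely on a maintenance window.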

Architecture Overview

The overall business architecture and pre‑migration traffic architecture are illustrated in the following diagrams.

Cloud Design and Practice

Cloud Solution Selection

ORP, the legacy PaaS platform, has been discontinued. The migration therefore targets a cloud‑native PaaS built on Pandora and the Zhiyun platform, which provides container elasticity, pay‑as‑you‑go resources, and integrated services.

Why Pandora

Supports major Baidu C‑end services and large‑scale module deployment (up to 2,000 modules) without extensive code merging.

Provides necessary capabilities for the large monolithic ODP architecture.

Ease of use complemented by Zhiyun services.

Why Zhiyun Platform

Supports multi‑APP co‑construction, reducing migration cost.

Offers logging, scheduling, access layer, static resources, and customizable services.

Provides ODP runtime environment and container management.

Traffic Switching and Scaling

Before the migration, the traffic clusters are restructured. Small‑traffic experiments are driven by Lua scripts that read a strategy table, where each entry lists three percentage weights:

<code>['strategy_1_1_98'] = {1, 1, 98},
['strategy_5_5_90'] = {5, 5, 90},
['strategy_10_10_80'] = {10, 10, 80},
...,
['strategy_100_0_0'] = {100, 0, 0}
</code>

The script returns one of “opera”, “abtest”, or “orp” to control traffic proportion.
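The weighted selection can be sketched as follows. This is an illustration in Python, not the Lua script used in production. It assumes that the three weights map, in order, to the three returned targets, and it picks a target by hashing a request key into one of 100 buckets so that the same key always lands on the same target. Both the ordering and the hashing approach are assumptions. `pick_target`, `bucket_of` and the key format are hypothetical names.

```python
import zlib

# Strategy entries copied from the table above; each tuple sums to 100.
STRATEGIES = {
    "strategy_1_1_98": (1, 1, 98),
    "strategy_5_5_90": (5, 5, 90),
    "strategy_10_10_80": (10, 10, 80),
    "strategy_100_0_0": (100, 0, 0),
}

# ASSUMPTION: the weights correspond to these targets in this order.
TARGETS = ("opera", "abtest", "orp")


def bucket_of(request_key: str) -> int:
    """Map a request key to a stable bucket in the range [0, 100)."""
    return zlib.crc32(request_key.encode("utf-8")) % 100


def pick_target(strategy_name: str, request_key: str) -> str:
    """Return the target whose cumulative weight range contains the bucket."""
    weights = STRATEGIES[strategy_name]
    if sum(weights) != 100:
        raise ValueError(f"weights of {strategy_name} must sum to 100")
    bucket = bucket_of(request_key)
    upper = 0
    for target, weight in zip(TARGETS, weights):
        upper += weight
        if bucket < upper:
            return target
    raise AssertionError("unreachable when weights sum to 100")


# With weights (100, 0, 0) every bucket falls in the first range.
print(pick_target("strategy_100_0_0", "user-42"))
```

Hashing rather than drawing a random number keeps a given user on one side of the experiment across requests, which makes comparisons between the old and new clusters easier to interpret.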

Proxy target is set via:

<code>set $upstream_target "${terminal_target}_${target_cluster}";
</code>

The business layer reads the target flag to select the appropriate ad IDs:

<code>if ($_SERVER['HTTP_X_BD_TARGET'] == 'pandora') {
    $adsEids = array('asp' => array(50001));
} else if ($_SERVER['HTTP_X_BD_TARGET'] == 'abtest') {
    $adsEids = array('asp' => array(50002));
}
</code>
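The same selection can be written as a lookup table, which makes the behavior for an unrecognized or missing flag explicit. The sketch below is in Python and is only illustrative. PHP exposes the `X-Bd-Target` request header as `$_SERVER['HTTP_X_BD_TARGET']`, so the header name follows from the snippet above. Returning an empty mapping when the flag is unknown is an assumption, since the original leaves that case undefined. `ads_eids_for` is a hypothetical helper name.

```python
# Experiment IDs per target flag, taken from the PHP snippet above.
EXPERIMENT_IDS_BY_TARGET = {
    "pandora": {"asp": [50001]},
    "abtest": {"asp": [50002]},
}


def ads_eids_for(headers: dict) -> dict:
    """Return the ad experiment IDs for the request's target flag.

    ASSUMPTION: an unknown or missing flag yields an empty mapping.
    """
    target = headers.get("X-Bd-Target", "")
    ids = EXPERIMENT_IDS_BY_TARGET.get(target, {})
    # Copy the lists so callers cannot mutate the shared table.
    return {slot: list(values) for slot, values in ids.items()}


print(ads_eids_for({"X-Bd-Target": "pandora"}))
```

Tagging each cluster with its own experiment ID lets ad revenue be compared between the migrated traffic and the control group during the small‑traffic phase.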

Gateway Migration

The Janus gateway replaces the legacy ORP gateway. It offers finer granularity, a much smaller rule set (down from 2,768 lines to 18), and improved safety through staged releases and checks.
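The idea of a staged release gated by checks can be sketched roughly as below. This is not the Janus release mechanism, whose details the article does not describe. The stage percentages, the error‑rate check and the "hold at the last healthy stage" policy are all assumptions made for illustration, and `staged_release` is a hypothetical function.

```python
from typing import Callable, Sequence


def staged_release(
    stages: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float,
) -> int:
    """Advance through traffic percentages while the check passes.

    Returns the percentage left in effect: the last stage whose observed
    error rate stayed within `max_error_rate`, or 0 if the first stage failed.
    """
    current = 0
    for percent in stages:
        if error_rate_at(percent) > max_error_rate:
            return current  # ASSUMPTION: hold at the last healthy stage
        current = percent
    return current


# Hypothetical check: errors spike once half of the traffic is switched.
def observed_error_rate(percent: int) -> float:
    return 0.05 if percent >= 50 else 0.001


print(staged_release([1, 10, 50, 100], observed_error_rate, max_error_rate=0.01))
```

In this example the rollout stops at 10% because the 50% stage fails its check, so a bad rule change only ever reaches a bounded share of traffic.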

Architecture Evolution

Core traffic is redistributed across three regions and four data centers, achieving N+1 redundancy. Non‑core traffic is migrated to dual North‑China data centers with redundancy.
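A short calculation shows what N+1 redundancy costs in provisioned capacity. The sketch assumes that traffic is spread evenly across data centers and that the unit of failure is a single data center. Neither is stated in the article, and the peak load value is a made‑up number.

```python
def per_site_capacity(peak_load: float, sites: int, spares: int = 1) -> float:
    """Capacity each site needs so peak load still fits after `spares` sites fail."""
    surviving = sites - spares
    if surviving <= 0:
        raise ValueError("need at least one surviving site")
    return peak_load / surviving


def provisioning_ratio(sites: int, spares: int = 1) -> float:
    """Total provisioned capacity divided by peak load."""
    return sites / (sites - spares)


PEAK = 900.0  # hypothetical peak load in arbitrary units

print(per_site_capacity(PEAK, sites=4))   # four data centers, any one may fail
print(provisioning_ratio(sites=4))        # core traffic layout
print(provisioning_ratio(sites=2))        # dual data center layout
```

Under these assumptions, four data centers with N+1 need about 33% extra capacity in total, while a dual data center setup needs 100% extra, since either site must be able to carry the full load alone. Calling the functions with `spares=2` models the loss of a whole region that hosts two data centers.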

Key steps include full‑chain resource construction, extensive stress testing, and coordinated traffic switching with third‑party services.

Summary and Benefits

All traffic was migrated to the cloud by 31 Mar 2023.

The SLA has held at four nines (99.99%) over three quarters, with zero new incidents after the migration.

Core pages now have three‑region, four‑data‑center deployment, achieving N+1 cross‑region disaster recovery.

End‑to‑end latency for core interfaces reduced by 12 %.

Public IP costs have decreased month over month since 2023, and the OXP machines have been retired, saving R&D costs.

Written by

Architecture & Thinking
