Baidu Zhidao Cloud Migration Practice: From Legacy OXP to Cloud-Native Architecture
Baidu Zhidao migrated its 18‑year‑old Q&A platform from a legacy OXP architecture to a cloud‑native solution using Pandora containers and the Zhiyun platform, overcoming complex code, high traffic, and zero‑downtime requirements, and achieved full traffic migration, 99.99% SLA, reduced latency, and enhanced elasticity and multi‑region disaster recovery.
This article details Baidu Zhidao's comprehensive cloud migration journey, addressing the challenges of migrating an 18-year-old Q&A platform to cloud-native infrastructure. The migration was driven by the deprecation of Baidu's internal PaaS platform (ORP) and the need for improved resource elasticity, cost optimization, and enhanced stability.
Key Challenges:
1. Legacy system complexity: Multiple business scenarios, outdated architecture, inconsistent code styles, and high refactoring costs
2. Rapid business iteration: 780+ business requirements annually while maintaining zero-downtime migration
3. High traffic and stability requirements: Over 100 million daily PV with 99.99% SLA targets
4. Architecture evolution: Achieving multi-region disaster recovery capabilities
Migration Solution:
The team selected Pandora (Baidu's container platform) as the underlying infrastructure and Zhiyun platform for deployment and operations. This choice was based on Pandora's ability to support large-scale module deployments (up to 2K modules) without requiring significant code restructuring, making it suitable for the existing ODP monolithic architecture.
Traffic Switching Implementation:
The migration utilized Lua scripts in the access layer for traffic splitting, implementing various strategies from 1% to 100% traffic migration. The solution included business layer modifications to capture traffic markers and enable A/B testing between cloud and legacy environments.
Architecture Evolution Results:
By March 2023, 100% of Zhidao's traffic was successfully migrated to cloud. The platform achieved three consecutive quarters of 99.99% SLA with zero migration-related issues. Core pages now operate across three regions with four data centers (North China:Central China:South China = 4:3:3), providing N+1 cross-region disaster recovery capabilities. End-to-end latency for mini-program core interfaces decreased by 12%.
Baidu Geek Talk
Follow us to discover more Baidu tech insights.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.