Zhihu’s Migration from Python to Go: Architecture, Process, and Lessons Learned
Zhihu’s backend team migrated high‑traffic services from Python to Go, detailing the motivations, step‑by‑step migration process, performance gains, project structure, testing practices, static analysis, and operational lessons such as graceful traffic shifting and service deprecation.
Background
Zhihu’s community backend was primarily written in Python. Rapid user growth and increasing business complexity caused CPU and resource pressure, exposing Python’s lower runtime efficiency and higher maintenance cost.
Why Golang
Golang was chosen for its native concurrency, mature internal ecosystem, static typing, single‑binary deployment, and comparable learning curve, offering advantages over Java and Python for high‑concurrency services.
Transformation Results
Core services (member RPC, comments, Q&A) have been fully rewritten in Go, saving over 80% of server resources, reducing development and maintenance costs, and improving internal Go component libraries.
Implementation Process
Because Zhihu’s micro‑service architecture isolates resources, the migration proceeded in five steps: (1) rewrite logic in a new Go service while keeping the external protocol unchanged and sharing the original service’s resources; (2) verify correctness via side‑by‑side request comparison for read APIs and unit/QA testing for write APIs; (3) gray‑scale traffic by proxying requests to the new service; (4) switch traffic entry once 100% traffic is handled by Go; (5) decommission the old Python service.
Go Project Practices
Key practices include a clear project layout (bin, cmd, gen‑go, pkg, thrift_files, vendor), interface‑based layering with impl and mock packages to enable testing, early static analysis using gometalinter (vet, golint, errcheck), feature‑level degradation with circuit , panic‑recover middleware for error handling, careful goroutine management, and ensuring resp.Body.Close() to avoid leaks.
Conclusion
The migration, completed in Q2/Q3 2018 by the community architecture and content technology teams, demonstrates how a large‑scale Python service can be refactored to Go, achieving significant resource savings, higher reliability, and a reusable framework for future projects.
High Availability Architecture
Official account for High Availability Architecture.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.