How ByteDance Scaled Past 7 Million QPS with Go: Lessons in Backend Microservices

This article details ByteDance's migration to Go for its backend, covering the reasons for choosing Go, the design of a five‑tuple microservice architecture, concurrency models, timeout handling, performance tuning, monitoring, and engineering practices that enabled a production system handling over 300 billion daily requests.

21CTO

Go Microservice Journey

More than 80% of Toutiao's backend traffic now runs on services built with Go. Over 100 microservices handle a peak QPS above 7 million and process more than 300 billion requests per day, making this one of the largest Go deployments in the industry.

Why Choose Go?

Simple syntax and quick onboarding.

High performance, fast compilation, and good development efficiency.

Native concurrency with an excellent coroutine model suitable for network calls.

Easy deployment, small compiled binaries, and almost no dependencies.

The team had experimented with Go as early as version 1.1 and had built large‑scale services with it by the time Go 1.4 was released, which gave it confidence in the language's stability. Combined with a shift from a monolithic architecture to a service‑oriented design, this made Go the language of choice for Toutiao's backend microservices.

Microservice Architecture

The team abstracted service calls into a five‑tuple: (From, FromCluster, To, ToCluster, Method). Each unique tuple identifies one class of RPC calls, and this abstraction is the basis of the microservice framework kite. kite is fully compatible with Thrift and provides service registration, discovery, load balancing, timeouts, circuit breaking, degradation, method‑level metrics, and distributed tracing.
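A minimal sketch of how such a five‑tuple could key per‑call settings. The type names CallKey, Policy, and PolicyTable, and the Policy fields, are assumptions made for illustration; the article does not show kite's actual types.

```go
package main

import (
	"fmt"
	"time"
)

// CallKey is a hypothetical representation of the five-tuple.
// All fields are comparable, so the struct can be used as a map key.
type CallKey struct {
	From        string // calling service
	FromCluster string // cluster the caller runs in
	To          string // called service
	ToCluster   string // cluster the callee runs in
	Method      string // RPC method name
}

// Policy holds settings for one class of RPC calls (illustrative fields).
type Policy struct {
	Timeout       time.Duration
	MaxConcurrent int
}

// PolicyTable maps each class of calls to its policy.
type PolicyTable map[CallKey]Policy

// Lookup returns the policy configured for key, or def when none exists.
func (t PolicyTable) Lookup(key CallKey, def Policy) Policy {
	if p, ok := t[key]; ok {
		return p
	}
	return def
}

func main() {
	key := CallKey{
		From: "feed", FromCluster: "default",
		To: "user", ToCluster: "default",
		Method: "GetProfile",
	}
	table := PolicyTable{key: {Timeout: 50 * time.Millisecond, MaxConcurrent: 100}}
	def := Policy{Timeout: time.Second, MaxConcurrent: 10}

	fmt.Println(table.Lookup(key, def)) // configured policy

	key.Method = "ListFriends"
	fmt.Println(table.Lookup(key, def)) // falls back to the default
}
```

Because the tuple is a plain comparable value, the same key can index timeouts, circuit‑breaker state, or metrics without any string concatenation.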

Concurrency

Go’s native support for concurrency uses lightweight user‑space goroutines, allowing tens of thousands of concurrent tasks. Each request is processed by an independent goroutine, making the model intuitive and easier to maintain for large projects.

Concurrency Model

Go implements the CSP model: goroutines share data by communicating over channels instead of communicating through shared memory. The classic prime‑sieve example, in which a chain of goroutines each filters out the multiples of one prime, illustrates CSP in practice.
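For reference, here is one version of the prime sieve. It follows the well‑known Go example but adds a done channel so the goroutines exit once enough primes are collected; the function name firstPrimes is an illustrative choice.

```go
package main

import "fmt"

// generate sends 2, 3, 4, ... on ch until done is closed.
func generate(ch chan<- int, done <-chan struct{}) {
	for i := 2; ; i++ {
		select {
		case ch <- i:
		case <-done:
			return
		}
	}
}

// filter copies values from in to out, dropping multiples of prime.
func filter(in <-chan int, out chan<- int, prime int, done <-chan struct{}) {
	for {
		select {
		case v := <-in:
			if v%prime == 0 {
				continue
			}
			select {
			case out <- v:
			case <-done:
				return
			}
		case <-done:
			return
		}
	}
}

// firstPrimes returns the first n primes using a chain of filter goroutines.
func firstPrimes(n int) []int {
	done := make(chan struct{})
	defer close(done) // stops every goroutine in the chain on return

	ch := make(chan int)
	go generate(ch, done)

	primes := make([]int, 0, n)
	for i := 0; i < n; i++ {
		p := <-ch // the first value to pass all filters is prime
		primes = append(primes, p)
		next := make(chan int)
		go filter(ch, next, p, done)
		ch = next
	}
	return primes
}

func main() {
	fmt.Println(firstPrimes(10)) // [2 3 5 7 11 13 17 19 23 29]
}
```

No goroutine touches shared state: each stage owns its input and output channel, which is the point of the CSP style.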

Concurrency Control

Two patterns appear frequently: Wait – the main goroutine waits for all spawned RPC calls to finish; and Cancel – if the overall request timeout expires, remaining RPC calls are cancelled. Go provides sync.WaitGroup for the Wait pattern and context.Context for cancellation.
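The two patterns can be combined in a few lines. In this sketch the call function simulates a downstream RPC with a sleep; it and fanOut are hypothetical helpers, not kite APIs.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// call simulates one downstream RPC that needs d to complete.
// It returns early with the context's error if the context ends first.
func call(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// fanOut starts one goroutine per simulated RPC, waits for all of them
// (Wait), and lets a shared deadline abort the slow ones (Cancel).
func fanOut(parent context.Context, budget time.Duration, latencies []time.Duration) []error {
	ctx, cancel := context.WithTimeout(parent, budget)
	defer cancel()

	errs := make([]error, len(latencies))
	var wg sync.WaitGroup
	for i, d := range latencies {
		wg.Add(1)
		go func(i int, d time.Duration) {
			defer wg.Done()
			errs[i] = call(ctx, d) // each goroutine writes only its own slot
		}(i, d)
	}
	wg.Wait()
	return errs
}

// runDemo issues a fast and a slow call under a 200 ms overall budget.
func runDemo() []error {
	return fanOut(context.Background(), 200*time.Millisecond,
		[]time.Duration{10 * time.Millisecond, 5 * time.Second})
}

func main() {
	for i, err := range runDemo() {
		fmt.Printf("call %d: %v\n", i, err)
	}
}
```

The fast call finishes normally, while the slow one returns context.DeadlineExceeded as soon as the budget runs out, so wg.Wait never blocks longer than the overall timeout.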

Timeout Control

Proper timeout settings prevent cascading failures. An RPC call consists of three timeout phases: connection, write, and read. The kite framework adds a “Concurrent Ctrl” module that limits the number of concurrent requests and enforces per‑call deadlines using timers and context.
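The article does not show how the “Concurrent Ctrl” module is built. The following is one possible sketch, assuming a buffered channel as the concurrency limit and context.WithTimeout as the per‑call deadline; Limiter, NewLimiter, Do, and ErrOverloaded are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrOverloaded is returned when the concurrency limit is already reached.
var ErrOverloaded = errors.New("too many concurrent requests")

// Limiter caps in-flight calls and gives each call its own deadline.
type Limiter struct {
	slots   chan struct{} // buffered channel used as a counting semaphore
	timeout time.Duration // per-call deadline
}

func NewLimiter(maxConcurrent int, timeout time.Duration) *Limiter {
	return &Limiter{slots: make(chan struct{}, maxConcurrent), timeout: timeout}
}

// Do runs fn if a slot is free and rejects the call immediately otherwise.
// It returns when fn finishes or the deadline expires, whichever is first.
// Caveat: if fn ignores ctx it keeps running after Do has returned.
func (l *Limiter) Do(ctx context.Context, fn func(context.Context) error) error {
	select {
	case l.slots <- struct{}{}:
		defer func() { <-l.slots }() // release the slot on return
	default:
		return ErrOverloaded
	}

	ctx, cancel := context.WithTimeout(ctx, l.timeout)
	defer cancel()

	done := make(chan error, 1) // buffered so the goroutine never blocks
	go func() { done <- fn(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// waitForCancel blocks until its context ends, like a stuck RPC would.
func waitForCancel(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// demo exercises the three outcomes: success, timeout, and rejection.
func demo() (fast, slow, rejected error) {
	l := NewLimiter(1, 100*time.Millisecond)
	ctx := context.Background()

	fast = l.Do(ctx, func(context.Context) error { return nil })
	slow = l.Do(ctx, waitForCancel)

	started := make(chan struct{})
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		l.Do(ctx, func(c context.Context) error {
			close(started) // the only slot is now occupied
			return waitForCancel(c)
		})
	}()
	<-started
	rejected = l.Do(ctx, func(context.Context) error { return nil })
	<-finished
	return fast, slow, rejected
}

func main() {
	fast, slow, rejected := demo()
	fmt.Println(fast, slow, rejected)
}
```

Rejecting immediately when the limit is reached, rather than queueing, keeps an overloaded service from accumulating requests that will time out anyway.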

Performance

Go offers built‑in profiling tools (CPU, memory, goroutine stack, GC logs, trace). Best‑practice tips include minimizing lock scope, using CAS, optimizing hot paths, tuning GC, reusing objects via sync.Pool, avoiding reflection, and upgrading to newer Go versions.
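Two of these tips, object reuse via sync.Pool and CAS in place of a lock, can be sketched as follows. The encode function and the maxSeen type are made‑up examples and do not come from the article.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"sync/atomic"
)

// bufPool recycles buffers so hot paths do not allocate one per request.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encode formats a key/value pair using a pooled buffer.
func encode(key string, value int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold old data
	defer bufPool.Put(buf)

	fmt.Fprintf(buf, "%s=%d", key, value)
	return buf.String() // String copies the bytes, so reuse is safe
}

// maxSeen tracks a running maximum without a mutex.
type maxSeen struct{ v int64 }

// Observe raises the stored maximum to x using a compare-and-swap loop.
func (m *maxSeen) Observe(x int64) {
	for {
		cur := atomic.LoadInt64(&m.v)
		if x <= cur || atomic.CompareAndSwapInt64(&m.v, cur, x) {
			return
		}
		// Another goroutine changed the value; reload and retry.
	}
}

// Load returns the current maximum.
func (m *maxSeen) Load() int64 { return atomic.LoadInt64(&m.v) }

func main() {
	fmt.Println(encode("qps", 42))

	var m maxSeen
	var wg sync.WaitGroup
	for i := int64(1); i <= 100; i++ {
		wg.Add(1)
		go func(x int64) {
			defer wg.Done()
			m.Observe(x)
		}(i)
	}
	wg.Wait()
	fmt.Println(m.Load())
}
```

Whether either change pays off depends on the workload, so it is worth confirming with the profiler before and after.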

A real‑world case study shows a storage service reduced its P99 latency from 100 ms to 15 ms by reducing memory allocations, switching from Thrift to Msgpack, streaming data with io.Reader, and parallelizing key reads.
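The article gives no code for the case study. As an illustration of the last step only, this sketch reads several keys in parallel with a small worker pool and returns the values in input order; GetFunc and readKeys are hypothetical, and an in‑memory map stands in for the storage backend.

```go
package main

import (
	"fmt"
	"sync"
)

// GetFunc fetches the value for one key from the store.
type GetFunc func(key string) (string, error)

// readKeys fetches all keys using up to `workers` goroutines and returns
// the values in the same order as keys. It reports the first error found.
func readKeys(keys []string, workers int, get GetFunc) ([]string, error) {
	if workers < 1 {
		workers = 1
	}
	values := make([]string, len(keys))
	errs := make([]error, len(keys))

	jobs := make(chan int) // indexes into keys
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				values[i], errs[i] = get(keys[i]) // one slot per index
			}
		}()
	}
	for i := range keys {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return values, nil
}

func main() {
	store := map[string]string{"a": "1", "b": "2", "c": "3"}
	get := func(k string) (string, error) {
		v, ok := store[k]
		if !ok {
			return "", fmt.Errorf("key %q not found", k)
		}
		return v, nil
	}
	fmt.Println(readKeys([]string{"c", "a", "b"}, 2, get))
}
```

With sequential reads the total latency is the sum of the per‑key latencies; with parallel reads it approaches the latency of the slowest key, which is where tail‑latency gains of this kind typically come from.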

Service Monitoring

The runtime package exposes metrics such as goroutine count, GC pause time, and heap usage. kite collects these metrics per service and sets alert thresholds for critical values, also providing snapshots of stack traces for post‑mortem analysis.
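The metrics named above are available from the standard runtime package. This sketch reads them once; the Snapshot struct and the Collect and StackDump helpers are illustrative names, and how kite reports or thresholds them is not described in the article.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// Snapshot holds a few runtime metrics worth exporting per service.
type Snapshot struct {
	Goroutines  int           // current number of goroutines
	HeapAlloc   uint64        // bytes of allocated heap objects
	NumGC       uint32        // completed GC cycles
	LastGCPause time.Duration // stop-the-world pause of the latest cycle
}

// Collect reads the current values from the Go runtime.
func Collect() Snapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	s := Snapshot{
		Goroutines: runtime.NumGoroutine(),
		HeapAlloc:  m.HeapAlloc,
		NumGC:      m.NumGC,
	}
	if m.NumGC > 0 {
		// PauseNs is a circular buffer; this index is the latest pause.
		s.LastGCPause = time.Duration(m.PauseNs[(m.NumGC+255)%256])
	}
	return s
}

// StackDump returns the stack traces of all goroutines, truncated to 1 MiB.
func StackDump() string {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true)
	return string(buf[:n])
}

func main() {
	runtime.GC() // force one cycle so the pause field is populated
	fmt.Printf("%+v\n", Collect())
	fmt.Println(len(StackDump()), "bytes of stack traces")
}
```

Note that runtime.ReadMemStats briefly stops the world, so it is usually called on a timer of seconds rather than per request.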

Programming Mindset and Engineering

Go forces a different mindset: each service runs in a single process, an unrecovered panic in any goroutine terminates that whole process, and there is no thread‑local storage, so request‑scoped data must be passed explicitly through a context. The language's simplicity and built‑in AST tools improve code manageability for large projects.
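Both constraints show up in everyday code. The sketch below passes a request ID through context.Context and converts a panic into an error at the request boundary; WithRequestID, RequestID, and Safe are hypothetical helpers, and recovering at the boundary is a common convention rather than something the article attributes to kite.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is an unexported key type, which avoids collisions with
// context values set by other packages.
type ctxKey struct{}

// WithRequestID returns a context that carries the request ID.
func WithRequestID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, ctxKey{}, id)
}

// RequestID extracts the request ID, or "" if none was set.
func RequestID(ctx context.Context) string {
	id, _ := ctx.Value(ctxKey{}).(string)
	return id
}

// Safe runs fn and turns a panic into an error, so one bad request
// does not take down the whole process.
func Safe(ctx context.Context, fn func(context.Context) error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("request %s: recovered panic: %v", RequestID(ctx), r)
		}
	}()
	return fn(ctx)
}

// demo runs a handler that reads the request ID and then panics.
func demo() (id string, err error) {
	ctx := WithRequestID(context.Background(), "req-42")
	err = Safe(ctx, func(ctx context.Context) error {
		id = RequestID(ctx) // explicit context instead of thread-local state
		panic("boom")
	})
	return id, err
}

func main() {
	id, err := demo()
	fmt.Println(id)
	fmt.Println(err)
}
```

recover only catches panics raised on the same goroutine, so every goroutine a handler spawns needs its own deferred recovery.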

Conclusion

Toutiao’s large‑scale microservice platform shows that Go's concurrency model, combined with disciplined timeout control and performance tuning, holds up at this scale. The services run efficiently in containers and are moving toward a Cloud‑Native architecture.

