How We Cut Server Costs by 82%: Refactoring a High‑Concurrency QQ Game Service from C++ to Go with Kafka
This article details the redesign of a core QQ game achievement service that suffered from low resource utilization and heavy CAS contention, describing how moving from a synchronous C++ implementation to an asynchronous Go‑Kafka pipeline eliminated lock conflicts, reduced server count by 82%, and dramatically improved latency and stability.
Background
The achievement service is a core piece of infrastructure in the QQ game ecosystem, handling real‑time user achievement display and game‑rights settlement under extremely high write concurrency. The original C++ implementation used a synchronous direct‑write model that behaved like a bank teller: each request blocked while acquiring a version number via CAS, leading to severe contention during large‑scale game events.
Pain Points of the Original Design
Low resource utilization: many threads blocked on I/O, forcing the team to over‑provision servers.
CAS collisions: multiple threads frequently contended for the same version number, causing retries and wasted CPU cycles.
Architecture Evolution: From “Chaos” to a Structured Pipeline
The refactor introduced Kafka as a durable, partitioned queue. User UIN became the partition key, guaranteeing that all actions of a single user land in the same partition. A Go service now consumes each partition sequentially, turning the previously parallel write pattern into a serialized one at the physical layer.
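The core idea is that key‑based partitioning is deterministic: hashing the same UIN always yields the same partition, so one user's events form a totally ordered stream. A minimal sketch of that mapping (the `partitionFor` helper and the FNV hash are illustrative choices, not the exact hash Kafka's producer uses):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a user UIN to a partition index. Because the mapping is
// deterministic, every event for the same UIN lands in the same partition
// and is therefore consumed sequentially by a single consumer.
func partitionFor(uin string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(uin))
	return int(h.Sum32()) % numPartitions
}

func main() {
	// The same UIN always maps to the same partition.
	fmt.Println(partitionFor("10001", 16) == partitionFor("10001", 16)) // true
}
```

Any hash works as long as it is stable across producers; changing the partition count remaps users, so in practice it is fixed up front.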
Code Refactor: Reducing Complexity and Eliminating CAS
In the C++ version, ProcessData tightly coupled business logic with storage I/O, and the time window between Get and Commit allowed other threads to modify data, causing frequent QuickCommit failures.
After migration, the Go implementation follows a clear three‑step flow:
```go
func (a *Achieve) ProcessAchieveData(...) error {
	// 1. Get data (the CAS version is almost always current, because writes are serialized)
	cas, memAchieveInfo, err := a.GetAchieveData(ctx, cmemKey)
	// 2. Compute in memory (same logic as the C++ version, but without lock concerns)
	// ...
	// 3. Persist (CAS failure is now rare)
	if changed {
		a.Proxy.Set(ctx, cmemKey, memAchieveInfo, ..., WithSetCas(cas))
	}
	return nil
}
```

Key improvements:
CAS contention dropped from double‑digit percentages to near zero.
Code size reduced by ~40%.
Handling Large Achievement Payloads
When a user’s achievement data becomes too large, it is split into sub‑keys. The Go version uses native slices and sort.Slice for ordering, replacing the C++ multimap iterator pattern that caused high memory usage and iterator invalidation risks.
```go
sort.Slice(playerDataList, func(i, j int) bool {
	return playerDataList[i].data.LLastModifyTime > playerDataList[j].data.LLastModifyTime
})
```

Data Merging Strategy
Reports are grouped by player (platform+zone+server+role ID) and sorted by Kafka timestamp. For each type, only the latest entry is kept, turning many potential CAS conflicts into a single merge operation.
```go
type itemWithTS struct {
	item achieve.SAchieveReportItem
	ts   int64 // Kafka message timestamp
}

func (a *AchievesLogic) ReportAchieveData(ctx context.Context, aMsgs []*model.AchieveMsg) {
	playerItems := map[common.SGCAchievePlayer][]*itemWithTS{}
	// 1. Group by player
	for _, aMsg := range aMsgs {
		player, items := parseReportData(aMsg)
		playerItems[player] = append(playerItems[player], items...)
	}
	// 2. Sort each group by timestamp, descending
	for _, items := range playerItems {
		sort.Slice(items, func(i, j int) bool { return items[i].ts > items[j].ts })
	}
	// 3. Deduplicate by type, keeping the latest entry
	// ...
}
```

Performance and Cost Results
After the migration:
Server count reduced by 82% (from dozens of 4‑core 8 GB machines to a fraction of that).
CPU core usage dropped by 73%.
Memory consumption fell by 91%.
CAS write errors reduced by 99.9%, effectively eliminating them.
Average request latency decreased by 40% because retry overhead vanished.
The Go service’s goroutine model dramatically cut context‑switch overhead, and Kafka’s peak‑shaving capability allowed the same business load to be supported with only 26% of the original compute power.
Monitoring and Alerting Improvements
With the new pipeline, false alerts caused by CAS conflicts disappeared, allowing the alert system to focus on genuine storage or business errors. Integration with the Galilean monitoring suite added fine‑grained metrics for Kafka consumer lag, goroutine count, and memory allocation, giving deeper insight into service health.
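Galilean is a Tencent‑internal suite, but the goroutine‑count metric it tracks can be exposed with nothing more than the standard library, as an illustration of the idea (this sketch is an assumption, not the article's actual integration):

```go
package main

import (
	"expvar"
	"fmt"
	"runtime"
)

// goroutineGauge returns an expvar variable that reports the live goroutine
// count each time it is read, rather than a value frozen at startup.
func goroutineGauge() expvar.Func {
	return expvar.Func(func() any { return runtime.NumGoroutine() })
}

func main() {
	// Register the gauge; any scraper reading /debug/vars via the standard
	// expvar HTTP handler would pick it up alongside the default stats.
	expvar.Publish("goroutines", goroutineGauge())
	fmt.Println(expvar.Get("goroutines").String()) // current count as JSON, e.g. "1"
}
```

The same pattern works for any point‑in‑time gauge; counters such as consumed messages are better served by `expvar.Int` and its `Add` method.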
Conclusion
The refactor demonstrates that replacing a lock‑heavy synchronous architecture with an asynchronous, partitioned queue can turn “architectural space” into “computational time”, yielding massive cost savings, higher stability, and a cleaner codebase for future scaling.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
