How We Cut Server Costs by 82%: Refactoring a High‑Concurrency QQ Game Service from C++ to Go with Kafka

This article details the redesign of a core QQ game achievement service that suffered from low resource utilization and heavy CAS contention, describing how moving from a synchronous C++ implementation to an asynchronous Go‑Kafka pipeline eliminated lock conflicts, reduced server count by 82%, and dramatically improved latency and stability.

Tencent Cloud Developer

Background

The achievement service is a core piece of infrastructure in the QQ game ecosystem, handling real‑time user achievement display and game‑rights settlement with extremely high write concurrency. The original C++ implementation used a synchronous direct‑write model that behaved like a bank teller: each request blocked while acquiring a version number via CAS, leading to severe contention during large‑scale game events.
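
Conceptually, that teller‑style write path is a read‑modify‑write retry loop. The sketch below renders it in Go for consistency with the rest of this article; the storage interface, constants, and error names are hypothetical stand‑ins for the original C++ storage client, not its real API.

package legacy

import (
	"context"
	"errors"
)

// Hypothetical stand-ins for the original storage client and its errors.
var (
	ErrCASConflict    = errors.New("cas version conflict")
	ErrTooManyRetries = errors.New("too many cas retries")
)

const maxCASRetries = 3

type Store interface {
	Get(ctx context.Context, key string) (val []byte, cas uint64, err error)
	SetWithCAS(ctx context.Context, key string, val []byte, cas uint64) error
}

// syncWrite shows the old read-modify-write pattern: every request blocks on
// I/O, and under heavy concurrency the conditional Set keeps failing because
// another request already bumped the version, forcing a full retry.
func syncWrite(ctx context.Context, store Store, key string, update func([]byte) []byte) error {
	for i := 0; i < maxCASRetries; i++ {
		val, cas, err := store.Get(ctx, key)
		if err != nil {
			return err
		}
		err = store.SetWithCAS(ctx, key, update(val), cas)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrCASConflict) {
			return err
		}
		// Version conflict: the work above is thrown away and retried.
	}
	return ErrTooManyRetries
}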

Pain Points of the Original Design

Low resource utilization: many threads blocked on I/O, forcing the team to over‑provision servers.

CAS collisions: multiple threads frequently contended for the same version number, causing retries and wasted CPU cycles.

Architecture Evolution: From “Chaos” to a Structured Pipeline

The refactor introduced Kafka as a durable, partitioned queue. User UIN became the partition key, guaranteeing that all actions of a single user land in the same partition. A Go service now consumes each partition sequentially, turning the previously parallel write pattern into a serialized one at the physical layer.
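
As a concrete illustration of the partitioning scheme (the article does not name its Kafka client; this sketch uses the open‑source segmentio/kafka-go library, and the topic, broker address, and group names are assumptions), hashing the message key on UIN routes every report for one user to the same partition, and a consumer‑group reader then handles each partition as an ordered stream.

// Illustrative sketch only: client library, topic name, and broker address
// are assumptions, not the production setup.
package main

import (
	"context"
	"strconv"

	"github.com/segmentio/kafka-go"
)

func produceReport(ctx context.Context, w *kafka.Writer, uin uint64, payload []byte) error {
	// The Hash balancer maps identical keys to the same partition,
	// so all of one user's reports stay strictly ordered.
	return w.WriteMessages(ctx, kafka.Message{
		Key:   []byte(strconv.FormatUint(uin, 10)),
		Value: payload,
	})
}

func main() {
	w := &kafka.Writer{
		Addr:     kafka.TCP("kafka:9092"),
		Topic:    "achieve_report",
		Balancer: &kafka.Hash{}, // partition by message key (the UIN)
	}
	defer w.Close()

	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"kafka:9092"},
		GroupID: "achieve_consumer",
		Topic:   "achieve_report",
	})
	defer r.Close()

	// Each consumer in the group owns whole partitions, so messages for a
	// given UIN are processed one at a time, serializing the write path.
	for {
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			break
		}
		_ = msg // hand off to the achievement processing logic
	}
}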

Code Refactor: Reducing Complexity and Eliminating CAS

In the C++ version, ProcessData tightly coupled business logic with storage I/O, and the time window between Get and Commit allowed other threads to modify data, causing frequent QuickCommit failures.

After migration, the Go implementation follows a clear three‑step flow:

func (a *Achieve) ProcessAchieveData(...) error {
    // 1. Get data (the CAS version is almost always the latest because writes are serialized per user)
    cas, memAchieveInfo, err := a.GetAchieveData(ctx, cmemKey)
    if err != nil {
        return err
    }
    // 2. Compute in memory (same business logic as the C++ version, without lock worries)
    // ...
    // 3. Persist (a CAS failure is now rare)
    if changed {
        return a.Proxy.Set(ctx, cmemKey, memAchieveInfo, ..., WithSetCas(cas))
    }
    return nil
}

Key improvements:

CAS contention dropped from double‑digit percentages to near zero.

Code size reduced by ~40%.

Handling Large Achievement Payloads

When a user’s achievement data becomes too large, it is split into sub‑keys. The Go version uses native slices and sort.Slice for ordering, replacing the C++ multimap iterator pattern that caused high memory usage and iterator invalidation risks.

sort.Slice(playerDataList, func(i, j int) bool {
    return playerDataList[i].data.LLastModifyTime > playerDataList[j].data.LLastModifyTime
})
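
Pulled together, the read‑and‑order step looks roughly like the following. This is a sketch only: the subKeyData and playerData types, the loadSubKey helper, and every field except LLastModifyTime are hypothetical, added just to make the snippet self‑contained.

// Illustrative sketch: subKeyData, playerData and loadSubKey are stand-ins
// for the real sub-key payloads and storage access described above.
package achievestore

import "sort"

type subKeyData struct {
	LLastModifyTime int64
	// ... other achievement fields
}

type playerData struct {
	subKey string
	data   subKeyData
}

func collectAndSort(subKeys []string, loadSubKey func(string) subKeyData) []playerData {
	// Gather every sub-key into one flat slice, then order it newest-first
	// with the standard library instead of a C++ multimap.
	playerDataList := make([]playerData, 0, len(subKeys))
	for _, k := range subKeys {
		playerDataList = append(playerDataList, playerData{subKey: k, data: loadSubKey(k)})
	}
	sort.Slice(playerDataList, func(i, j int) bool {
		return playerDataList[i].data.LLastModifyTime > playerDataList[j].data.LLastModifyTime
	})
	return playerDataList
}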

Data Merging Strategy

Reports are grouped by player (platform+zone+server+role ID) and sorted by Kafka timestamp. For each type, only the latest entry is kept, turning many potential CAS conflicts into a single merge operation.

type itemWithTS struct {
    item achieve.SAchieveReportItem
    ts   int64 // Kafka message timestamp
}

func (a *AchievesLogic) ReportAchieveData(ctx context.Context, aMsgs []*model.AchieveMsg) {
    playerItems := map[common.SGCAchievePlayer][]*itemWithTS{}
    // 1. Group by player
    for _, aMsg := range aMsgs {
        player, items := parseReportData(aMsg)
        playerItems[player] = append(playerItems[player], items...)
    }
    // 2. Sort each group by timestamp descending
    for _, items := range playerItems {
        sort.Slice(items, func(i, j int) bool { return items[i].ts > items[j].ts })
    }
    // 3. Deduplicate by type, keep latest
    // ...
}
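
Step 3 can then be completed with a single pass over each sorted group. This is a minimal sketch, assuming SAchieveReportItem exposes an achievement‑type field; the DwType name is a hypothetical placeholder, not the real struct definition.

// Minimal sketch of step 3: items are already sorted by ts descending,
// so the first entry seen for each type is the latest one and wins.
// The DwType field name is an assumption.
func dedupLatestByType(items []*itemWithTS) []*itemWithTS {
	seen := map[uint32]struct{}{}
	kept := make([]*itemWithTS, 0, len(items))
	for _, it := range items {
		if _, ok := seen[it.item.DwType]; ok {
			continue // older report of the same type, superseded
		}
		seen[it.item.DwType] = struct{}{}
		kept = append(kept, it)
	}
	return kept
}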

Performance and Cost Results

After the migration:

Server count reduced by 82% (from dozens of 4‑core 8 GB machines to a fraction of that).

CPU core usage dropped by 73%.

Memory consumption fell by 91%.

CAS write error rate essentially eliminated (a 99.9% reduction).

Average request latency decreased by 40% because retry overhead vanished.

The Go service’s goroutine model dramatically cut context‑switch overhead, and Kafka’s peak‑shaving capability allowed the same business load to be supported with only 26% of the original compute power.

Monitoring and Alerting Improvements

With the new pipeline, false alerts caused by CAS conflicts disappeared, allowing the alert system to focus on genuine storage or business errors. Integration with the Galilean monitoring suite added fine‑grained metrics for Kafka consumer lag, goroutine count, and memory allocation, giving deeper insight into service health.
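
As an illustration of the runtime metrics mentioned above, a Go service can periodically sample goroutine count and heap usage with the standard library. The reportGauge function and the metric names below are hypothetical placeholders, not the Galilean API.

// Illustrative only: reportGauge and the metric names stand in for the
// actual monitoring client; they are not the Galilean API.
package metrics

import (
	"runtime"
	"time"
)

func reportGauge(name string, value float64) {
	// push to the monitoring backend
}

// StartRuntimeMetrics samples goroutine count and heap allocation at a fixed
// interval, the kind of fine-grained runtime data described above.
func StartRuntimeMetrics(interval time.Duration) {
	go func() {
		var ms runtime.MemStats
		for range time.Tick(interval) {
			runtime.ReadMemStats(&ms)
			reportGauge("goroutine_count", float64(runtime.NumGoroutine()))
			reportGauge("heap_alloc_bytes", float64(ms.HeapAlloc))
		}
	}()
}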

Conclusion

The refactor demonstrates that replacing a lock‑heavy synchronous architecture with an asynchronous, partitioned queue can turn “architectural space” into “computational time”, yielding massive cost savings, higher stability, and a cleaner codebase for future scaling.

Kafka · High Concurrency · Refactoring