
How Tencent Saved 8 Million QQ Users by Migrating Legacy Services

This article recounts how Tencent's operations team tackled the urgent migration of aging data‑center infrastructure to preserve service for 8 million legacy QQ users, detailing the challenges, strategic choices, IP‑level network relocation, and the DevOps practices that ensured a successful cut‑over.

Efficient Ops

Overview

After reading The Phoenix Project, a DevOps-focused book, the author reflects on its principle that the value of IT lies in enhancing business and user value, and introduces a project that rescued service for 8 million QQ users.

The IT value chain can be seen as Company → Product → User. Tencent's social network division runs more than 100,000 servers distributed across many IDC (Internet data center) sites nationwide.

Many older IDC facilities can no longer support current business demands, with problems such as insufficient rack space, limited outbound bandwidth, outdated planning, and worn hardware.

Service migration is driven by two factors: proactive migration to improve service quality, and passive migration forced by IDC upgrades or decommissioning.

In practice, migrating long-tail services is often costly and difficult.

The article is organized into three parts: the impending loss of service for 8 million users, the challenges and choices the team faced, and the "big move" that preserved their service.

8 Million Users Facing Service Termination

Last year the team began a passive migration because an aging IDC was scheduled for decommissioning. The migration covered more than 2,000 machines and more than 150 business modules, with the QQ mobile operations team handling the largest share. At the end of the process, the IDC would be powered off and disconnected from the network.

Large‑scale migrations are time‑consuming, costly, and introduce risk to service quality.

The shift from the PC-Internet era to the mobile-Internet era brought new challenges: a wide variety of early app versions, complex network environments, and higher requirements for fault tolerance, disaster recovery, service quality, and user scheduling.

Non-smartphone QQ versions (MTK, Symbian, and KJava) run on dozens of device models. These versions hard-code VIP (virtual IP) addresses, cannot update them dynamically, and cannot be redirected by the scheduling system.

With no way to reschedule these clients, the 8 million users would lose service as soon as the IDC was powered down.

Challenges & Choices

The QQ user service flow is: client obtains a VIP, contacts the backend, and receives a response.

Key challenges

Hard-coded VIPs: The VIPs are embedded in old client versions and tightly coupled with the retiring IDC's network.

Client version stagnation: The three client families have not been updated in over seven years, and the original developers are no longer available.

Version coverage: Users must upgrade manually, so moving the entire 8 million-user base to a new version would be slow.

Imminent deadline: The IDC will be powered off on a fixed date, leaving no room for delay.

Data: QQ's DAU (daily active users) is 830 million; the 8 million legacy users represent about 1% of DAU and less than 1% of peak concurrent users.
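As a quick check on the "about 1%" figure, the legacy users' share of DAU works out as follows (the article gives no peak-concurrency number, so only the DAU share can be verified here):

```latex
\frac{8 \times 10^{6}}{830 \times 10^{6}} \approx 0.0096 \approx 0.96\%
```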

Our choice

The lowest-cost option would have been to abandon the 8 million users, but the team rejected it because it conflicted with their core value of user-centric service.

Note: Because of hardware constraints and 2G network limits, the non-smartphone versions provide only basic messaging.

The Big Move (IP Migration)

The core problem is that VIPs are strongly coupled with the retiring IDC’s network environment.

Solution: Perform a 1:1 IP/network migration, moving the network segments that carry the VIPs from the old IDC to a new IDC unchanged, so the hard-coded VIPs keep working without any change to the clients.
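Before a 1:1 migration like this, one sanity check is to confirm that every VIP baked into the legacy clients falls inside the segments being moved; a VIP left outside them would simply stop answering after power-off. The sketch below uses Python's standard `ipaddress` module. The segments and VIP list are hypothetical placeholders drawn from documentation address ranges, not Tencent's real addresses, and the helper `uncovered_vips` is an illustrative name.

```python
import ipaddress

# Hypothetical: the two network segments scheduled to move 1:1 to the new IDC.
MIGRATING_SEGMENTS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/25"),
]

# Hypothetical: VIPs hard-coded into the legacy MTK/Symbian/KJava clients.
HARDCODED_VIPS = ["198.51.100.10", "198.51.100.200", "203.0.113.77"]


def uncovered_vips(vips, segments):
    """Return the VIPs that do not fall inside any migrating segment."""
    missing = []
    for vip in vips:
        addr = ipaddress.ip_address(vip)
        if not any(addr in net for net in segments):
            missing.append(vip)
    return missing


if __name__ == "__main__":
    missing = uncovered_vips(HARDCODED_VIPS, MIGRATING_SEGMENTS)
    if missing:
        print("VIPs outside the migrating segments:", missing)
    else:
        print("All hard-coded VIPs are covered by the migration plan.")
```

An empty result means the planned segments cover every known VIP; any address returned would need its own segment added to the plan.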

Key difficulties

The old and new IDCs differ significantly in architecture, and two network segments must be moved.

Coordinating with three telecom operators to obtain testing and migration support.

Transferring network security and load‑balancing policies to the new IDC.

Coordinating extensive collaboration across departments.

After securing operator support, the team deployed the full set of VIP backend services in the new IDC, defined a cut‑over plan, prepared emergency procedures, and notified users.
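A cut-over plan with emergency procedures can be modeled as an ordered list of steps, each paired with a health check and a rollback action. The sketch below is a hypothetical illustration of that structure, not the team's actual tooling; the `Step` and `run_cutover` names, the step descriptions, and the placeholder actions are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One hypothetical cut-over step with its own verification and undo action."""
    name: str
    apply: Callable[[], None]     # perform the change
    check: Callable[[], bool]     # return True if service looks healthy afterwards
    rollback: Callable[[], None]  # undo the change


def run_cutover(steps: List[Step]) -> bool:
    """Apply steps in order. If a health check fails, roll back every applied
    step (including the failing one) in reverse order and return False."""
    applied: List[Step] = []
    for step in steps:
        print(f"applying: {step.name}")
        step.apply()
        applied.append(step)
        if not step.check():
            print(f"check failed after '{step.name}', rolling back")
            for done in reversed(applied):
                done.rollback()
            return False
    print("cut-over complete")
    return True


if __name__ == "__main__":
    # Placeholder actions: a real runbook would call network and load-balancer tooling.
    state = {"segments_at": "old-idc"}

    def move_segments() -> None:
        state["segments_at"] = "new-idc"

    def restore_segments() -> None:
        state["segments_at"] = "old-idc"

    plan = [
        Step("deploy VIP backends in new IDC", lambda: None, lambda: True, lambda: None),
        Step("move VIP segments to new IDC", move_segments,
             lambda: state["segments_at"] == "new-idc", restore_segments),
    ]
    run_cutover(plan)
```

Rolling back in reverse order restores the environment layer by layer, which is the usual reason to pair every change with its own undo action before the cut-over window opens.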

During a late-night cut-over, the IP migration succeeded: the VIPs were moved, users re-logged in automatically, and service remained stable, sparing the 8 million users a service interruption.

The project highlighted three DevOps lessons: standardized operational objects, non-functional specifications, and automation.

Effective asset management: define and record operational objects in a configuration system for traceability.

Non‑functional standards: avoid hard‑coded IPs and enforce operational policies from the start.

Standardized procedures: automate repetitive, low‑value tasks and provide consistent change‑management workflows.
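The first two lessons fit together: if operational objects such as VIPs live in a configuration system rather than in client code, a migration becomes a data change. Below is a minimal, hypothetical sketch of such a registry in Python; the `VipRecord` and `Registry` names, record fields, service names, and IDC labels are invented for illustration and do not describe Tencent's actual configuration system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VipRecord:
    vip: str            # virtual IP address currently serving the service
    idc: str            # data center currently hosting the VIP
    modules: List[str]  # business modules behind the VIP


class Registry:
    """A toy configuration registry keyed by service name."""

    def __init__(self) -> None:
        self._records: Dict[str, VipRecord] = {}

    def register(self, service: str, record: VipRecord) -> None:
        self._records[service] = record

    def resolve(self, service: str) -> str:
        """Clients ask for a service by name instead of hard-coding an IP."""
        return self._records[service].vip

    def in_idc(self, idc: str) -> List[str]:
        """List services hosted in an IDC, e.g. to scope a decommissioning."""
        return sorted(s for s, r in self._records.items() if r.idc == idc)

    def move(self, service: str, new_idc: str, new_vip: str) -> None:
        """Migrate a service by updating its record; resolving clients follow it."""
        rec = self._records[service]
        rec.idc, rec.vip = new_idc, new_vip


if __name__ == "__main__":
    reg = Registry()
    reg.register("msg-login", VipRecord("198.51.100.10", "idc-old", ["login", "session"]))
    reg.register("msg-push", VipRecord("198.51.100.20", "idc-old", ["push"]))
    print(reg.in_idc("idc-old"))  # services affected by the decommissioning
    reg.move("msg-login", "idc-new", "203.0.113.10")
    print(reg.resolve("msg-login"))
```

A client that resolved its endpoint by name would have followed such a move without an upgrade, which is exactly what the hard-coded legacy clients in this story could not do.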

Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany readers throughout their operations careers as we grow together.
