
How Tencent’s Elastic Platform Powers Billions of Daily Image Compressions with 6K Containers

Tencent’s elastic computing platform replaces 24,000 physical servers with just 6,000 containers, delivering sustainable compute for billions of daily image compressions while also supporting video transcoding, Spark jobs, and AI workloads through dynamic resource isolation, named services, and intelligent scheduling.

21CTO

Aug 19, 2017


Background

QQ Album, WeChat image sharing, and WeChat Moments generate nearly a hundred billion images a day, creating a massive compression workload. All of these compression tasks run on Tencent’s TCS elastic computing platform; this article examines how the platform handles that scale.

Problems with the Legacy Approach

The previous approach, co‑locating the compression programs on storage machines, saved hardware costs but caused three major problems:

Low resource utilization – provisioning for peak load required tens of thousands of machines, leaving most CPU capacity idle during off‑peak periods.

Higher operations costs – rapid business growth forced frequent hardware provisioning and manual intervention.

Business interference – co‑located compression and storage services competed for CPU and memory, leading to performance degradation during spikes.

Platform Advantages

The TCS elastic platform provides physical resource isolation, automatic scaling, and dynamic weight adjustment, directly addressing the above pain points.

Resource Isolation

When compression and storage programs share hardware, contention for CPU time slices spikes during peak load, especially for programs bound to specific cores. The platform builds on Docker isolation and cgroup CPU settings (CFS quota, shares, and period) and adds a dynamic CPU‑binding strategy: it monitors per‑core load and moves containers onto less‑loaded cores, improving overall performance.
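To make these isolation mechanics concrete, here is a minimal Python sketch, assuming the cgroup v1 interface files `cpu.cfs_quota_us`, `cpu.cfs_period_us`, `cpu.shares`, and `cpuset.cpus`. It converts a core allowance into quota/period values and rebinds a container to the least‑loaded cores when one of its current cores runs hot. The function names, the 0.8 "hot" threshold, and the pick‑the‑least‑loaded rule are illustrative assumptions; the article does not describe TCS's actual binding algorithm.

```python
from dataclasses import dataclass

# cgroup v1 default CFS period (cpu.cfs_period_us), in microseconds.
DEFAULT_PERIOD_US = 100_000


@dataclass
class CpuLimits:
    period_us: int  # value for cpu.cfs_period_us
    quota_us: int   # value for cpu.cfs_quota_us: CPU time allowed per period
    shares: int     # value for cpu.shares: relative weight under contention


def cpu_limits(cores: float, weight: float = 1.0,
               period_us: int = DEFAULT_PERIOD_US) -> CpuLimits:
    """Translate a core allowance and a relative weight into cgroup CPU values."""
    quota_us = round(cores * period_us)   # e.g. 2 cores -> 200000 us per 100000 us
    shares = max(2, int(1024 * weight))   # 1024 is the default; the kernel minimum is 2
    return CpuLimits(period_us, quota_us, shares)


def pick_cores(core_load: dict[int, float], n: int) -> list[int]:
    """Return the n least-loaded core IDs, breaking ties by lower ID."""
    ranked = sorted(core_load, key=lambda c: (core_load[c], c))
    return sorted(ranked[:n])


def rebind(current: list[int], core_load: dict[int, float],
           hot: float = 0.8) -> list[int]:
    """Keep the binding unless one of its cores is hot (hypothetical rule).

    Simplification: core_load includes this container's own usage.
    """
    if max(core_load[c] for c in current) < hot:
        return current
    return pick_cores(core_load, len(current))


def to_cpuset(cores: list[int]) -> str:
    """Format core IDs for cpuset.cpus, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    parts: list[str] = []
    start = prev = None
    for c in sorted(cores):
        if start is None:
            start = prev = c
        elif c == prev + 1:
            prev = c
        else:
            parts.append(f"{start}-{prev}" if start != prev else str(start))
            start = prev = c
    if start is not None:
        parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


load = {0: 0.95, 1: 0.90, 2: 0.20, 3: 0.35, 4: 0.10, 5: 0.85}  # sampled per-core load
limits = cpu_limits(cores=2)
new_cores = rebind([0, 1], load)
print(limits.quota_us, to_cpuset(new_cores))  # 200000 2,4
```

Writing the resulting strings into the container's cgroup directory is left out on purpose; in practice Docker's `--cpus`, `--cpu-shares`, and `--cpuset-cpus` options set the same values.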

Named Service

Instead of a per‑region master managing each compression pool, the platform offers a name‑based service. Developers obtain a logical name, and the platform attaches compute resources to that name, automatically handling load balancing, fault removal, and scaling, thus reducing operational overhead and speeding up onboarding.
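The named‑service idea can be sketched as a small in‑memory registry: callers resolve a logical name, and the platform attaches or detaches containers behind it. This is a hypothetical illustration only. `NameService`, its methods, the round‑robin policy, and the three‑strikes fault removal are assumptions; a production system would back this with a replicated registry and active health checks.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class NameService:
    """Hypothetical in-memory registry mapping a logical name to containers."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self._endpoints: Dict[str, List[str]] = defaultdict(list)
        self._cursor: Dict[str, int] = defaultdict(int)
        self._failures: Dict[Tuple[str, str], int] = {}

    def attach(self, name: str, endpoint: str) -> None:
        """Scale out: the platform attaches a container to the name."""
        if endpoint not in self._endpoints[name]:
            self._endpoints[name].append(endpoint)

    def detach(self, name: str, endpoint: str) -> None:
        """Scale in or fault removal: the container stops receiving traffic."""
        if endpoint in self._endpoints.get(name, []):
            self._endpoints[name].remove(endpoint)

    def resolve(self, name: str) -> str:
        """Load balancing: return the next endpoint for the name, round robin."""
        endpoints = self._endpoints.get(name)
        if not endpoints:
            raise LookupError(f"no containers attached to {name!r}")
        i = self._cursor[name] % len(endpoints)
        self._cursor[name] = i + 1
        return endpoints[i]

    def report(self, name: str, endpoint: str, ok: bool) -> None:
        """Callers report results; consecutive failures detach the endpoint."""
        key = (name, endpoint)
        if ok:
            self._failures.pop(key, None)
            return
        self._failures[key] = self._failures.get(key, 0) + 1
        if self._failures[key] >= self.max_failures:
            self.detach(name, endpoint)
            del self._failures[key]


ns = NameService()
for ep in ("10.0.0.1:8000", "10.0.0.2:8000"):  # hypothetical container addresses
    ns.attach("img.compress", ep)              # hypothetical logical name
print([ns.resolve("img.compress") for _ in range(3)])
# ['10.0.0.1:8000', '10.0.0.2:8000', '10.0.0.1:8000']
```

The point of the design is that the caller only ever holds the name; scaling and fault removal change what sits behind it without any client reconfiguration.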

Automatic Scheduling

The platform implements three scheduling mechanisms based on real‑time metrics:

Dynamic Scheduling: Monitors container CPU, memory, disk I/O, and network usage. When load exceeds a high threshold, resources are expanded within seconds; when it falls below a low threshold, resources are reduced within minutes.

Exception Scheduling: Uses CPI (cycles per instruction) to model each workload's normal behavior; a significant deviation from the expected CPI triggers container eviction or replacement.

Perception Scheduling: When compression latency or failure rate rises while CPU and CPI look normal, the business side reports the issue, and the platform then demotes or replaces the affected containers.
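The three mechanisms can be sketched together in Python. This is a minimal illustration, not TCS code: the 75% and 30% thresholds, the 10‑second and 300‑second hold times, the rule of resizing toward the middle of the band, the 3‑sigma CPI test, and the action names are all assumptions, chosen to show the asymmetry described above (expand in seconds, shrink in minutes).

```python
import math
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScalerConfig:
    high: float = 0.75        # expand above this utilization (assumed value)
    low: float = 0.30         # shrink below this utilization (assumed value)
    out_hold_s: float = 10.0  # seconds load must stay high before expanding
    in_hold_s: float = 300.0  # seconds load must stay low before shrinking
    min_replicas: int = 1


class DynamicScaler:
    """Threshold scaler: quick to expand, slow to shrink."""

    def __init__(self, cfg: ScalerConfig) -> None:
        self.cfg = cfg
        self._high_since: Optional[float] = None
        self._low_since: Optional[float] = None

    def observe(self, now: float, util: float, replicas: int) -> int:
        """Return the replica count to run after a utilization sample at `now`."""
        cfg = self.cfg
        target = (cfg.high + cfg.low) / 2  # resize toward the middle of the band
        if util > cfg.high:
            self._low_since = None
            if self._high_since is None:
                self._high_since = now
            if now - self._high_since >= cfg.out_hold_s:
                self._high_since = None
                return math.ceil(replicas * util / target)
        elif util < cfg.low:
            self._high_since = None
            if self._low_since is None:
                self._low_since = now
            if now - self._low_since >= cfg.in_hold_s:
                self._low_since = None
                return max(cfg.min_replicas, math.ceil(replicas * util / target))
        else:
            # Inside the band: reset both timers.
            self._high_since = None
            self._low_since = None
        return replicas


def cpi_is_anomalous(history: List[float], current: float, z: float = 3.0) -> bool:
    """Exception check: flag a CPI sample more than z standard deviations off baseline."""
    if len(history) < 2:
        return False  # not enough data to model normal behavior yet
    mean = statistics.fmean(history)
    std = statistics.stdev(history)
    if std == 0:
        return current != mean
    return abs(current - mean) / std > z


def decide(cpi_anomalous: bool, business_degraded: bool) -> str:
    """Combine exception and perception signals into a (hypothetical) action."""
    if cpi_anomalous:
        return "replace"  # exception scheduling
    if business_degraded:
        return "demote"   # perception scheduling: metrics look normal, the business does not
    return "keep"


scaler = DynamicScaler(ScalerConfig())
print(scaler.observe(0, 0.90, 10))   # 10: load is high, but not yet for 10 s
print(scaler.observe(10, 0.90, 10))  # 18: ceil(10 * 0.90 / 0.525)
```

The separate hold times are what make scale‑out fast and scale‑in slow; the CPI baseline would in practice come from hardware counters (cycles and instructions) sampled per container.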

Performance Highlights

Two charts plot container count against CPU load, showing a nearly flat CPU utilization line throughout the day. Additional graphs show a balanced load distribution after each machine is weighted by a CPU performance coefficient to account for heterogeneous hardware.
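One way performance coefficients can balance heterogeneous hardware is to weight each machine by cores × coefficient and dispatch tasks with smooth weighted round robin (the interleaving scheme nginx uses for weighted upstreams). The article does not say which dispatch algorithm TCS uses, so this is a swapped‑in technique, and the machine names and coefficients below are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterator, Tuple


def capacity_weights(machines: Dict[str, Tuple[int, float]]) -> Dict[str, float]:
    """Weight = cores x performance coefficient (coefficients are assumed inputs)."""
    return {name: cores * coef for name, (cores, coef) in machines.items()}


def smooth_weighted_round_robin(weights: Dict[str, float]) -> Iterator[str]:
    """Yield machine names in proportion to their weights, interleaved evenly.

    Each step adds every weight to its running score, picks the highest
    score, and subtracts the total weight from the winner.
    """
    total = sum(weights.values())
    current = {name: 0.0 for name in weights}
    while True:
        for name, w in weights.items():
            current[name] += w
        best = max(current, key=current.get)
        current[best] -= total
        yield best


machines = {"gen1": (24, 1.0), "gen2": (24, 1.5)}  # hypothetical fleet
picks = Counter(islice(smooth_weighted_round_robin(capacity_weights(machines)), 600))
print(picks)  # gen2 receives about 1.5x as many tasks as gen1
```

Because faster machines receive proportionally more work, utilization per machine stays roughly equal, which is the balanced distribution the graphs describe.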

Summary and Outlook

The original image‑compression service relied on 24,000 physical servers; the elastic platform now handles the same workload with only 6,000 containers at an average CPU utilization of 56%. The platform also supports other services, such as video transcoding, Spark jobs, and AI workloads for Go and Honor of Kings. By year‑end the platform aims to schedule up to 1,000,000 CPU cores, further lowering cost and providing continuous, low‑cost compute for AI initiatives and broader business goals.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

image compression Resource Scheduling cloud infrastructure Container Orchestration elastic computing

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

