Tencent’s Elastic Compute: Efficient Idle Resource Use Without Service Disruption
This article describes Tencent’s elastic computing platform, which harnesses idle on‑premises server resources for large‑scale image, video, AI, and log processing. It covers the platform’s architectural layers, the strategies used to protect online services (capacity, compute latency, scheduling latency, and fault rate), and practical lessons learned from its deployment.
Project Background
WeChat, QQ, and other services generate a massive volume of image and video uploads every day, all of which require compression and transcoding. AI workloads, such as agents for the board game Go and game simulations, also demand extensive compute. Meanwhile, existing servers host online services with pronounced peak and off‑peak patterns, so many resources sit idle outside peak hours. To meet growing compute demand, the virtualization team built an elastic compute platform that reuses these idle on‑premises resources.
Key Challenges
Sharing idle resources raises two main challenges:
Impact on Online Service Quality
Compute workloads share CPU execution units, caches, memory, disk, and network with online services. Uncontrolled contention, such as L3 cache conflicts, can degrade compute performance by more than 60% and also hurt online service quality.
Utilizing Elastic Resources Effectively
Elastic resources are heterogeneous: they vary in specification and ports, and their quotas change dynamically with online load, which makes them hard to schedule and use consistently.
Technical Architecture
The platform consists of three layers:
Access Layer: Provides service‑oriented APIs for service access, configuration, and image management.
Scheduling Layer: Uses a name service to abstract diverse resources, enabling load balancing, auto‑scaling, fault handling, staggered scheduling, and gray‑release capabilities.
Node Layer: Implements resource isolation, conflict detection, and container monitoring for consumption by the upper layers.
Avoiding Impact on Online Services
Preserving Capacity
We co‑locate workloads with complementary resource profiles (CPU‑heavy vs. network‑heavy) using a simple performance model. For example, a 10 Gbps server offers about 73 MB/s of network bandwidth per core. A task that consumes 100 MB/s per core exceeds that budget on its own, but mixing it 1:1 with a task that consumes 40 MB/s per core averages (100 + 40) / 2 = 70 MB/s per core, which fits. Container specifications (e.g., C4‑8‑100) define CPU, memory, and disk allocations, and placement also accounts for hyper‑threading and cache sharing to avoid conflicts.
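The bandwidth check in the mixing example can be sketched as follows. This is a minimal illustration, not the platform's model: it assumes a mix is acceptable when its core‑weighted average per‑core traffic stays within the server's per‑core budget, and the field meanings in the hypothetical `parse_spec` helper (cores, memory in GB, disk in GB) are our guess at the C4‑8‑100 naming scheme.

```python
from dataclasses import dataclass
from typing import List, Tuple


def parse_spec(spec: str) -> Tuple[int, int, int]:
    """Parse a spec such as "C4-8-100" into (cores, mem_gb, disk_gb).

    ASSUMPTION: field meanings and units are a guess; the article only
    says the spec covers CPU, memory, and disk allocations.
    """
    if not spec.startswith("C"):
        raise ValueError(f"unexpected spec: {spec}")
    cores, mem, disk = (int(part) for part in spec[1:].split("-"))
    return cores, mem, disk


@dataclass
class Workload:
    name: str
    mb_per_core: float  # network traffic generated per core, MB/s


def mix_average(mix: List[Tuple[Workload, int]]) -> float:
    """Core-weighted average bandwidth demand of a list of (workload, cores)."""
    total_cores = sum(cores for _, cores in mix)
    total_mb = sum(w.mb_per_core * cores for w, cores in mix)
    return total_mb / total_cores


def fits(mix: List[Tuple[Workload, int]], budget_mb_per_core: float) -> bool:
    """True if the mix stays within the server's per-core bandwidth budget."""
    return mix_average(mix) <= budget_mb_per_core


budget = 73.0  # per-core bandwidth of the 10 Gbps server in the example
high_traffic = Workload("100 MB/s-per-core task", 100.0)
low_traffic = Workload("40 MB/s-per-core task", 40.0)
cores, _, _ = parse_spec("C4-8-100")

one_to_one = [(high_traffic, cores), (low_traffic, cores)]
print(mix_average(one_to_one))
print(fits(one_to_one, budget))
print(fits([(high_traffic, cores)], budget))
```

With equal core counts the average is 70 MB/s per core, so the 1:1 mix fits while the high‑traffic task alone does not. Extending the same check to CPU, cache, or disk would just add one budget per resource.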
Maintaining Compute Latency
We monitor CPI (cycles per instruction) as a proxy for instruction execution latency. Baseline CPI values are established per workload and CPU model, and deviations from the baseline signal abnormal latency. To reduce false alarms, CPI is combined with cache‑miss counts and CPU utilization, noise is filtered (e.g., acting only after three consecutive spikes), and CPU shares are first adjusted locally before containers are migrated.
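The detection logic described above can be sketched as a small per‑container state machine. Everything specific here is an assumption for illustration: the 1.5x thresholds, the use of cache misses per thousand instructions (MPKI) as the cache‑miss signal, the minimum‑utilization gate, and the hypothetical two‑stage escalation ("throttle locally", then "migrate") are not the platform's published values.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Sample:
    cpi: float       # cycles per instruction over the sampling window
    mpki: float      # last-level cache misses per thousand instructions
    cpu_util: float  # container CPU utilization, 0.0 to 1.0


class CpiDetector:
    """Flags CPI anomalies only after several consecutive suspicious samples.

    ASSUMPTIONS: thresholds and the two-stage escalation are illustrative.
    """

    def __init__(self, baselines: Dict[Tuple[str, str], Tuple[float, float]],
                 cpi_factor: float = 1.5, mpki_factor: float = 1.5,
                 min_util: float = 0.3, consecutive: int = 3) -> None:
        self.baselines = baselines  # (workload, cpu_model) -> (cpi, mpki)
        self.cpi_factor = cpi_factor
        self.mpki_factor = mpki_factor
        self.min_util = min_util
        self.consecutive = consecutive
        self.streak: Dict[str, int] = defaultdict(int)

    def observe(self, container: str, workload: str, cpu_model: str,
                s: Sample) -> str:
        base_cpi, base_mpki = self.baselines[(workload, cpu_model)]
        # All three signals must agree; a high CPI at low utilization
        # is treated as noise here rather than as interference.
        suspicious = (s.cpi > self.cpi_factor * base_cpi
                      and s.mpki > self.mpki_factor * base_mpki
                      and s.cpu_util >= self.min_util)
        if not suspicious:
            self.streak[container] = 0  # any normal sample resets the streak
            return "ok"
        self.streak[container] += 1
        n = self.streak[container]
        if n >= 2 * self.consecutive:
            return "migrate"           # local adjustment did not help
        if n >= self.consecutive:
            return "throttle_locally"  # e.g. lower co-located CPU shares
        return "watch"


det = CpiDetector({("transcode", "cpu-a"): (1.0, 2.0)})
bad = Sample(cpi=2.0, mpki=5.0, cpu_util=0.8)
actions = [det.observe("c1", "transcode", "cpu-a", bad) for _ in range(6)]
print(actions)
```

Keying baselines by both workload and CPU model matters because the same binary can have a very different normal CPI on different microarchitectures.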
Protecting Scheduling Latency
Because Linux's default CFS scheduler divides CPU time by weight rather than giving online services strict priority, online services can suffer scheduling delays when they share CPUs with compute tasks. We therefore assign Docker CPU shares (the cgroup cpu.shares value, e.g., 4096 for high priority, 1024 for normal, and 3 for low) so that priority propagates to the kernel scheduler. In tests, this reduced online service latency spikes by 32.6%.
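To see why these share values work, recall that under CFS, sibling cgroups that are all runnable receive CPU time in proportion to their weights. Docker exposes this weight through `docker run --cpu-shares` (cgroup v1 `cpu.shares`, default 1024). The sketch below only computes those proportions; it assumes the groups are siblings and all compete at once, and a group receives more than its fraction when the others are idle.

```python
from typing import Dict


def cpu_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Fraction of contended CPU time each sibling cgroup receives under CFS.

    Assumes every group is runnable simultaneously; idle groups' time is
    redistributed, so a group can get more than this when others are idle.
    """
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


two_way = cpu_fractions({"online": 4096, "compute": 3})
three_way = cpu_fractions({"online": 4096, "normal": 1024, "compute": 3})
print(two_way)
print(three_way)
```

With 4096 against 3, the online group receives roughly 99.9% of contended CPU time, while the compute group still runs whenever the online service is idle, which is exactly the behavior an idle‑resource platform wants.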
Preventing Fault Rate Increases
Compressible resources (CPU, network, disk I/O) can be shared safely because contention only slows tasks down, but memory is incompressible: contention can trigger OOM kills. The platform therefore implements OOM priority scheduling. When memory pressure rises, the platform proactively evicts compute containers; during sudden spikes, the kernel kills low‑priority containers first and an alert is sent for cleanup.
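The two layers of OOM protection can be sketched as follows. This is a hypothetical illustration: the 90% watermark, the priority scale, and the eviction order (lowest priority first, then largest memory) are assumptions. The kernel side relies on the real `oom_score_adj` range of -1000 to 1000, which Docker exposes through `docker run --oom-score-adj`, but the specific mapping below is invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Container:
    name: str
    kind: str      # "online" or "compute"
    priority: int  # higher = more important (hypothetical scale)
    mem_mb: int


def oom_score_adj(c: Container) -> int:
    """Hypothetical mapping onto the kernel's oom_score_adj range [-1000, 1000].

    Online services are protected; low-priority compute is killed first.
    """
    if c.kind == "online":
        return -500
    return max(0, min(1000, 1000 - 100 * c.priority))


def pick_evictions(containers: List[Container], used_mb: int,
                   capacity_mb: int, high_watermark: float = 0.9) -> List[Container]:
    """Platform-side eviction: drop compute containers until usage is below the watermark."""
    limit = high_watermark * capacity_mb
    # Lowest priority first; among equals, free the most memory first.
    candidates = sorted((c for c in containers if c.kind == "compute"),
                        key=lambda c: (c.priority, -c.mem_mb))
    victims: List[Container] = []
    for c in candidates:
        if used_mb <= limit:
            break
        victims.append(c)
        used_mb -= c.mem_mb
    return victims


node = [
    Container("web", "online", 9, 20000),
    Container("b", "compute", 1, 4000),
    Container("c", "compute", 0, 2000),
    Container("d", "compute", 0, 6000),
]
print([c.name for c in pick_evictions(node, used_mb=31000, capacity_mb=32000)])
```

The platform‑side eviction handles gradual pressure gracefully, while `oom_score_adj` ensures that if a spike outruns the platform, the kernel's OOM killer still chooses a compute container rather than the online service.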
Making the Most of Elastic Resources
We offer scenario‑based services for image compression, video transcoding, AI inference, and log processing; each has low per‑core traffic (under 10 MB/s) and tolerates node churn. Stateless services use service‑oriented APIs for auto‑scaling, while stateful services expose APIs for manual scaling.
To hide resource diversity, we use the CL5 name service, which abstracts away resource specifications and dynamically adjusts each node's weight based on its core count, performance benchmarks, and quota availability.
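Weight‑based resolution of heterogeneous nodes can be sketched as below. This is not the CL5 API. `WeightedResolver`, the `Node` fields, and the weight formula (cores × relative per‑core speed × lendable quota fraction) are hypothetical stand‑ins that combine the three inputs the text names. Selection uses Python's `random.choices` with weights.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    addr: str
    cores: int
    perf_score: float      # benchmark score relative to a reference core
    quota_fraction: float  # fraction of cores currently lendable, 0.0 to 1.0


def weight(n: Node) -> float:
    # Hypothetical formula: effective capacity = cores x per-core speed x quota.
    return n.cores * n.perf_score * n.quota_fraction


class WeightedResolver:
    """Toy stand-in for a name service that returns one node per lookup."""

    def __init__(self, nodes: List[Node], seed: Optional[int] = None) -> None:
        self.nodes = list(nodes)
        self.rng = random.Random(seed)

    def update(self, nodes: List[Node]) -> None:
        # Called whenever quotas change with online load.
        self.nodes = list(nodes)

    def pick(self) -> Node:
        weights = [weight(n) for n in self.nodes]
        if sum(weights) <= 0:
            raise RuntimeError("no elastic capacity available")
        # Zero-weight nodes (quota fully reclaimed) are never selected.
        return self.rng.choices(self.nodes, weights=weights, k=1)[0]


resolver = WeightedResolver([
    Node("node-a", 16, 1.2, 0.5),
    Node("node-b", 32, 1.0, 0.25),
    Node("node-c", 8, 0.8, 0.0),  # quota reclaimed by online load
], seed=42)
print(resolver.pick().addr)
```

Because callers only see a name and a weighted pick, a node whose quota drops to zero simply stops receiving traffic, with no change on the caller's side.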
We also provide higher‑level interfaces such as cloud functions, which let users submit code without managing the underlying resource scheduling, much as S3 lets users store objects without managing storage hardware.
Practical Lessons Learned
Key takeaways include:
Prefer simple, widely‑adopted underlying technologies (e.g., XFS + Overlay) over complex, less‑tested solutions.
Equip large‑scale deployments with hot‑patch capabilities to address low‑level failures quickly.
Effective load balancing before scaling is crucial; uneven workloads or hardware performance variance can cause over‑ or under‑provisioning.
Collaboration between platform and business teams is essential for consistent weight calculations and request normalization.
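The last two lessons, balancing load before scaling and normalizing requests, can be illustrated together. This is a hypothetical sketch: the `REQUEST_COST` table, the 0.8/0.3 scaling thresholds, and the 1.5 imbalance ratio are invented for illustration. The idea is that requests are converted into agreed "standard units" so that per‑node utilization is comparable, and a skewed cluster is rebalanced before any scale‑out or scale‑in decision is made.

```python
from typing import Dict, List

# Hypothetical cost table agreed between platform and business teams:
# how many standard units each request type consumes.
REQUEST_COST: Dict[str, float] = {"thumbnail": 1.0, "webp_convert": 2.5, "video_720p": 40.0}


def normalized_load(counts: Dict[str, int]) -> float:
    """Convert raw request counts into standard units."""
    return sum(REQUEST_COST[kind] * n for kind, n in counts.items())


def decide(node_loads: List[float], node_capacity: List[float],
           scale_out_at: float = 0.8, scale_in_at: float = 0.3,
           imbalance_ratio: float = 1.5) -> str:
    """Return "rebalance", "scale_out", "scale_in", or "hold"."""
    utils = [load / cap for load, cap in zip(node_loads, node_capacity)]
    mean = sum(utils) / len(utils)
    # Fix skew first: scaling a badly balanced cluster mis-sizes it.
    if mean > 0 and max(utils) / mean > imbalance_ratio:
        return "rebalance"
    if mean > scale_out_at:
        return "scale_out"
    if mean < scale_in_at:
        return "scale_in"
    return "hold"


print(normalized_load({"thumbnail": 10, "webp_convert": 4}))
print(decide([90, 10], [100, 100]))
print(decide([85, 85], [100, 100]))
```

Node capacity here would come from the same performance‑weighted figures the name service uses, which is why consistent weight calculations between teams matter.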
Overall, the elastic compute platform demonstrates how to reclaim idle compute capacity while safeguarding online service performance.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.