How to Slash Cloud Costs: Real Lessons from Tencent’s Video Platforms
This article examines Tencent's cost‑optimization journey for its short‑video and streaming services, breaking down human, server, and bandwidth expenses, explaining precise accounting methods, negotiation tactics, usage‑vs‑efficiency strategies, and global resource‑scheduling techniques to achieve sustainable cost reduction.
Overview
This article covers the cost optimization work done on Weishi and Tencent Video from Q3 2021 to early 2023, and builds on an earlier internal sharing session.
Rapid Traffic Growth
The mobile internet benefited from population growth, which created large pools of social traffic. Applications such as App Store aggregated traffic from WeChat, QQ, and games, and turned it into profit through ads, games, and other commercial models. Weishi's content ecosystem, however, lacked stickiness, which limited profit despite high DAU (daily active users) and VV (video views).
Where Costs Come From
Human resources are the biggest cost driver: more staff means higher salaries and office costs, and also more code, which in turn requires more servers and bandwidth.
Server costs include CPU, memory, disk, electricity, and data center maintenance. They are often billed as a single per-core price (e.g., 20 CNY per core per month) that folds in all of these associated expenses.
Bandwidth costs (streaming, static, live, P2P, PCDN) are the other major expense for content-heavy services.
Accounting the Costs
Each resource cost follows the formula: usage × unit price. Understanding usage measurement rules (e.g., CDN billed on peak bandwidth) is essential for targeted optimization.
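A minimal sketch of the formula, assuming a CDN billed on the highest bandwidth sample in the period. The core count, bandwidth samples, and the price per Gbps are hypothetical; only the 20 CNY per core figure comes from the text above.

```python
def monthly_cost(usage: float, unit_price: float) -> float:
    """Cost of one resource: usage multiplied by unit price."""
    return usage * unit_price


def peak_billed_usage(samples_gbps: list[float]) -> float:
    """Under peak billing, the billable usage is the highest sample."""
    return max(samples_gbps)


# 400 cores (hypothetical) at 20 CNY per core per month.
cpu_cost = monthly_cost(usage=400, unit_price=20)

# Hypothetical bandwidth samples in Gbps and a hypothetical price per Gbps.
samples = [3.0, 4.5, 12.0, 5.0]
cdn_cost = monthly_cost(peak_billed_usage(samples), unit_price=8000)

# Shaving only the peak sample lowers the bill, even though the
# total traffic served barely changes.
shaved = [3.0, 4.5, 9.0, 5.0]
cdn_cost_shaved = monthly_cost(peak_billed_usage(shaved), unit_price=8000)
```

Because the bill depends on the peak rather than the total, optimization under this rule targets the highest sample, not the average.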
Teams must allocate costs to their own resource consumption, making them accountable for the expenses they generate.
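One simple way to do this allocation is to split a shared bill in proportion to each team's measured usage. The sketch below assumes that rule; the team names and numbers are hypothetical, and a real allocation may use different weights.

```python
def allocate_cost(total_cost: float, usage_by_team: dict[str, float]) -> dict[str, float]:
    """Split a shared bill in proportion to each team's usage."""
    total_usage = sum(usage_by_team.values())
    if total_usage == 0:
        raise ValueError("no usage recorded, nothing to allocate")
    return {
        team: total_cost * usage / total_usage
        for team, usage in usage_by_team.items()
    }


# Hypothetical core counts per team sharing a 10,000 CNY bill.
shares = allocate_cost(10000.0, {"feed": 300, "search": 150, "upload": 50})
```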
Negotiating Prices and Reducing Usage
Teams that consume resources do not set prices directly, but they can negotiate them, especially for bandwidth. Reducing usage is a technical challenge: measure how much CPU and memory legacy code actually consumes, then cut the waste.
Usage vs. Efficiency
Usage reduction means lowering the amount of resources used (e.g., halving CPU cores). Efficiency improvement means increasing the work done per unit of resource (e.g., raising CPU utilization from 25% to 45%).
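The two ideas are linked by simple arithmetic: if the workload stays the same, higher utilization means fewer cores are needed. The sketch below uses the 25% and 45% figures from the text; the starting fleet of 1000 cores is a hypothetical number.

```python
import math


def cores_needed(busy_cores: float, target_utilization: float) -> int:
    """Cores required to carry a fixed workload at a target utilization."""
    return math.ceil(busy_cores / target_utilization)


allocated_before = 1000                  # hypothetical fleet size
busy = allocated_before * 0.25           # work actually done, in core-equivalents
allocated_after = cores_needed(busy, 0.45)
saved = allocated_before - allocated_after
monthly_saving_cny = saved * 20          # at 20 CNY per core per month
```

Under these assumptions, raising utilization from 25% to 45% frees a little under half of the fleet without changing the work done.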
Examples
CPU: Adjust microservice replica counts to raise peak utilization.
Redis: Reduce allocated memory from 100 GB to the actual needed 20 GB, improving memory hit rate and cutting costs.
Hit rate: proportion of requests served from memory versus persistent storage.
Storage density: access frequency per GB, indicating whether data should reside in memory or cheaper storage.
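The two metrics above can be computed directly from request counters. This is a sketch under stated assumptions: the function names and the density threshold are hypothetical, and a real threshold would come from comparing memory and disk prices.

```python
def hit_rate(memory_hits: int, total_requests: int) -> float:
    """Share of requests served from memory rather than persistent storage."""
    return memory_hits / total_requests if total_requests else 0.0


def storage_density(accesses: int, size_gb: float) -> float:
    """Access frequency per GB of stored data."""
    return accesses / size_gb


def suggest_tier(accesses: int, size_gb: float, threshold: float = 1000.0) -> str:
    """Keep dense (frequently accessed) data in memory, move the rest out.

    The threshold of 1000 accesses per GB is a hypothetical cut-off.
    """
    if storage_density(accesses, size_gb) >= threshold:
        return "memory"
    return "cheaper storage"


hot = suggest_tier(accesses=5_000_000, size_gb=20)   # density 250000 per GB
cold = suggest_tier(accesses=10_000, size_gb=80)     # density 125 per GB
```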
Global Efficiency Optimization
Shift offline workloads (e.g., video transcoding) to off‑peak periods, use Kubernetes HPA for mixed online/offline scheduling, and employ algorithmic models (linear regression, LightGBM) to predict resource needs and achieve global optimal resource allocation.
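The off-peak idea can be illustrated with a small greedy planner: given a forecast of online load per time slot, place offline work in the slots with the most spare capacity first. This is a simplified stand-in, not the scheduling system described in the article. The load figures are hypothetical, and the forecast that would come from a prediction model is hard-coded here.

```python
def schedule_offline(online_load: list[float], capacity: float,
                     job_core_hours: float) -> dict[int, float]:
    """Greedily fill the slots with the most spare capacity first."""
    headroom = {
        slot: max(capacity - load, 0.0)
        for slot, load in enumerate(online_load)
    }
    plan: dict[int, float] = {}
    remaining = job_core_hours
    for slot in sorted(headroom, key=headroom.get, reverse=True):
        if remaining <= 0:
            break
        take = min(headroom[slot], remaining)
        if take > 0:
            plan[slot] = take
            remaining -= take
    if remaining > 0:
        raise ValueError("not enough spare capacity for the offline job")
    return plan


# Hypothetical forecast of online load (cores) over six time slots,
# with the evening peak in the last two slots.
forecast = [20.0, 10.0, 15.0, 60.0, 90.0, 95.0]
plan = schedule_offline(forecast, capacity=100.0, job_core_hours=150.0)
```

With this forecast, all 150 core-hours of offline work land in the two quietest slots and none in the peak slots.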
Everyone Is a Small CEO
Cost reduction and efficiency gains are now measured by ROC (return on capital) rather than ROI (return on investment), which emphasizes overall capital efficiency.
Programmers must weigh performance improvements against the cost of the physical hardware behind them, avoid wasteful over-provisioning, and treat resource usage as a core business metric.
Tech Architecture Stories
Internet tech practitioner sharing insights on business architecture, technology, and a lifelong love of tech.
