
Modernizing Tencent Cloud Log Service (CLS): Cloud‑Native Architecture, Challenges, and Benefits

Tencent Cloud Log Service (CLS) was modernized by migrating more than 95 % of its components to a cloud‑native stack of containers, Kubernetes, and declarative APIs. The migration addressed chaotic infrastructure, stateful‑to‑stateless conversion, configuration drift, upgrade risk, elastic scaling, traffic protection, and observability. It cut costs by more than 20 million CNY per year, reduced scaling latency by 90 %, and achieved over 99.99 % availability with petabyte‑scale burst handling.

Tencent Cloud Developer

May 8, 2023

The digital transformation of an enterprise is essentially a process of breaking internal barriers, which usually involves both technical and organizational reconstruction. This article focuses on the technical side, describing how to achieve application modernization and cloud‑native migration to create higher business value.

Business background and challenges of Tencent Cloud Log Service (CLS)

CLS is a one‑stop, high‑reliability, high‑performance log solution that supports petabyte‑scale data ingestion, collection, storage, retrieval, analysis, processing, and subscription. Rapid growth in log volume (from tens of millions to tens of trillions of records per day) caused performance bottlenecks, unstable architecture, and frequent firefighting, which impacted customer satisfaction and revenue.

Three "representatives" of cloud‑native technology

Cloud‑native technologies (containers, Kubernetes, Serverless, etc.) represent the most advanced production capabilities.

Adopting cloud‑native is essential for product competitiveness and rapid iteration.

Cloud‑native provides cost reduction, efficiency improvement, and resource elasticity.

Challenge 1: Chaotic infrastructure

Legacy physical machines and VMs lead to inconsistent environments, long provisioning cycles, and wasted resources. The article traces the evolution from physical servers to virtual machines to containers, including the rich‑container and sidecar patterns.

Challenge 2: Converting stateful applications to stateless

Stateless services can scale horizontally and survive failures without impact.

Stateful services require complex data synchronization and are harder to scale.

Two common approaches are presented: (1) synchronize state among multiple instances, and (2) externalize state to a centralized storage system.
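The second approach can be sketched in a few lines. This is a minimal illustration, not the CLS implementation: `ExternalStore` is a hypothetical in-memory stand-in for a centralized storage system, and `StatelessCounter` and the topic name are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalStore:
    """Hypothetical stand-in for a centralized storage system (in-memory here)."""
    data: dict = field(default_factory=dict)

    def get(self, key, default=0):
        return self.data.get(key, default)

    def put(self, key, value):
        self.data[key] = value


class StatelessCounter:
    """Holds no per-instance state: every read and write goes to the store."""

    def __init__(self, store):
        self.store = store

    def record(self, topic, n_logs):
        # Read-modify-write is not atomic in this sketch; a real store
        # would need an atomic increment or a transaction.
        total = self.store.get(topic) + n_logs
        self.store.put(topic, total)
        return total


store = ExternalStore()
instance_a = StatelessCounter(store)
instance_b = StatelessCounter(store)

instance_a.record("topic-1", 100)
total = instance_b.record("topic-1", 50)  # sees the write made through instance_a
```

Because neither instance keeps anything locally, either one can be killed or replaced without losing the count, which is what makes horizontal scaling straightforward.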

Challenge 3: Configuration management

In modern cloud‑native applications, configuration is scattered across networking, databases, middleware, and other components. A unified configuration center, version control, and CI/CD pipelines are needed to prevent configuration drift and keep deployments consistent.
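One simple way to detect drift is to compare a fingerprint of the version-controlled configuration with the configuration each instance is actually running. The sketch below assumes configurations are plain dictionaries; the function names, keys, and instance names are hypothetical and not taken from CLS.

```python
import hashlib
import json


def fingerprint(config):
    """Stable hash of a config dict; key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(desired, running):
    """Return the names of instances whose config differs from the desired one."""
    want = fingerprint(desired)
    return sorted(name for name, cfg in running.items() if fingerprint(cfg) != want)


# Desired state, as it would be stored in version control (illustrative values).
desired = {"db_host": "db.internal", "max_conn": 200}

running = {
    "instance-a": {"max_conn": 200, "db_host": "db.internal"},  # same, keys reordered
    "instance-b": {"db_host": "db.internal", "max_conn": 50},   # edited by hand
}

drifted = find_drift(desired, running)
```

Serializing with sorted keys before hashing means that two configurations with the same content always produce the same fingerprint, so only real differences are reported.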

Challenge 4: Smooth architecture upgrade

A smooth upgrade strategy relies on canary releases and on keeping the old services running for a rollback window, so that risk stays low at every step.
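A canary release has two parts: sending a small, fixed share of traffic to the new version, and deciding whether to promote or roll back. The following is a rough sketch under stated assumptions: the hash-based routing, the error-rate comparison, and the 1 % tolerance are illustrative choices, not details from the article.

```python
import zlib


def route(request_id, canary_percent):
    """Deterministically send a fixed share of requests to the new version."""
    bucket = zlib.crc32(request_id.encode("utf-8")) % 100
    return "canary" if bucket < canary_percent else "stable"


def next_step(canary_error_rate, stable_error_rate, tolerance=0.01):
    """Promote the canary only if it is not clearly worse than the old version."""
    if canary_error_rate > stable_error_rate + tolerance:
        return "rollback"
    return "promote"


# The same request always lands on the same side, which keeps sessions consistent.
first = route("request-42", 10)
second = route("request-42", 10)
decision = next_step(canary_error_rate=0.05, stable_error_rate=0.01)
```

Because the old version keeps serving the remaining traffic, a "rollback" decision only requires setting `canary_percent` back to 0.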

Challenge 5: Elastic scaling

Handle traffic spikes automatically with the Kubernetes Horizontal Pod Autoscaler (HPA).

Reduce cost by scaling down when load subsides.

Maintain stability by coordinating scaling across upstream and downstream services and by scaling on custom metrics.
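Kubernetes documents the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1 (0.1 by default). The sketch below reproduces that formula; the replica bounds and the metric values are illustrative assumptions, since the article does not give the thresholds CLS uses.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=100, tolerance=0.1):
    """Replica count suggested by the HPA formula, clamped to the allowed range."""
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# 4 pods at 90 % average CPU against a 60 % target: scale up.
scale_up = desired_replicas(4, 90, 60)

# 10 pods at 30 % against a 60 % target: scale down to save cost.
scale_down = desired_replicas(10, 30, 60)
```

The same function works for a custom metric such as queue depth per pod: only the meaning of `current_metric` and `target_metric` changes.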

Challenge 6: Traffic protection and fault tolerance

CLS implements end‑to‑end observability, DNS‑based isolation, rate limiting, and rapid elastic scaling (up to ten thousand cores within minutes) to protect against attacks and failures.
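The article does not say which rate-limiting algorithm CLS uses. A token bucket is one common choice, sketched here with hypothetical parameters; time is passed in explicitly so the behaviour is easy to follow and test.

```python
class TokenBucket:
    """Allows short bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now, cost=1.0):
        # Refill according to the time elapsed since the previous call.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=2, capacity=3)
burst = [bucket.allow(now=0.0) for _ in range(4)]  # fourth request exceeds the burst
later = bucket.allow(now=0.5)                      # one token refilled after 0.5 s
```

Requests that are rejected here would typically be answered with a throttling error, which protects downstream services while elastic scaling catches up.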

Challenge 7: Observability and development efficiency

Build multi‑layer observability (user, application, middleware, infrastructure).

Automate issue detection, reduce mean time to resolution, and avoid frequent firefighting.

Development efficiency improvements

A CI pipeline with more than 1,000 automated test cases ensures compatibility and stability.

Automated release orchestration across dozens of regions reduces manual effort and error rates.
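Release orchestration across many regions is often done in batches, stopping as soon as a batch fails its health check. The sketch below illustrates that idea only; `deploy` and `healthy` are hypothetical callbacks, the region names are placeholders, and the batch size is an arbitrary choice.

```python
def staged_rollout(regions, deploy, healthy, batch_size=2):
    """Deploy batch by batch; stop at the first batch with an unhealthy region."""
    released = []
    for start in range(0, len(regions), batch_size):
        batch = regions[start:start + batch_size]
        for region in batch:
            deploy(region)
        failed = [r for r in batch if not healthy(r)]
        released.extend(r for r in batch if r not in failed)
        if failed:
            # Regions after this batch are left untouched for a human to review.
            pending = regions[start + batch_size:]
            return {"released": released, "failed": failed, "pending": pending}
    return {"released": released, "failed": [], "pending": []}


deployed = []
result = staged_rollout(
    ["r1", "r2", "r3", "r4", "r5"],
    deploy=deployed.append,
    healthy=lambda region: region != "r3",  # simulate a failure in r3
)
```

Stopping at the first failing batch limits the blast radius of a bad release to the regions already touched.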

Results of the cloud‑native transformation

The CLS architecture now fully embraces cloud‑native components (containers, Kubernetes, declarative APIs, elastic scaling). After nearly a year of migration, more than 95 % of services are containerized, operational costs are down by over 20 million CNY per year, resource usage is down by more than 100,000 cores, scaling latency is down by 90 %, and utilization is up by more than 40 %. Service availability exceeds 99.99 %, with the capacity to absorb PB‑level bursts.
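To put the availability figure in concrete terms, an availability target can be converted into a yearly downtime budget. This is plain arithmetic, assuming a 365-day year; it is not a measurement from the article.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_budget_minutes(availability_percent):
    """Maximum downtime per year that still meets the availability target."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


# 99.99 % availability leaves roughly 52.56 minutes of downtime per year.
budget_four_nines = downtime_budget_minutes(99.99)

# For comparison, 99.9 % leaves roughly 525.6 minutes (about 8.8 hours).
budget_three_nines = downtime_budget_minutes(99.9)
```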
