How Volcano Engine’s TLS Transforms Log Management for Kubernetes at Scale

This article explains the challenges of traditional open‑source log collection in cloud‑native environments, describes the architecture of Volcano Engine’s unified log platform (TLS), including its centralized configuration and CRD‑based deployment, and presents real‑world case studies that demonstrate improved availability, efficiency, and scalability.

Volcano Engine Developer Services

Apr 26, 2022

Logs are ubiquitous in IT systems and serve as a key source of big data; a unified log platform is needed to collect, process, store, query, analyze, visualize, alert, and deliver logs throughout their lifecycle.

Kubernetes Log Collection Self‑Built Solutions

Early on, teams built independent log systems using the typical open‑source stack of Filebeat + Logstash + Elasticsearch (ES) + Kibana, but encountered several problems:

Duplicate construction across business modules.

High resource consumption and cost of ES‑centric architecture, with tight Kibana coupling.

Complex YAML configuration for many nodes.

Difficult management of collection configurations and limited data sources.

Kubernetes Log Collection Options

DaemonSet – deploy an agent on each host to collect container stdout files.

Streaming Sidecar – convert file output to stdout via a sidecar container.

Sidecar Logging Agent – run a dedicated agent sidecar inside the pod.

API/SDK – use APIs or SDKs inside containers to push logs.

The first three only support stdout, and the fourth requires code changes, so none of them suits workloads that need file‑based logs collected and classified.

Challenges of Self‑Built Log Collection in Cloud‑Native Scenarios

Collection difficulty: Complex configuration, hard to meet multi‑line, regex, filtering, time parsing, and file collection requirements.

Productization: Low availability, high resource cost, and limited functionality such as weak Kafka delivery and visualization.
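
To make the multi‑line, regex, filtering, and time‑parsing requirements above concrete, here is a minimal sketch of what a collector has to do for a single file. The log format (timestamp, level, message) and the function names are assumptions for illustration, not part of any TLS or LogCollector API.

```python
import re
from datetime import datetime

# Assumed format: a new record starts with "YYYY-MM-DD HH:MM:SS LEVEL message".
RECORD_START = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<msg>.*)$"
)


def merge_multiline(lines):
    """Group raw lines into records; continuation lines join the previous record."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        match = RECORD_START.match(line)
        if match:
            records.append({
                "time": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S"),
                "level": match["level"],
                "message": match["msg"],
            })
        elif records:
            # No timestamp prefix: treat as a continuation (e.g. a stack trace).
            records[-1]["message"] += "\n" + line
        # Lines that appear before any record start are dropped in this sketch.
    return records


def filter_by_level(records, levels):
    """Keep only records whose level is in the given set."""
    return [r for r in records if r["level"] in levels]


raw = [
    "2022-04-26 10:00:00 INFO service started",
    "2022-04-26 10:00:01 ERROR request failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 1, in <module>',
    "2022-04-26 10:00:02 INFO recovered",
]
records = merge_multiline(raw)
errors = filter_by_level(records, {"ERROR"})
```

Even this small case needs a start‑of‑record pattern, a time format, and a filter per log type, which is the configuration burden the section describes once many nodes and formats are involved.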

Volcano Engine Unified Log Platform (TLS)

The TLS architecture consists of a high‑speed buffer cluster, a storage cluster, a processing cluster, and an index cluster, supporting multiple protocols (proprietary, OpenTelemetry, Kafka) via LogCollector/SDK/API.

Data flow: logs are first stored in a high‑speed buffer (peak‑shaving), then streamed to the storage cluster, processed or indexed, and finally made available for real‑time query and analysis. TLS offers Lucene query syntax, SQL‑92 analysis, visual dashboards, and rich alerting.

System Optimizations

Centralized configuration management through a web console (rather than hand‑edited files on each host) enables dynamic configuration distribution, agent status monitoring, and automatic upgrades.

Clients send heartbeats with version info.

Server checks version and decides whether to push new config.

Clients receive and hot‑load the configuration.
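
The three steps above can be sketched as a version comparison on each heartbeat. This is only an illustration under assumed names (`ConfigServer`, `Agent`, integer versions); the article does not describe the actual TLS protocol or message format.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class ConfigServer:
    """Hypothetical server side: holds the latest config and its version."""
    version: int = 1
    config: dict = field(default_factory=dict)

    def publish(self, config: dict) -> None:
        # A new config from the console bumps the version.
        self.config = config
        self.version += 1

    def handle_heartbeat(self, client_version: int) -> Optional[Tuple[int, dict]]:
        # Push a config only when the client reports an older version.
        if client_version < self.version:
            return self.version, dict(self.config)
        return None


@dataclass
class Agent:
    """Hypothetical client side: reports its version and hot-loads replies."""
    version: int = 0
    config: dict = field(default_factory=dict)

    def heartbeat(self, server: ConfigServer) -> bool:
        reply = server.handle_heartbeat(self.version)
        if reply is None:
            return False
        # Hot-load: swap the in-memory config without restarting the agent.
        self.version, self.config = reply
        return True


server = ConfigServer()
server.publish({"paths": ["/var/log/app/*.log"]})
agent = Agent()
first = agent.heartbeat(server)    # agent is behind, so it receives the config
second = agent.heartbeat(server)   # versions match, nothing is pushed
```

Carrying the version in the heartbeat means the server stays stateless about individual agents: it only compares two numbers per request.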

CRD‑based cloud‑native configuration allows users to declare log collection rules in YAML without writing code, improving efficiency.

The LogCollector client isolates different inputs into separate pipelines, supports multiple outputs with tenant authentication, and implements adaptive back‑pressure to avoid overloading the server.
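
The article does not say how LogCollector adapts its sending rate. One common way to implement adaptive back‑pressure is additive increase, multiplicative decrease (AIMD), sketched below as an assumption; the class name and all parameters are hypothetical.

```python
class AimdRateLimiter:
    """Additive-increase / multiplicative-decrease send rate (events per second)."""

    def __init__(self, initial=100.0, floor=10.0, ceiling=1000.0,
                 step=10.0, backoff=0.5):
        self.rate = initial
        self.floor = floor
        self.ceiling = ceiling
        self.step = step
        self.backoff = backoff

    def on_success(self):
        # Server accepted the batch: probe for more capacity, slowly.
        self.rate = min(self.ceiling, self.rate + self.step)

    def on_throttled(self):
        # Server signalled overload: cut the rate sharply.
        self.rate = max(self.floor, self.rate * self.backoff)


limiter = AimdRateLimiter()
history = []
for accepted in [True, True, False, True]:
    if accepted:
        limiter.on_success()
    else:
        limiter.on_throttled()
    history.append(limiter.rate)
```

In a design like this, each pipeline would own its own limiter, so a throttled output slows only the inputs that feed it.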

Product Optimizations

Availability is enhanced with multi‑level global flow control at the buffer, storage, and index stages, preventing cascade failures.

Shard‑based flow control for the buffer.

Cluster‑level flow control for storage.

Index‑level flow control for write and query concurrency.
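
As an illustration of the shard‑level stage, the sketch below gives every buffer shard its own token bucket, so a burst on one shard cannot consume another shard's capacity. The token‑bucket choice, class names, and limits are assumptions; the article does not specify the algorithm TLS uses.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class ShardFlowControl:
    """One independent bucket per shard (hypothetical wrapper)."""

    def __init__(self, shard_count, rate, capacity, clock=time.monotonic):
        self.buckets = [TokenBucket(rate, capacity, clock)
                        for _ in range(shard_count)]

    def allow(self, shard_id, cost=1.0):
        return self.buckets[shard_id].allow(cost)


# A fake clock keeps the example deterministic.
fake_now = [0.0]
control = ShardFlowControl(shard_count=2, rate=1.0, capacity=2.0,
                           clock=lambda: fake_now[0])
burst = [control.allow(0) for _ in range(3)]   # third write on shard 0 is rejected
other_shard = control.allow(1)                 # shard 1 is unaffected
fake_now[0] = 1.0                              # one second later, one token refilled
after_refill = control.allow(0)
```

Rejecting at the buffer keeps excess traffic from ever reaching the storage and index stages, which is how layered limits help prevent cascading failures.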

Efficiency is improved by separating index data from raw data, allowing independent scaling, lower storage cost, and higher availability.

Data can be consumed directly from the storage cluster, bypassing the index.

Separate storage tiers reduce cost.

Asynchronous index creation avoids blocking writes.

Index management and scheduling across multiple index clusters enable elastic scaling and load balancing.
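
A toy model of the separation described above: writes land in raw storage and return immediately, a background worker builds the index later, and consumers can read raw data without touching the index. `LogStore` and its methods are hypothetical, and the sketch assumes a single writer thread.

```python
import queue
import threading


class LogStore:
    """Raw storage plus an index that is built asynchronously."""

    def __init__(self):
        self.raw = []      # stands in for the storage cluster
        self.index = {}    # term -> set of offsets; stands in for the index cluster
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._index_loop, daemon=True)
        worker.start()

    def write(self, line):
        # The write path only appends and enqueues; it never waits for indexing.
        offset = len(self.raw)
        self.raw.append(line)
        self._pending.put(offset)
        return offset

    def consume(self, start=0):
        # Direct consumption from storage, bypassing the index.
        return self.raw[start:]

    def _index_loop(self):
        while True:
            offset = self._pending.get()
            for term in self.raw[offset].lower().split():
                self.index.setdefault(term, set()).add(offset)
            self._pending.task_done()

    def wait_for_index(self):
        # Block until every queued offset has been indexed.
        self._pending.join()

    def search(self, term):
        offsets = sorted(self.index.get(term.lower(), ()))
        return [self.raw[i] for i in offsets]


store = LogStore()
store.write("ERROR disk full")
store.write("INFO disk ok")
consumed = store.consume()     # available even before indexing finishes
store.wait_for_index()
hits = store.search("error")
```

Because the index is derived from raw data, it can be rebuilt, scaled, or moved to another index cluster without affecting ingestion.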

Feature Enhancements

Rich consumption delivery: consumer groups, Kafka protocol, S3 protocol.

Query analysis with SQL‑92 and visual dashboards.

Log alerting via SMS, email, Feishu, etc.

Visualization dashboards linked with alerts.

Practical Cases

Internal business and operations logs: TLS now handles logs from multiple regions, achieving 80% resource utilization, high availability through multi‑level flow control, easy operation with few staff, and rapid onboarding within one hour.

Education industry customer: Unified collection of file, app, Kubernetes, and user‑behavior logs; integration with the customer’s big‑data platform; archival to object storage for long‑term retention; reduced construction and operation costs.

Future Outlook

One‑click log collection for cloud products.

Deep optimization of the search engine.

Functional interfaces for data cleaning and processing.

Deeper integration with third‑party platforms and Volcano Engine cloud services.
