How to Build a Cost‑Effective High‑Throughput Log Collection System with ClickHouse and UDP
This article analyzes the challenges of storing and retrieving massive log volumes, estimates the bandwidth and hardware costs of traditional pipelines, and presents a streamlined architecture that uses in‑memory buffering, UDP transport, compression, and ClickHouse to handle petabyte‑scale daily log volume while cutting storage costs by more than 75%.
Background
In everyday services we need to store logs such as request parameters, info and error messages to aid troubleshooting. Traditional approaches start with local files (log4j) and evolve to ELK stacks, then to message queues like Kafka, and finally to filebeat‑based ingestion. These solutions work for modest traffic but become costly at scale.
Cost Explosion at Large Scale
Using a JD.com App module as an example, a single request generates 40 KB–2 MB of log data (median ~60 KB). With 30 k requests per second the raw log volume reaches 1.8 GB/s, and peak traffic can demand >15 GB/s. Storing the raw files, writing to Kafka (which also persists to disk), and replicating data would require thousands of servers, making the solution financially untenable.
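The 1.8 GB/s figure follows directly from the request rate and the median log size. A minimal sketch of that arithmetic, assuming decimal units (1 KB = 1,000 bytes) and a flat rate sustained for a full day, which real traffic would not match:

```python
KB = 1_000
GB = 1_000_000_000


def raw_log_rate_gb_per_s(requests_per_second: int, bytes_per_request: int) -> float:
    """Raw (uncompressed) log volume in GB/s for a given request rate."""
    return requests_per_second * bytes_per_request / GB


# Figures quoted in the text: 30k requests/s at a median of ~60 KB each.
median_rate = raw_log_rate_gb_per_s(30_000, 60 * KB)
print(f"{median_rate:.1f} GB/s")  # 1.8 GB/s

# Hypothetical extrapolation: the same rate held for 24 hours.
per_day_tb = median_rate * 86_400 / 1_000
print(f"{per_day_tb:.1f} TB/day")
```

Every extra copy of the data (local file, Kafka segment, replica) multiplies this volume again, which is where the server count comes from.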
Shortening the Pipeline
Applying Occam’s razor ("remove unnecessary entities"), the design discards local disk writes and Kafka. Logs are kept in memory, compressed with Snappy or ZSTD, and sent directly to a worker cluster via UDP, falling back to HTTP when a compressed payload exceeds the 64 KB UDP limit.
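The compress-then-choose-transport step can be sketched as follows. This is an illustration under stated assumptions, not the article's client: `zlib` from the standard library stands in for Snappy or ZSTD, the function names are hypothetical, and the threshold is the IPv4 UDP payload ceiling of 65,507 bytes rather than a value taken from the source.

```python
import socket
import zlib

# Largest UDP payload over IPv4: 65,535 - 20 (IP header) - 8 (UDP header).
MAX_UDP_PAYLOAD = 65_507


def pack(log_lines: list[str]) -> bytes:
    """Join buffered log lines and compress them (zlib used as a stand-in)."""
    return zlib.compress("\n".join(log_lines).encode("utf-8"))


def choose_transport(payload: bytes) -> str:
    """Pick UDP when the payload fits in one datagram, otherwise HTTP."""
    return "udp" if len(payload) <= MAX_UDP_PAYLOAD else "http"


def send_udp(payload: bytes, host: str, port: int) -> None:
    """Fire-and-forget send; UDP gives no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

The trade-off is deliberate: UDP drops the connection and acknowledgement overhead at the price of possible loss, which is usually acceptable for troubleshooting logs.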
Robust Log Collection System
The new architecture consists of four components:
Configuration Center: Stores worker IPs for clients to discover.
Client: Pulls worker addresses, compresses logs, and streams them over UDP.
Worker: Receives UDP packets, parses them, and batches inserts into ClickHouse.
ClickHouse: A column‑oriented OLAP database with high compression and write performance, partitioned by day for fast queries.
Workers are the performance bottleneck; they use large‑memory containers (8 CPU / 32 GB) and a double‑buffer queue to absorb bursts before writing to ClickHouse.
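One way to picture the double-buffer queue: receivers append to an active buffer while a writer periodically swaps it out and inserts the filled batch. The sketch below is a simplified, hypothetical version (class name, capacity handling, and drop-on-full policy are all assumptions, not details from the article).

```python
import threading


class DoubleBuffer:
    """Receivers fill the active buffer; the writer swaps it for an empty one."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._active: list[bytes] = []
        self._lock = threading.Lock()

    def put(self, record: bytes) -> bool:
        """Append a record; return False (record dropped) when the buffer is full."""
        with self._lock:
            if len(self._active) >= self._capacity:
                return False
            self._active.append(record)
            return True

    def swap(self) -> list[bytes]:
        """Hand the filled buffer to the caller and start a fresh one."""
        with self._lock:
            full, self._active = self._active, []
            return full
```

A writer thread would call `swap()` on a timer or size trigger and send the returned list to ClickHouse as one batch insert, so the lock is held only for the pointer swap and never during the slow database write.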
Client‑Side Log Aggregation
The client SDK provides filters for HTTP and RPC frameworks to capture request/response payloads, and custom appenders for Log4j/Logback/Log4j2 that buffer logs in memory and forward them via UDP. Large messages that still exceed UDP limits are sent over HTTP. Thread‑local storage (TransmittableThreadLocal) preserves trace IDs across thread pools.
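TransmittableThreadLocal is a Java library, but the underlying idea, capturing the caller's context at submit time and restoring it inside the pooled thread, can be illustrated with Python's `contextvars`. This is an analogue for explanation only; the helper names are hypothetical and nothing here reflects the SDK's actual code.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Per-request trace ID; "-" is the value seen when nothing was propagated.
trace_id = contextvars.ContextVar("trace_id", default="-")


def current_trace() -> str:
    return trace_id.get()


def submit_with_context(pool: ThreadPoolExecutor, fn, *args):
    """Snapshot the caller's context and run fn inside it on the pool thread."""
    ctx = contextvars.copy_context()
    return pool.submit(ctx.run, fn, *args)


def demo() -> tuple[str, str]:
    trace_id.set("req-42")
    with ThreadPoolExecutor(max_workers=1) as pool:
        # A plain submit generally does not carry the caller's context.
        plain = pool.submit(current_trace).result()
        # The wrapped submit sees the trace ID set by the caller.
        carried = submit_with_context(pool, current_trace).result()
    return plain, carried
```

Without this capture step, a log line emitted from a pooled thread would lose the trace ID that ties it back to the originating request.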
Worker‑Side Consumption and Ingestion
Workers can process 10‑50 million raw log rows per second, translating to ~2 × 10⁴ client QPS. ClickHouse ingestion stabilises at 160‑200 MB/s per worker. Because data stays compressed end to end, that figure presumably counts compressed bytes, which is how a few hundred workers can handle hundreds of gigabytes of raw logs per second. All data remains compressed until a user query triggers decompression.
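The per-worker ingestion rate turns into a fleet size with one division. A rough sizing sketch, assuming the target volume and the per-worker rate are measured on the same bytes (the article does not say whether 160‑200 MB/s is compressed or raw) and ignoring headroom for failures and bursts:

```python
import math


def workers_needed(target_gb_per_s: float, per_worker_mb_per_s: float) -> int:
    """Minimum worker count to sustain the target rate, with no headroom."""
    return math.ceil(target_gb_per_s * 1_000 / per_worker_mb_per_s)


# Per-worker rates quoted in the text; the target volumes are illustrative.
for rate in (160, 200):
    print(rate, workers_needed(15, rate), workers_needed(100, rate))
# 160 94 625
# 200 75 500
```

In practice the count would be padded for worker restarts and uneven client distribution.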
ClickHouse Advantages
ClickHouse’s vectorised execution, SIMD optimisations, and columnar storage give it fast analytical reads and high compression. On the write side, inserting directly into local tables rather than through distributed tables yields 2‑3× higher write throughput. The cluster employs a three‑layer architecture (Domain → CHProxy → CH nodes) with automatic fail‑over, ensuring high availability.
Multi‑Condition Query Console
The UI provides simple SQL‑based queries, leveraging ClickHouse features such as PREWHERE and proper sharding to achieve sub‑second response times on billions of rows. Indexes on time and user identifiers further accelerate look‑ups.
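To make the day partitioning and PREWHERE usage concrete, here is a sketch of a table definition and a query builder. The table name, columns, and sort key are hypothetical; only the ClickHouse constructs themselves (`MergeTree`, `PARTITION BY`, `PREWHERE`, `{name:Type}` query parameters) are real features.

```python
from datetime import date

# Hypothetical schema: one partition per day, sorted by time then user.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS app_logs_local (
    ts        DateTime,
    user_id   String,
    trace_id  String,
    body      String CODEC(ZSTD)
) ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ts)
ORDER BY (ts, user_id)
"""


def build_query(day: date, limit: int = 100) -> str:
    """Build a single-day lookup; user_id is left as a server-side parameter."""
    return (
        "SELECT ts, trace_id, body FROM app_logs_local "
        f"PREWHERE toYYYYMMDD(ts) = {day:%Y%m%d} "
        "WHERE user_id = {user_id:String} "
        "ORDER BY ts DESC "
        f"LIMIT {int(limit)}"
    )


print(build_query(date(2024, 1, 2), limit=10))
```

PREWHERE lets ClickHouse read only the filter columns first and fetch the wide `body` column for matching rows alone, which matters when each row carries tens of kilobytes of log text.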
Summary & Comparison
Compared with the traditional pipeline (disk + Kafka + DB), the new design reduces disk usage to ~0.8 × ClickHouse’s footprint (after compression) and cuts overall storage cost by >75 %. CPU consumption also drops because the client only performs a single protobuf serialization, and workers avoid double‑disk writes. The result is a scalable, low‑cost log collection system capable of handling petabyte‑scale daily traffic.
dbaplus Community
