
Designing a High‑Performance Log Collection System with UDP, Compression, and ClickHouse

The article analyzes the high cost and scalability challenges of traditional log collection pipelines and proposes a streamlined architecture that uses in‑memory buffering, UDP transport, aggressive compression, and ClickHouse storage to achieve massive throughput while drastically reducing hardware and operational expenses.

JD Retail Technology

In everyday development, logs (request parameters, info, error messages) are essential for troubleshooting, but conventional storage solutions—local files, ELK stack, or Kafka pipelines—become prohibitively expensive when traffic scales to tens of thousands of requests per second, generating terabytes of data per hour.

Cost analysis of a typical JD.com module shows that with an average log size of 60 KB per request, 30,000 requests per second generate roughly 1.8 GB of raw log data per second and over 2 TB of storage per hour, along with massive disk I/O on both the log producers and the message queue, ultimately requiring thousands of servers.
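A quick sanity check of those figures; the class and method names below are this sketch's own, while the inputs (60 KB per log, 30,000 requests per second) are the article's, and decimal units (1 KB = 1000 B) are assumed to match its 1.8 GB/s number:

```java
// Back-of-envelope check of the article's volume figures. The inputs
// (60 KB average log, 30,000 requests/second) come from the article;
// decimal units (1 KB = 1000 B) are assumed to match its 1.8 GB/s figure.
public class LogVolume {
    static final long BYTES_PER_LOG = 60_000;       // 60 KB average log
    static final long REQUESTS_PER_SECOND = 30_000;

    // Raw log volume produced per second, in bytes.
    public static long bytesPerSecond() {
        return BYTES_PER_LOG * REQUESTS_PER_SECOND; // 1.8 GB/s
    }

    // Raw log volume per hour, in terabytes.
    public static double terabytesPerHour() {
        return bytesPerSecond() * 3600.0 / 1e12;    // ~6.5 TB/h before compression
    }

    public static void main(String[] args) {
        System.out.printf("%.1f GB/s, %.1f TB/h%n",
                bytesPerSecond() / 1e9, terabytesPerHour());
    }
}
```

At ~6.5 TB of raw data per hour, the "over 2 TB of storage per hour" claim holds comfortably even after moderate compression.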

Applying Occam's razor, the author removes unnecessary steps: instead of writing logs to disk and forwarding them through Kafka, logs are kept in memory, compressed with Snappy/ZSTD, and sent directly to workers via UDP, falling back to HTTP when a packet exceeds the roughly 64 KB UDP datagram limit.
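A minimal sketch of that compress-then-choose-transport step. The article uses Snappy/ZSTD, which require third-party libraries, so `java.util.zip.Deflater` stands in here; the class and method names (`LogSender`, `chooseTransport`) are this sketch's, not the article's:

```java
import java.io.ByteArrayOutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical client-side sender: compress the in-memory log batch, then
// ship it over UDP, falling back to HTTP when the compressed payload would
// not fit in a single datagram. Deflater stands in for Snappy/ZSTD.
public class LogSender {
    static final int UDP_LIMIT = 64 * 1024 - 512; // headroom for UDP/IP headers

    public static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Pick the transport by compressed size, as the article describes.
    public static String chooseTransport(byte[] compressed) {
        return compressed.length <= UDP_LIMIT ? "udp" : "http";
    }

    public static void sendUdp(byte[] payload, String host, int port) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) {
        byte[] logs = "level=INFO traceId=abc msg=order created".repeat(2000)
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(logs);
        System.out.println(packed.length + " bytes via " + chooseTransport(packed));
    }
}
```

Log text is highly repetitive, which is why aggressive compression pays off: the repeated sample above shrinks to a small fraction of its raw size and fits easily in one datagram.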

The proposed pipeline consists of a configuration center; a lightweight client SDK that captures request parameters and key log entries, compresses them, serializes them with Protobuf, and pushes them over UDP; and workers that receive the data, parse it, and batch-write it into ClickHouse, a column-oriented OLAP database with excellent compression and write performance.
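The worker's batch-write step can be sketched as follows; this is a simplified illustration under my own names and thresholds (in the real pipeline the flush callback would issue a batch INSERT into ClickHouse rather than call an injected consumer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical worker-side batcher: rows parsed from incoming packets are
// accumulated and handed to a flush callback (in the article's pipeline, a
// batch INSERT into ClickHouse) once the batch reaches a size threshold.
public class BatchWriter {
    private final int batchSize;
    private final Consumer<List<String>> flusher;
    private List<String> buffer = new ArrayList<>();

    public BatchWriter(int batchSize, Consumer<List<String>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    // Called for each parsed row; triggers a flush when the batch is full.
    public synchronized void add(String row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Hands the current batch to the sink and starts a fresh one.
    public synchronized void flush() {
        if (!buffer.isEmpty()) {
            flusher.accept(buffer);
            buffer = new ArrayList<>();
        }
    }
}
```

Batching matters because ClickHouse strongly favors few large inserts over many small ones; a production version would also flush on a timer so quiet periods do not strand rows in memory.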

Client implementation is simple: custom filters and appenders for Log4j/Logback capture the log events, TransmittableThreadLocal preserves trace IDs across thread pools, and a single Protobuf serialization is performed before UDP transmission.
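To see why the article reaches for TransmittableThreadLocal: a plain ThreadLocal loses the trace ID when work hops to a reused pooled thread, so the value must be captured at submit time and restored inside the task. TTL automates exactly this wrap-and-restore; the manual version below is a stdlib-only illustration, with names of my own choosing:

```java
// Illustrative trace-ID context. TransmittableThreadLocal (alibaba/ttl)
// performs this capture/restore automatically for pooled threads; here it
// is done by hand to show the mechanism.
public class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    public static void set(String id) { TRACE_ID.set(id); }
    public static String get() { return TRACE_ID.get(); }

    // Capture the submitting thread's trace ID now, restore it around the
    // task body when it later runs on some (possibly reused) worker thread.
    public static Runnable wrap(Runnable task) {
        String captured = TRACE_ID.get();
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                task.run();
            } finally {
                TRACE_ID.set(previous); // leave the worker thread as found
            }
        };
    }
}
```

InheritableThreadLocal is not enough here because it only copies values when a thread is created, not when an existing pool thread picks up a new task.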

Worker nodes employ large-memory containers, double-buffered queues, and multi-threaded consumers to handle up to 20,000 QPS per instance, sustaining 160-200 MB/s of write throughput to ClickHouse, equivalent to over 1 GB of raw data per second.
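The double-buffered queue the article mentions can be reduced to a few lines; this sketch uses my own class and method names and elides the consumer thread itself:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative double buffer: receiving threads append to the "front" list
// while the flush thread atomically swaps it out and drains the old one, so
// producers never block for the duration of a long ClickHouse write.
public class DoubleBuffer {
    private List<String> front = new ArrayList<>();
    private final Object lock = new Object();

    // Called by packet-receiving threads; holds the lock only for an append.
    public void append(String row) {
        synchronized (lock) {
            front.add(row);
        }
    }

    // Called by the flush thread: swap in an empty buffer and return the
    // full one, which can then be written out without holding the lock.
    public List<String> swap() {
        synchronized (lock) {
            List<String> full = front;
            front = new ArrayList<>();
            return full;
        }
    }
}
```

The key property is that the lock is only ever held for an append or a pointer swap, never for the slow write itself, which is what lets a single instance absorb tens of thousands of events per second.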

ClickHouse is deployed with a high-availability proxy layer (CHProxy) and local-table writes to avoid the overhead of distributed tables, delivering 2-3× higher ingestion rates and supporting massive parallel queries through proper sharding, PREWHERE filters, and index design.

A simple query console built on top of ClickHouse enables multi‑dimensional, time‑range queries with fast response times, even on billions of rows.

Comparative results show that the new architecture reduces disk usage by more than 75 %, cuts CPU server count by over 70 %, and improves worker consumption speed by an order of magnitude compared to traditional MQ‑based pipelines, delivering a cost‑effective, scalable solution for high‑volume log collection.

backend · Monitoring · ClickHouse · log collection · compression · high throughput · UDP
Written by

JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
