How Netflix Achieved Petabyte-Scale, Sub-Second Log Queries with ClickHouse
Netflix ingests over 5 PB of logs per day at millions of events per second. By layering hot and cold storage, fingerprinting logs with a custom lexer, serializing inserts in ClickHouse's native protocol, and sharding tag maps, the team cut query latency from seconds to sub-second.
Introduction: Netflix’s “terrifying” log scale
At Netflix, “scale is everything.” The platform ingests up to 5 PB of logs per day, averaging 10.6 million events per second (peaking at 12.5 million) with each log entry around 5 KB, across more than 40,000 microservices serving over 300 million subscribers in 190 countries. Logs are retained for anywhere from two weeks to two years, and the system must answer 500–1,000 queries per second for troubleshooting and monitoring.
To make this petabyte‑scale log data searchable within seconds, Netflix applied three key optimizations. Logs become searchable within 20 seconds of being generated, and some queries return in as little as 2 seconds, far ahead of the 5‑minute SLA.
1. Log Architecture: Hot‑Cold Tiering
The log pipeline consists of:
Log collection: thousands of microservices send logs to lightweight sidecars, which forward them to an ingestion cluster.
Temporary buffering: data is written to Amazon S3, and each write triggers an Amazon Kinesis message.
Core storage tiering: data lands in one of two tiers.
Hot layer (ClickHouse): stores recent logs for fast, interactive queries.
Cold layer (Apache Iceberg): stores historical data cost‑effectively and supports queries over large time ranges.
Unified query layer: a Query API automatically routes each query to the appropriate namespace, abstracting storage details from engineers (a routing sketch follows below).
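The source doesn't detail the Query API, but the routing decision is largely a function of the query's time range. Here is a minimal Java sketch under that assumption; the LogQueryRouter class, the Target enum, and the 14‑day hot‑retention window are illustrative choices, not details from the talk.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical router: the real Netflix Query API internals are not public in the source.
public class LogQueryRouter {

    public enum Target { HOT_CLICKHOUSE, COLD_ICEBERG }

    // Assumed hot-tier retention window; the actual value is not given in the source.
    private final Duration hotRetention;

    public LogQueryRouter(Duration hotRetention) {
        this.hotRetention = hotRetention;
    }

    /**
     * A range that starts inside the hot window can be served entirely from ClickHouse;
     * anything older goes to Iceberg. (A range straddling the cutoff would need to
     * fan out to both tiers; that case is omitted from this sketch.)
     */
    public Target route(Instant rangeStart, Instant rangeEnd) {
        Instant hotCutoff = Instant.now().minus(hotRetention);
        return rangeStart.isAfter(hotCutoff) ? Target.HOT_CLICKHOUSE : Target.COLD_ICEBERG;
    }

    public static void main(String[] args) {
        LogQueryRouter router = new LogQueryRouter(Duration.ofDays(14));
        Instant now = Instant.now();
        System.out.println(router.route(now.minus(Duration.ofHours(2)), now)); // HOT_CLICKHOUSE
        System.out.println(router.route(now.minus(Duration.ofDays(90)), now)); // COLD_ICEBERG
    }
}
```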
2. Three Core Optimizations
Optimization 1: Log Fingerprinting with a Lexer
To reduce noise, logs are de‑duplicated by fingerprinting similar entries. Early attempts using machine learning were too resource‑intensive, and regular expressions could not keep up with 10 million events per second. Netflix switched to a compiler‑style lexer generated by JFlex, compiling log patterns into efficient code.
Throughput increased 8–10×.
Average fingerprinting time dropped from 216 µs to 23 µs.
P99 latency was dramatically reduced.
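The JFlex grammar itself isn't published, but the underlying idea, normalizing variable tokens (numbers, IDs) into placeholders in a single left-to-right pass and hashing the resulting template, can be sketched with a small hand-rolled scanner. This is a simplified stand-in for the generated lexer, not Netflix's actual code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Illustrative single-pass fingerprinter; the real implementation is a JFlex-generated lexer.
public final class LogFingerprinter {

    /** Replace number-like and hex/UUID-like tokens with placeholders in one pass. */
    public static String template(String line) {
        StringBuilder out = new StringBuilder(line.length());
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (Character.isDigit(c)) {
                int j = i;
                boolean idLike = false;
                while (j < line.length() && isTokenChar(line.charAt(j))) {
                    if (!Character.isDigit(line.charAt(j)) && line.charAt(j) != '.') idLike = true;
                    j++;
                }
                out.append(idLike ? "<ID>" : "<NUM>");
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '.';
    }

    /** Fingerprint = hash of the normalized template. */
    public static String fingerprint(String line) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(template(line).getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest, 0, 8); // a short prefix is enough for grouping
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = "GET /titles/81234567 took 153ms for request 9f8b2c1a-77aa";
        String b = "GET /titles/99887766 took 12ms for request 01c4d9ee-30bb";
        // Both lines normalize to the same template, so they share one fingerprint.
        System.out.println(template(a));
        System.out.println(fingerprint(a).equals(fingerprint(b))); // true
    }
}
```

The speed advantage of the real system comes from exactly this property: the expensive pattern matching is baked into generated scanning code at build time, so the hot path is a single pass over each log line.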
Optimization 2: Native Protocol Serialization
After fingerprinting, logs must be written to ClickHouse at millions of rows per second. Initial JDBC batch inserts were inefficient due to schema negotiation overhead. The team moved to ClickHouse’s RowBinary format via the Java client, manually serializing columns, but performance was still insufficient.
Inspired by a ClickHouse blog post showing the native protocol's advantage over RowBinary, Netflix reverse‑engineered the Go client and built a custom encoder that writes LZ4‑compressed native‑protocol blocks directly to ClickHouse. The result was lower CPU usage, better memory efficiency, and comparable or better throughput.
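The encoder itself isn't public, but the general shape of the approach, assembling a column-oriented block in memory and LZ4-compressing it before it leaves the process, can be sketched as follows. The layout below is invented for illustration and is not the real ClickHouse native wire format; the lz4-java dependency is likewise an assumption:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;

// Illustrative column-block encoder: values are laid out column by column
// (all fingerprints, then all messages), then the whole block is LZ4-compressed.
// This is NOT the real ClickHouse native wire format, only the general idea.
public final class ColumnBlockEncoder {

    public static byte[] encodeBlock(List<String> fingerprints, List<String> messages) throws IOException {
        if (fingerprints.size() != messages.size()) {
            throw new IllegalArgumentException("columns must have the same row count");
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);

        out.writeInt(2);                   // number of columns
        out.writeInt(fingerprints.size()); // number of rows

        writeStringColumn(out, "fingerprint", fingerprints);
        writeStringColumn(out, "message", messages);
        out.flush();

        // Compress the whole column-oriented block in one shot before sending.
        LZ4Compressor lz4 = LZ4Factory.fastestInstance().fastCompressor();
        return lz4.compress(buf.toByteArray());
    }

    private static void writeStringColumn(DataOutputStream out, String name, List<String> values) throws IOException {
        out.writeUTF(name);
        for (String v : values) {
            byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length); // length-prefixed value
            out.write(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] block = encodeBlock(
                List.of("a1b2c3", "a1b2c3"),
                List.of("GET /titles/<NUM> took <ID>", "GET /titles/<NUM> took <ID>"));
        System.out.println("compressed block size: " + block.length + " bytes");
    }
}
```

The wins the team reported follow from this shape: no per-row schema negotiation, fewer round trips, and compression applied over homogeneous column data rather than interleaved rows.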
Optimization 3: Sharded Tag Maps for Faster Queries
Custom tags (service name, request ID, etc.) are heavily used in queries, but storing them in a single Map(String, String) column meant every lookup scanned billions of key‑value pairs linearly. LowCardinality helped with keys but not values.
The solution was to shard the map into 31 smaller maps by hashing the tag key, allowing queries to target the relevant shard directly.
Pure filter queries dropped from ~3 s to 1.3 s.
Filter + projection queries dropped from ~3 s to under 700 ms.
Data scanned reduced by 5–8×.
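Routing a tag lookup to one of the 31 maps only requires a stable hash of the tag key. The sketch below illustrates the idea; the tags_N column naming, the hash function, and the query shape are assumptions for illustration rather than the schema from the talk:

```java
// Illustrative shard routing for tag lookups: the 31-way split mirrors the idea
// described above, but column names and the exact hash are assumptions.
public final class TagShardRouter {

    private static final int SHARD_COUNT = 31;

    /** Stable, non-negative shard index for a tag key. */
    public static int shardFor(String tagKey) {
        return Math.floorMod(tagKey.hashCode(), SHARD_COUNT);
    }

    /** Build a query that only touches the one sharded map holding this tag key. */
    public static String buildQuery(String tagKey, String tagValue) {
        String column = "tags_" + shardFor(tagKey); // hypothetical column name, e.g. tags_17
        // Values would be bound as parameters in real code; concatenation keeps the sketch short.
        return "SELECT timestamp, message FROM logs"
                + " WHERE " + column + "['" + tagKey + "'] = '" + tagValue + "'"
                + " ORDER BY timestamp DESC LIMIT 100";
    }

    public static void main(String[] args) {
        // Only roughly 1/31 of the tag data has to be scanned for this lookup.
        System.out.println(buildQuery("request_id", "9f8b2c1a"));
    }
}
```

In practice the writer and the reader must agree on the same hash so tags land in, and are read from, the same shard; String.hashCode here is only a stand-in.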
3. Simplicity Is the Core Principle
The three optimizations share a common theme of simplification:
Fingerprinting: replace runtime computation with compile‑time code.
Serialization: use a lower‑level native protocol to eliminate negotiation overhead.
Querying: shard data to reduce scan scope.
As Daniel Muino puts it, “Success isn’t about clever tricks; it’s about simplifying so the system does the least unnecessary work.” For Netflix, this logging system is a critical fault‑diagnosis lifeline supporting over 300 million users, turning “waiting for queries” into “sub‑second responses.”