
How LinkedIn Scales Kafka to Billions of Messages Every Day

This article explains how LinkedIn uses Apache Kafka as a high‑throughput, fault‑tolerant messaging backbone, detailing its architecture, message categories, layered replication, audit mechanisms, and the engineering practices that keep billions of daily messages reliable and fast.

MaGe Linux Operations

If data is the lifeblood of high‑tech, Apache Kafka is the circulatory system at LinkedIn: it moves data between dozens of systems and handles billions of messages each day.

What is Kafka?

Apache Kafka is an evolved publish/subscribe messaging system that combines queue and log semantics. Messages are organized into topics and partitions, and a topic can have multiple producers and consumers. Each Kafka cluster retains messages reliably and efficiently under one of several retention policies:

Time‑based retention (at LinkedIn, measured in days)

Size‑based segment retention

Key‑based retention, which keeps only the latest value for each key
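
The three policies can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions, not Kafka's implementation: the `Record` type and the three functions are hypothetical names invented here, and a real broker removes whole log segments rather than individual records. In an actual cluster these behaviours are selected with topic settings such as `retention.ms`, `retention.bytes`, and `cleanup.policy=compact`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    offset: int
    timestamp_ms: int
    key: str
    value: str
    size_bytes: int


def retain_by_time(log: List[Record], now_ms: int, max_age_ms: int) -> List[Record]:
    """Time-based: drop records older than max_age_ms."""
    return [r for r in log if now_ms - r.timestamp_ms <= max_age_ms]


def retain_by_size(log: List[Record], max_bytes: int) -> List[Record]:
    """Size-based: drop the oldest records until the log fits in max_bytes."""
    kept: List[Record] = []
    total = 0
    for r in reversed(log):  # walk from newest to oldest
        if total + r.size_bytes > max_bytes:
            break
        kept.append(r)
        total += r.size_bytes
    return list(reversed(kept))


def retain_by_key(log: List[Record]) -> List[Record]:
    """Key-based: keep only the latest record for each key."""
    latest: Dict[str, Record] = {}
    for r in log:  # records with later offsets overwrite earlier ones
        latest[r.key] = r
    return sorted(latest.values(), key=lambda r: r.offset)


# Illustrative data only.
log = [
    Record(0, 1000, "a", "v1", 100),
    Record(1, 2000, "b", "v1", 100),
    Record(2, 3000, "a", "v2", 100),
    Record(3, 4000, "c", "v1", 100),
]

print([r.offset for r in retain_by_time(log, now_ms=5000, max_age_ms=2000)])  # [2, 3]
print([r.offset for r in retain_by_size(log, max_bytes=250)])                 # [2, 3]
print([r.offset for r in retain_by_key(log)])                                 # [1, 2, 3]
```

Note that key-based retention keeps offset 1 even though it is old, because it is still the newest value for key "b".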

Kafka offers reliability, flexibility, and high throughput.

How big is “big”?

LinkedIn runs more than 1,100 Kafka brokers organized into over 60 clusters, which together receive over 800 billion messages per day (≈175 TB inbound, 650 TB outbound). At peak, the clusters ingest more than 13 million messages per second (≈2.75 GB/s).
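
Two useful ratios fall out of these figures. The short calculation below is a back-of-the-envelope sketch that assumes decimal units (1 GB = 10^9 bytes); the variable names are illustrative.

```python
PEAK_MESSAGES_PER_SEC = 13_000_000  # "13 million messages per second"
PEAK_BYTES_PER_SEC = 2.75e9         # "2.75 GB/s", assuming 1 GB = 10**9 bytes
DAILY_INBOUND = 175                 # inbound volume, in the article's unit
DAILY_OUTBOUND = 650                # outbound volume, in the same unit

# Average message size at peak: bytes per second divided by messages per second.
avg_message_bytes = PEAK_BYTES_PER_SEC / PEAK_MESSAGES_PER_SEC

# Fan-out: how many times each produced byte is read on average.
fan_out = DAILY_OUTBOUND / DAILY_INBOUND

print(f"average message size at peak: {avg_message_bytes:.0f} bytes")
print(f"outbound / inbound ratio: {fan_out:.1f}")
```

The result is roughly 212 bytes per message at peak, and every byte written is read about 3.7 times on average across all consumers.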

Message Types

Queues: Standard messages for coordination and state, used for email, data distribution, and backend integration.

Metrics: System and hardware statistics that drive internal monitoring and alerting.

Logs: Application, system, and access logs that were originally co‑located with metrics but are now separated due to volume.

Traces: Detailed records of infrastructure actions, feeding stream processors like Apache Samza and batch jobs in Hadoop for indexing, usage tracking, and real‑time analytics.

Layered Architecture and Aggregation

Each LinkedIn data center hosts local Kafka clusters, and Kafka MirrorMaker copies their messages into aggregate clusters that hold the global view, reducing cross‑data‑center traffic and latency. The hierarchy consists of producers (first layer), local clusters (second layer), and aggregate clusters (each an additional layer), with consumers as the final layer.

This layering improves bandwidth usage but adds monitoring complexity; each layer must be audited to ensure no message loss.
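
The bandwidth saving can be seen in a toy model. The sketch below assumes every data center hosts its own aggregate cluster and runs several applications that each need the full global stream; all names and numbers are invented for illustration and are not LinkedIn's figures.

```python
from typing import Dict


def cross_dc_without_aggregation(produced: Dict[str, int], consumers_per_dc: int) -> int:
    """Every consumer reads each remote local cluster directly."""
    total = sum(produced.values())
    return sum(consumers_per_dc * (total - local) for local in produced.values())


def cross_dc_with_aggregation(produced: Dict[str, int]) -> int:
    """A mirror process copies each remote message into the data center once;
    consumers then read the aggregate cluster locally."""
    total = sum(produced.values())
    return sum(total - local for local in produced.values())


# Hypothetical message counts produced in three data centers.
produced = {"dc-a": 500, "dc-b": 300, "dc-c": 200}

direct = cross_dc_without_aggregation(produced, consumers_per_dc=4)
mirrored = cross_dc_with_aggregation(produced)
print(direct, mirrored)  # 8000 2000
```

In this model the cross-data-center traffic shrinks by a factor equal to the number of consumers per data center, because each message crosses the wide-area link once per destination instead of once per consumer.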

Audit Integrity

LinkedIn’s internal Kafka Audit tool adds a header with timestamp, producer service, and host to every message. Producers periodically publish counts to audit topics, allowing comparison with consumer metrics to detect missing or duplicated messages.
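
The producer side of this scheme can be sketched as a thin wrapper. Everything here is an assumption made for illustration: `AuditingProducer`, the `send_fn` callable that stands in for a real Kafka producer, the `"audit"` topic name, the message layout, and the 10-minute bucket are not taken from LinkedIn's tool.

```python
import socket
import time
from collections import Counter
from typing import Callable, Optional

BUCKET_SECONDS = 600  # assumed audit interval; the article does not state one


class AuditingProducer:
    """Wraps a send function, stamps each message, and counts what was sent."""

    def __init__(self, service: str, send_fn: Callable[[str, dict], None],
                 host: Optional[str] = None) -> None:
        self.service = service
        self.host = host or socket.gethostname()
        self.send_fn = send_fn
        self.counts: Counter = Counter()  # (topic, bucket_start) -> messages sent

    def send(self, topic: str, payload: str, now: Optional[float] = None) -> None:
        ts = time.time() if now is None else now
        header = {"timestamp": ts, "service": self.service, "host": self.host}
        self.send_fn(topic, {"header": header, "payload": payload})
        # Round the timestamp down to the start of its time bucket.
        bucket = int(ts // BUCKET_SECONDS) * BUCKET_SECONDS
        self.counts[(topic, bucket)] += 1

    def flush_audit(self, audit_topic: str = "audit") -> None:
        """Publish the per-bucket counts to the audit topic and reset them.

        Audit messages bypass send(), so they are not counted themselves.
        """
        for (topic, bucket), count in sorted(self.counts.items()):
            self.send_fn(audit_topic, {
                "tier": "producer", "topic": topic, "bucket": bucket,
                "count": count, "service": self.service, "host": self.host,
            })
        self.counts.clear()


# Usage with an in-memory list standing in for the cluster.
sent = []
producer = AuditingProducer("mailer", lambda t, m: sent.append((t, m)), host="host-1")
producer.send("emails", "hello", now=1000.0)
producer.send("emails", "again", now=1100.0)
producer.flush_audit()
print(sent[-1][1]["count"])  # 2
```

Because the timestamp travels in the message header, a downstream tier can assign each message to the same time bucket the producer used, which is what makes the counts comparable.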

The Kafka Console Auditor consumes all topics, aggregates the counts, and checks that each layer saw the same volume, which confirms end‑to‑end delivery or pinpoints the layer where messages went missing.
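
The comparison step might look like the following. This is a minimal sketch that assumes each audit message is a dictionary with `tier`, `topic`, `bucket`, and `count` fields; that format and the function names are hypothetical, not the actual tool's.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]  # (topic, bucket_start)


def aggregate_counts(audit_messages: Iterable[dict]) -> Dict[str, Dict[Key, int]]:
    """Sum the reported counts per tier, topic, and time bucket."""
    totals: Dict[str, Dict[Key, int]] = defaultdict(lambda: defaultdict(int))
    for m in audit_messages:
        totals[m["tier"]][(m["topic"], m["bucket"])] += m["count"]
    return totals


def find_discrepancies(totals: Dict[str, Dict[Key, int]],
                       tiers: List[str]) -> List[Tuple[Key, str, int, int]]:
    """Compare each tier with the first one; report (key, tier, expected, seen)."""
    baseline = totals.get(tiers[0], {})
    problems = []
    for tier in tiers[1:]:
        seen = totals.get(tier, {})
        for key in sorted(set(baseline) | set(seen)):
            expected, actual = baseline.get(key, 0), seen.get(key, 0)
            if expected != actual:
                problems.append((key, tier, expected, actual))
    return problems


# Illustrative audit messages: two producers, then the local and aggregate tiers.
messages = [
    {"tier": "producer", "topic": "emails", "bucket": 600, "count": 2},
    {"tier": "producer", "topic": "emails", "bucket": 600, "count": 3},
    {"tier": "local", "topic": "emails", "bucket": 600, "count": 5},
    {"tier": "aggregate", "topic": "emails", "bucket": 600, "count": 4},
]
print(find_discrepancies(aggregate_counts(messages), ["producer", "local", "aggregate"]))
# [(('emails', 600), 'aggregate', 5, 4)]
```

A tier that saw fewer messages than the producers reported points to loss before that tier; a higher count points to duplication.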

Putting It All Together

LinkedIn’s dedicated Kafka engineering team, comprising core open‑source contributors, provides internal libraries (e.g., Tracker Producer) that enrich messages with headers and audit data, and maintains custom tools like the console auditor.

Outlook

LinkedIn continues to push Kafka’s limits: it is targeting daily volumes of 10 trillion messages, adding security and quota controls, and integrating more closely with the Samza stream‑processing framework. The SRE team automates operational tasks such as partition rebalancing, while the company remains active in the Apache Kafka community through meetups and open‑source contributions.
