
Why LinkedIn Is Replacing Kafka with Its Own Next‑Gen Streaming System

LinkedIn, facing planetary‑scale data volumes, found Kafka’s architecture hitting fundamental limits and built Northguard—a decentralized, log‑striped streaming platform with Raft‑based metadata and an Xinfra migration layer—to gradually replace Kafka’s core responsibilities while maintaining compatibility.

LuTiao Programming

Background

Kafka was created at LinkedIn in 2010 to provide a high-throughput, low-latency, horizontally scalable log system, at a time when the site had roughly 90 million users. By 2026 LinkedIn processes more than 32 trillion records per day, runs over 150 Kafka clusters, and manages on the order of 400,000 topics.

Scale‑induced bottlenecks

Metadata centralization

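Within each Kafka cluster, a single active controller handles topic creation, partition assignment, leader election, and the coordination of partition reassignment. At LinkedIn’s scale, with hundreds of thousands of topics, every metadata change funnels through that one node, so the controller becomes a bottleneck that amplifies any slowdown or failure across the cluster. It is analogous to a single manager who must personally approve every project in a very large department.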

Rebalance = stop‑the‑world

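Adding a broker, expanding disk capacity, or changing replica policies can trigger a large-scale rebalance in which entire partition replicas are copied between brokers. In high-throughput, strict-SLA environments this data movement competes with production traffic, so the whole cluster effectively “moves” to accommodate the change, creating an unacceptable operational risk.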

Hot‑partition imbalance

A few partitions become hotspots while many others remain nearly idle. Because each partition replica lives entirely on one broker, this leads to uneven disk, I/O, and network utilization.
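The skew described above can be illustrated with a minimal sketch, assuming a hypothetical workload in which one key dominates. Kafka’s default partitioner hashes keyed records with murmur2; `zlib.crc32` is used here only as a reproducible stand-in, and the names `partition_for` and `partition_load` are made up for this example.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition with a stable hash (stand-in for murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_load(keys, num_partitions: int):
    """Count how many records land on each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical workload: one very active key plus 100 keys with one record each.
keys = ["celebrity"] * 900 + [f"user-{i}" for i in range(100)]
load = partition_load(keys, num_partitions=8)

hot = partition_for("celebrity", 8)
print(f"hot partition {hot} holds {load[hot]} of {sum(load)} records")
```

Every record for the busy key hashes to the same partition, so adding partitions does not spread that key’s load; the broker hosting the hot partition carries it alone.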

Northguard: a new paradigm

LinkedIn chose to replace the Kafka model rather than patch it, creating Northguard.

Log striping

Logs are split into fixed-size (~1 GB) segments that can be placed on different brokers.

Because the unit of placement is a small segment rather than a whole partition, this eliminates the “large partition” problem, makes data naturally migratable, enables automatic load balancing, and reduces the need for manual hotspot mitigation. The change can be visualized as moving from a single massive hard disk to a collection of portable SSD modules.
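A minimal sketch of the striping idea follows. The article does not describe Northguard’s actual placement policy, so the least-loaded-broker rule, the `Segment` type, and the `stripe_log` function are assumptions chosen only to show how fixed-size segments spread a large log across brokers.

```python
from dataclasses import dataclass

SEGMENT_BYTES = 1 << 30  # ~1 GB, the segment size quoted in the article


@dataclass
class Segment:
    index: int
    start: int   # byte position of the segment within the log
    size: int
    broker: str


def stripe_log(total_bytes: int, brokers: list, segment_bytes: int = SEGMENT_BYTES):
    """Cut a log into fixed-size segments and place each on the least-loaded broker.

    The placement rule is an illustrative assumption, not Northguard's policy.
    """
    used = {b: 0 for b in brokers}
    segments = []
    start = 0
    while start < total_bytes:
        size = min(segment_bytes, total_bytes - start)
        broker = min(brokers, key=lambda b: used[b])  # ties go to the earlier broker
        segments.append(Segment(len(segments), start, size, broker))
        used[broker] += size
        start += size
    return segments, used


segments, used = stripe_log(10 * SEGMENT_BYTES + 5, ["b1", "b2", "b3"])
for broker, nbytes in used.items():
    print(broker, nbytes)
```

With this rule no broker ever holds more than one segment’s worth of data beyond any other, whereas a single unsplit partition of the same size would sit entirely on one broker.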

Decentralized metadata – Raft + sharded state machines

Northguard discards the single-point controller. Metadata is sharded; each shard is managed by a Raft state machine, turning the control plane itself into a distributed system. As a result, no single node acts as the cluster’s brain, so there is no one point whose failure or overload stalls all metadata operations.
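The routing side of sharded metadata can be sketched as follows. This is not Northguard’s implementation and it does not implement Raft: the `MetadataShard` and `ShardedMetadata` classes, the hash-based topic-to-shard mapping, and the replica layout are all assumptions, and the replicated log append that Raft would perform is replaced by a plain dictionary write.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class MetadataShard:
    shard_id: int
    replicas: list            # nodes that would form this shard's Raft group
    leader: str               # the node currently accepting writes for the shard
    topics: dict = field(default_factory=dict)

    def create_topic(self, name: str, config: dict) -> None:
        # A real shard would append this command to a Raft log and apply it
        # once a majority of replicas acknowledge it; here we apply it directly.
        self.topics[name] = config


class ShardedMetadata:
    def __init__(self, nodes: list, num_shards: int, replication: int = 3):
        self.shards = []
        for s in range(num_shards):
            replicas = [nodes[(s + i) % len(nodes)] for i in range(replication)]
            self.shards.append(MetadataShard(s, replicas, leader=replicas[0]))

    def shard_for(self, topic: str) -> MetadataShard:
        return self.shards[zlib.crc32(topic.encode("utf-8")) % len(self.shards)]

    def create_topic(self, topic: str, config: dict) -> str:
        shard = self.shard_for(topic)
        shard.create_topic(topic, config)
        return shard.leader   # the node that handled the request


meta = ShardedMetadata(["n1", "n2", "n3", "n4", "n5"], num_shards=5)
handled_by = meta.create_topic("page-views", {"retention_hours": 72})
print("page-views metadata handled by", handled_by)
```

Each shard has its own leader, so metadata requests for different topics are served by different nodes instead of funnelling through one controller, and losing a single node affects only the shards it participates in.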

Xinfra – progressive migration layer

Xinfra exposes a unified Pub/Sub API to applications and supports both Kafka and Northguard as backends, allowing business services to switch without code changes. This layer is what makes migrating a system that handles 32 trillion records per day feasible.
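The shape of such a layer can be sketched as a facade that routes each topic to one of two backends. Xinfra’s real API is not public in this article, so every name here (`Backend`, `InMemoryBackend`, `PubSubFacade`, `migrate`) is hypothetical, and the in-memory backends merely stand in for Kafka and Northguard clients.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    @abstractmethod
    def publish(self, topic: str, payload: bytes) -> None: ...


class InMemoryBackend(Backend):
    """Stand-in for a real Kafka or Northguard client; it only records calls."""

    def __init__(self, name: str):
        self.name = name
        self.records = []

    def publish(self, topic: str, payload: bytes) -> None:
        self.records.append((topic, payload))


class PubSubFacade:
    """Applications call publish(); a per-topic route decides which backend serves it."""

    def __init__(self, backends: dict, default: str):
        self.backends = backends
        self.default = default
        self.routes = {}

    def migrate(self, topic: str, backend_name: str) -> None:
        if backend_name not in self.backends:
            raise KeyError(f"unknown backend: {backend_name}")
        self.routes[topic] = backend_name

    def publish(self, topic: str, payload: bytes) -> str:
        name = self.routes.get(topic, self.default)
        self.backends[name].publish(topic, payload)
        return name


kafka = InMemoryBackend("kafka")
northguard = InMemoryBackend("northguard")
facade = PubSubFacade({"kafka": kafka, "northguard": northguard}, default="kafka")

before = facade.publish("page-views", b"event-1")   # served by the default backend
facade.migrate("page-views", "northguard")          # operator flips the route
after = facade.publish("page-views", b"event-2")    # same application call, new backend
print(before, "->", after)
```

The application code calls `publish` the same way before and after the switch; only the routing table changes, which is what allows topics to be moved one at a time rather than in a single cutover.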

Kafka vs. Northguard

This is an inevitable divergence at extreme scale, not a technology‑choice battle.

Kafka: built for the majority of companies.

Northguard: built for LinkedIn‑scale workloads.

Analogy: Kafka is a reliable heavy-duty truck; Northguard is an F1-grade race car, excellent for its purpose but unnecessary for ordinary tasks.

Practical considerations

Availability

Northguard is currently an internal LinkedIn system and has not been open‑sourced.

Kafka relevance

Kafka remains the industry standard and continues to be required knowledge.

Migration feasibility

The Xinfra migration layer is the product of a massive engineering organization; typical companies are unlikely to be able to replicate a migration of this kind.

Conclusion

Even successful architectures have lifecycles. LinkedIn’s adoption of Northguard reflects the natural evolution when a system reaches planetary scale. The three takeaways are:

Architecture must adapt to scale; there is no eternally correct design.

World‑class systems are eventually outpaced by their own creators.

Replacing a flagship technology demonstrates top‑tier engineering capability.
