Evolution of LinkedIn’s Backend Architecture: From the Leo Monolith to a Scalable Service‑Oriented Platform
The article chronicles LinkedIn’s journey from a single‑server Leo monolith to a highly distributed, service‑oriented backend architecture, detailing the introduction of member graphs, read‑only replicas, caching layers, Kafka pipelines, Rest.li APIs, super‑blocks, and multi‑data‑center deployments to support billions of daily requests.
Author: Josh Clemm, Senior Engineering Manager at LinkedIn.
Since its founding in 2003, LinkedIn has grown from 2,700 members in its first week to over 350 million worldwide. It now handles thousands of web requests per second, with mobile accounting for more than 50% of traffic, and every page request is served by backend systems that together process millions of queries per second.
Leo
Initially, LinkedIn operated a single application service, “Leo”, which hosted all web servlets and business logic and connected to a lightweight LinkedIn database.
Member Graph
To manage connections between members, LinkedIn built a dedicated member‑graph service that performed in‑memory graph traversals and communicated with Leo via Java RPC, enabling independent scaling.
Search functionality later leveraged this graph to feed a Lucene‑based search service.
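The core of such a service is a traversal over an in-memory adjacency map, for example to find second-degree connections. The sketch below is illustrative only; the `MemberGraph` class and its methods are hypothetical names, not LinkedIn's actual graph engine.

```java
import java.util.*;

// Minimal sketch of an in-memory member graph: adjacency sets plus a
// two-hop traversal that finds second-degree connections.
public class MemberGraph {
    private final Map<Integer, Set<Integer>> adj = new HashMap<>();

    // Connections are mutual, so store the edge in both directions.
    public void connect(int a, int b) {
        adj.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Members reachable in exactly two hops, excluding the member
    // themselves and their direct (first-degree) connections.
    public Set<Integer> secondDegree(int member) {
        Set<Integer> first = adj.getOrDefault(member, Set.of());
        Set<Integer> result = new HashSet<>();
        for (int friend : first) {
            for (int fof : adj.getOrDefault(friend, Set.of())) {
                if (fof != member && !first.contains(fof)) {
                    result.add(fof);
                }
            }
        }
        return result;
    }
}
```

Keeping the whole graph in memory is what makes traversals like this fast enough to answer per-request queries, which is why the graph was split out of Leo into its own independently scalable service.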
Multiple Read‑Only Replicas
As traffic grew, Leo was load-balanced across multiple instances, but the primary member database became a bottleneck. Vertical scaling helped temporarily; eventually LinkedIn introduced read-only replica databases, kept in sync with the primary by an early version of Databus, to offload read traffic while preserving consistency.
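The read-offloading idea amounts to routing writes to the primary while spreading reads across replicas. A minimal sketch, with hypothetical names (a real router must also account for replication lag and replica health):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of read/write splitting: writes always go to the primary,
// reads are round-robined across read-only replicas.
public class ReplicaRouter {
    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    public ReplicaRouter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = replicas;
    }

    public String routeWrite() { return primary; }

    public String routeRead() {
        if (replicas.isEmpty()) return primary;  // fall back to primary
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }
}
```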
However, the monolithic Leo continued to crash under load, prompting a shift toward decomposing Leo into many small, stateless services.
"Kill Leo" became an internal mantra.
Service‑Oriented Architecture (SOA)
Engineers began extracting microservices that exposed APIs and business logic, while the presentation layer was also separated (e.g., recruiting product, public info pages). By 2010, over 150 independent services existed; today there are more than 750.
These services could be scaled individually, and early configuration and performance monitoring tools were built.
Cache
To further reduce load, LinkedIn added intermediate caching layers (e.g., Memcache, Couchbase) and used Voldemort for pre‑computation. Over time, many caches were removed, leaving only caches closest to the data store to maintain low latency and horizontal scalability.
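A cache kept closest to the data store typically follows a read-through pattern: on a miss, load the value from the backing store and populate the cache. A minimal sketch, with hypothetical names:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of a read-through cache colocated with the data store:
// hits are served from memory, misses are loaded from the store
// and cached for subsequent reads.
public class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> store;  // loader backed by the database

    public ReadThroughCache(Function<K, V> store) {
        this.store = store;
    }

    public V get(K key) {
        // Only invokes the loader when the key is absent.
        return cache.computeIfAbsent(key, store);
    }

    public void invalidate(K key) {
        cache.remove(key);
    }
}
```

Keeping only this layer, rather than stacking many mid-tier caches, avoids the invalidation complexity of keeping multiple caches coherent while still cutting database load.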
Kafka
To handle the growing volume of data, LinkedIn created custom data pipelines for stream processing, feeding data into Hadoop workflows, aggregating service logs, and so on. Maintaining these point-to-point pipelines led to the development of a distributed publish‑subscribe platform, Kafka, that provides near‑real‑time access to any data source, supporting Hadoop jobs, real‑time analytics, site monitoring, and alerting. Kafka now processes over 5 trillion events per day.
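The publish-subscribe pattern Kafka generalizes can be sketched in memory: producers publish events to a named topic, and every subscriber to that topic receives each event. This toy (all names hypothetical) deliberately omits what makes Kafka itself scale: partitioned, durable, replayable commit logs and consumer groups.

```java
import java.util.*;
import java.util.function.Consumer;

// Minimal in-memory publish-subscribe sketch: each topic fans out
// every published event to all of its registered subscribers.
public class PubSub {
    private final Map<String, List<Consumer<String>>> topics = new HashMap<>();

    public void subscribe(String topic, Consumer<String> handler) {
        topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    public void publish(String topic, String event) {
        for (Consumer<String> handler : topics.getOrDefault(topic, List.of())) {
            handler.accept(event);
        }
    }
}
```

The decoupling is the point: a producer emits an event once, and new consumers (a Hadoop loader, a monitoring job, an alerting service) attach without the producer changing at all.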
Inversion
At the end of 2011, LinkedIn launched the internal project “Inversion”, pausing all feature development to focus on improving tooling, deployment infrastructure, and developer productivity.
Recent Years: Rest.li
After moving to a service‑oriented architecture, the Java RPC‑based APIs became inconsistent and tightly coupled to the presentation layer. LinkedIn therefore created Rest.li, a data‑model‑centric, stateless RESTful API framework usable by non‑Java clients, decoupling APIs from the UI and solving backward‑compatibility issues.
With dynamic discovery (D2), Rest.li also provides client‑side load balancing, service discovery, and scaling. Today LinkedIn hosts over 975 Rest.li resources and handles more than 1 trillion Rest.li calls per day.
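Client-side discovery of this kind can be sketched as a directory that clients query for a service's live endpoints, then pick one themselves instead of routing through a central load balancer. The `ServiceDirectory` class below is hypothetical; real D2 keeps this state in ZooKeeper and layers smarter balancing strategies on top.

```java
import java.util.*;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of client-side service discovery: services register their
// endpoints under a logical name, and callers pick an endpoint at
// request time, spreading load without a central balancer.
public class ServiceDirectory {
    private final Map<String, List<String>> endpoints = new HashMap<>();

    public void register(String service, String endpoint) {
        endpoints.computeIfAbsent(service, s -> new ArrayList<>()).add(endpoint);
    }

    public String pick(String service) {
        List<String> eps = endpoints.get(service);
        if (eps == null || eps.isEmpty()) {
            throw new NoSuchElementException("no endpoints for " + service);
        }
        return eps.get(ThreadLocalRandom.current().nextInt(eps.size()));
    }
}
```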
Super Blocks
While SOA decouples domains and enables independent scaling, many LinkedIn applications still need to fetch numerous data types, resulting in hundreds of calls that form a complex “call graph”. To manage this, LinkedIn introduced “Super Blocks”, providing a unified API for a group of backend services, allowing dedicated teams to optimize and control each client’s call graph.
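The idea can be sketched as a facade that fans out to several backend services and returns one aggregated result, so each frontend makes a single call instead of assembling its own deep call graph. The service names and the profile-page shape here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a "super block": one facade owned by a dedicated team
// fans out to multiple backend services and returns an aggregated
// result, keeping the call graph under that team's control.
public class ProfileSuperBlock {
    private final Function<Integer, String> memberService;       // name lookup
    private final Function<Integer, Integer> connectionService;  // connection count

    public ProfileSuperBlock(Function<Integer, String> memberService,
                             Function<Integer, Integer> connectionService) {
        this.memberService = memberService;
        this.connectionService = connectionService;
    }

    // Single entry point: clients get the full page in one call.
    public Map<String, Object> profilePage(int memberId) {
        Map<String, Object> page = new HashMap<>();
        page.put("name", memberService.apply(memberId));
        page.put("connections", connectionService.apply(memberId));
        return page;
    }
}
```

Because the fan-out lives behind one API, the owning team can batch, parallelize, or cache the downstream calls without any client changing.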
Multiple Data Centers
To avoid single points of failure at both the service level and the site level, LinkedIn operates three primary data centers and runs points of presence (PoPs) around the globe.
LinkedIn’s story is far richer, with many critical systems such as the member‑graph service, search, communication platform, and client templates each having their own evolution.
Architecture Digest