How eBay Scales to Billions: 7 Proven Practices for Massive Web Systems
This article outlines eBay's seven key scalability best practices—including functional partitioning, horizontal sharding, avoiding distributed transactions, asynchronous decoupling, stream processing, virtualization, and strategic caching—to illustrate how large‑scale web platforms can achieve linear resource growth and high availability.
eBay faces extreme scalability pressure, serving billions of page views daily and handling petabytes of data. In such an environment, resource consumption must grow linearly (or better) with load, and scalability becomes a matter of life and death.
Best Practice #1: Partition by Function
Group related functionality together and separate unrelated parts—whether you call it SOA, functional decomposition, or simply good engineering. This reduces coupling and enables independent scaling of each function.
At the code level, JAR files, packages, bundles, etc., isolate features. At the application level, eBay runs sales, bidding, and search in separate server pools—about 16,000 servers divided into 220 pools—allowing each pool to be scaled independently. The database layer follows the same principle, with logical databases for users, items, purchases, etc., spread across 400 physical hosts, enabling targeted scaling of specific data domains.
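The pool idea can be sketched in a few lines. This is an illustrative toy, not eBay's infrastructure: the pool names, sizes, and round-robin selection are assumptions for demonstration.

```python
# Hypothetical function-level partitioning: each feature area owns its
# own server pool, so each pool can be sized and scaled independently.
POOLS = {
    "selling": ["sell-01", "sell-02"],
    "bidding": ["bid-01", "bid-02", "bid-03"],
    "search":  ["search-01"],
}

def route(function: str, request_id: int) -> str:
    """Pick a server from the pool that owns this function."""
    pool = POOLS[function]
    return pool[request_id % len(pool)]  # naive round-robin by request id
```

Because each function has its own pool, adding capacity to bidding (a hot path during auctions) never requires touching the search or selling pools.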
Best Practice #2: Horizontal Sharding
Beyond functional partitioning, workloads must be broken into many small, manageable units. Stateless application servers can be load‑balanced easily; adding capacity is as simple as provisioning more servers.
Stateful data requires sharding. eBay shards user data across 20 hosts, each holding 1/20 of users, and adds hosts as the user base grows. Similar strategies apply to item, purchase, and account data, using techniques such as modulo hashing, range partitioning, lookup tables, or hybrid schemes.
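Modulo hashing, the simplest of the techniques above, can be sketched as follows. The shard count and host naming scheme are illustrative assumptions, not eBay's actual layout.

```python
def shard_for_user(user_id: int, num_shards: int = 20) -> int:
    """Modulo hashing: user N lives on shard N mod num_shards."""
    return user_id % num_shards

def host_for_shard(shard: int) -> str:
    """Map a shard number to a (hypothetical) database host name."""
    return f"userdb-{shard:02d}"
```

Note the trade-off: modulo hashing spreads load evenly but makes resharding expensive (changing `num_shards` remaps almost every user), which is why lookup tables or hybrid schemes are often preferred when growth is expected.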
Best Practice #3: Avoid Distributed Transactions
Cross‑resource two‑phase commits are costly and harm scalability, latency, and availability. Instead, eBay relaxes ACID guarantees, favoring eventual consistency where appropriate, and uses techniques like ordered database operations, asynchronous recovery, and reconciliation batches.
For most operations, single‑statement auto‑commit is used; only carefully defined cases bundle statements into a single transaction. Consistency is treated as a tunable property rather than a binary choice.
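The ordered-operations-plus-reconciliation pattern can be sketched like this. The tables, failure injection, and batch job below are illustrative stand-ins, not eBay's code: the point is that each write commits on its own, the authoritative row is written first, and a batch repairs any dependents lost to a mid-sequence failure.

```python
# In-memory stand-ins for two separate resources (no shared transaction).
orders = {}    # authoritative table: written first
invoices = {}  # dependent table: written second

def place_order(order_id: str, amount: float, fail_after_order: bool = False) -> None:
    orders[order_id] = {"amount": amount}    # commit 1 (source of truth)
    if fail_after_order:
        return                               # simulated crash between commits
    invoices[order_id] = {"amount": amount}  # commit 2 (dependent record)

def reconcile() -> None:
    """Asynchronous recovery batch: create invoices missing for committed orders."""
    for oid, order in orders.items():
        if oid not in invoices:
            invoices[oid] = {"amount": order["amount"]}
```

The system is briefly inconsistent after a failure, but never loses the authoritative record, and the reconciliation batch converges it—eventual consistency as a deliberate, tunable choice.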
Best Practice #4: Decouple with Asynchronous Strategies
Asynchronous communication (queues, multicast, batch jobs) removes tight coupling between components, allowing each to scale and remain available independently. Even within a component, staged event‑driven architectures (SEDA) provide asynchronous processing while keeping the programming model understandable.
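A minimal SEDA-style pipeline can be built with standard-library queues and threads. This toy (stage names and the sentinel-based shutdown are illustrative assumptions) shows the core idea: each stage has its own queue and worker, so stages are decoupled and can be scaled or throttled independently.

```python
import queue
import threading

stage1_q: "queue.Queue" = queue.Queue()  # feeds the parse stage
stage2_q: "queue.Queue" = queue.Queue()  # feeds the store stage
results = []

def parse_stage():
    """Stage 1: transform raw events, hand them to the next stage's queue."""
    while True:
        item = stage1_q.get()
        if item is None:              # shutdown sentinel: forward and exit
            stage2_q.put(None)
            return
        stage2_q.put(item.upper())

def store_stage():
    """Stage 2: consume parsed events at its own pace."""
    while True:
        item = stage2_q.get()
        if item is None:
            return
        results.append(item)

threads = [threading.Thread(target=parse_stage),
           threading.Thread(target=store_stage)]
for t in threads:
    t.start()
for msg in ["bid placed", "item listed"]:
    stage1_q.put(msg)
stage1_q.put(None)                    # sentinel flows through both stages
for t in threads:
    t.join()
```

Each stage's code is a simple synchronous loop—the asynchrony lives in the queues between stages, which keeps the programming model understandable.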
Best Practice #5: Turn Processes into Asynchronous Flows
By moving non‑critical work to background pipelines, user-facing response latency is reduced. Work such as activity tracking, invoicing, settlement, and reporting is handled asynchronously, smoothing load spikes and lowering infrastructure costs.
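The request-path split can be sketched as follows. The task names and in-process queue are illustrative (a real system would use a durable message broker), but the shape is the point: do only the critical work synchronously, defer everything else.

```python
from collections import deque

background = deque()  # stand-in for a durable work queue

def handle_purchase(user: str, item: str) -> dict:
    """Critical path: record the sale, enqueue the rest, respond fast."""
    record = {"user": user, "item": item}
    for task in ("track_activity", "generate_invoice", "settle_payment"):
        background.append((task, record))  # deferred, non-blocking
    return {"status": "ok", **record}

def drain_background() -> list:
    """Background pipeline: runs later, off the user's latency budget."""
    done = []
    while background:
        task, _record = background.popleft()
        done.append(task)                  # a real worker would execute here
    return done
```

Because the background queue absorbs bursts, the workers can run at a steady rate sized for average load rather than peak load—that is where the infrastructure savings come from.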
Best Practice #6: Virtualize Every Layer
Virtualization abstracts hardware and software layers—OS, JVM, ORM, load balancers, virtual IPs—enabling flexible reallocation of resources without code changes. eBay virtualizes databases and routing logic, allowing logical hosts to be moved across physical machines seamlessly, and virtualizes search by aggregating results from multiple partitions into a single logical index.
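Database virtualization reduces to an indirection layer like the sketch below. The host names and mapping table are hypothetical: applications address only logical hosts, and operations can rebind a logical host to new hardware without any code change.

```python
# Hypothetical logical-to-physical binding, managed by operations.
logical_to_physical = {
    "userdb-05": "rack7-machine3",
    "itemdb-01": "rack2-machine9",
}

def connect(logical_host: str) -> str:
    """Applications only ever see the logical name."""
    return f"tcp://{logical_to_physical[logical_host]}:3306"

def migrate(logical_host: str, new_machine: str) -> None:
    """Rehome a logical database; transparent to every caller of connect()."""
    logical_to_physical[logical_host] = new_machine
```

The same indirection idea applies at every layer the article lists—virtual IPs in front of server pools, the ORM in front of sharded tables, the aggregator in front of search partitions.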
Best Practice #7: Use Caching Judiciously
Caching is most effective for read‑heavy, rarely changing data such as metadata and static content. eBay employs both push and pull cache update strategies, but deliberately avoids caching mutable session data or shared business objects to preserve correctness and availability.
Over‑caching can create new bottlenecks; excessive memory allocated to cache reduces memory available for request processing, and reliance on cache health can jeopardize overall system availability.
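A pull-style cache for slow-changing metadata can be as small as the sketch below (the loader callback and TTL are illustrative assumptions). On a miss or a stale entry, the cache pulls a fresh value; pushes would instead invalidate or overwrite entries when the source changes.

```python
import time

class TTLCache:
    """Minimal pull-through cache: good for read-heavy, rarely changing data."""

    def __init__(self, loader, ttl_seconds: float = 300.0):
        self.loader = loader          # fetches the value on a miss
        self.ttl = ttl_seconds
        self.entries = {}             # key -> (value, expires_at)

    def get(self, key):
        value, expires = self.entries.get(key, (None, 0.0))
        if time.monotonic() >= expires:   # miss or stale: pull a fresh copy
            value = self.loader(key)
            self.entries[key] = (value, time.monotonic() + self.ttl)
        return value
```

Note what this deliberately does not cache: there is no write path, so mutable session state or shared business objects never live here—matching the article's warning that caching such data trades away correctness and availability.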
Conclusion
Scalability is not a non‑functional afterthought; it is a prerequisite functional requirement with the highest priority. The seven practices described provide a comprehensive roadmap for building systems that can grow linearly with demand while maintaining high availability.