
How eBay Scales to Billions: 7 Proven Practices for Massive Web Systems

This article outlines eBay's seven key scalability best practices—functional partitioning, horizontal sharding, avoiding distributed transactions, asynchronous decoupling, stream processing, virtualization, and strategic caching—to illustrate how large‑scale web platforms can achieve linear resource growth and high availability.


eBay faces extreme scalability pressure, serving billions of page views daily and handling petabytes of data. In such an environment, resource consumption must grow linearly (or better) with load, and scalability becomes a matter of life and death.

Best Practice #1: Partition by Function

Group related functionality together and separate unrelated parts—whether you call it SOA, functional decomposition, or simply good engineering. This reduces coupling and enables independent scaling of each function.

At the code level, JAR files, packages, bundles, etc., isolate features. At the application level, eBay runs sales, bidding, and search in separate server pools—about 16,000 servers divided into 220 pools—allowing each pool to be scaled independently. The database layer follows the same principle, with logical databases for users, items, purchases, etc., spread across 400 physical hosts, enabling targeted scaling of specific data domains.
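The pool-per-function idea above can be sketched in a few lines. This is a hypothetical illustration, not eBay's actual routing code: each functional area owns its own pool of hosts (the pool names and sizes are invented), so capacity can be added to one function without touching the others.

```python
# Hypothetical sketch of function-level partitioning: each functional
# area (selling, bidding, search) is served by its own pool of hosts,
# so each pool can be sized and scaled independently. All names here
# are illustrative.

FUNCTION_POOLS = {
    "selling": ["sell-01", "sell-02"],
    "bidding": ["bid-01", "bid-02", "bid-03"],
    "search":  ["search-01"],
}

def route(function: str, request_id: int) -> str:
    """Pick a host from the pool dedicated to this function."""
    pool = FUNCTION_POOLS[function]
    # Simple round-robin by request id; a real system would use a
    # load balancer in front of each pool.
    return pool[request_id % len(pool)]
```

Because each function resolves to its own pool, growing the bidding tier means adding entries to one list; the selling and search tiers are unaffected.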

Best Practice #2: Horizontal Sharding

Beyond functional partitioning, workloads must be broken into many small, manageable units. Stateless application servers can be load‑balanced easily; adding capacity is as simple as provisioning more servers.

Stateful data requires sharding. eBay shards user data across 20 hosts, each holding 1/20 of users, and adds hosts as the user base grows. Similar strategies apply to item, purchase, and account data, using techniques such as modulo hashing, range partitioning, lookup tables, or hybrid schemes.
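A minimal sketch of the modulo-hashing approach mentioned above (shard count and function names are assumptions for illustration): numeric keys map directly by modulo, while string keys are hashed first so the distribution stays even.

```python
import hashlib

NUM_SHARDS = 20  # e.g. user data split across 20 hosts, as described above

def shard_for_user(user_id: int) -> int:
    """Modulo hashing: a stable mapping from user id to shard index."""
    return user_id % NUM_SHARDS

def shard_for_key(key: str) -> int:
    """For string keys, hash first so the distribution is even."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

Note the trade-off: plain modulo is simple but remaps most keys when `NUM_SHARDS` changes, which is why production systems often layer a lookup table or consistent hashing on top.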

Best Practice #3: Avoid Distributed Transactions

Cross‑resource two‑phase commits are costly and harm scalability, latency, and availability. Instead, eBay relaxes ACID guarantees, favoring eventual consistency where appropriate, and uses techniques like ordered database operations, asynchronous recovery, and reconciliation batches.

For most operations, single‑statement auto‑commit is used; only carefully defined cases bundle statements into a single transaction. Consistency is treated as a tunable property rather than a binary choice.
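The pattern above can be illustrated with a toy example. This is a hedged sketch, not eBay's implementation: two in-memory dicts stand in for separate databases, each write is a single "auto-commit" statement, and a reconciliation batch recomputes the derived data from the source of truth instead of wrapping both writes in a distributed transaction.

```python
# Sketch of eventual consistency without two-phase commit. The stores
# below are in-memory stand-ins for two separate databases.

purchases = {}     # primary store (source of truth): purchase_id -> buyer
buyer_totals = {}  # derived store, updated asynchronously

def record_purchase(purchase_id, buyer):
    # Commit 1: the authoritative record is always written first.
    purchases[purchase_id] = buyer

def apply_followup(purchase_id):
    # Commit 2, applied asynchronously; it may lag or be missed entirely.
    buyer = purchases[purchase_id]
    buyer_totals[buyer] = buyer_totals.get(buyer, 0) + 1

def reconcile():
    """Batch job: rebuild derived totals from the source of truth,
    repairing any follow-ups that were lost."""
    totals = {}
    for buyer in purchases.values():
        totals[buyer] = totals.get(buyer, 0) + 1
    buyer_totals.clear()
    buyer_totals.update(totals)
```

The ordering matters: because the authoritative record is written first, a crash between the two commits leaves the system recoverable by the reconciliation batch rather than inconsistent.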

Best Practice #4: Decouple with Asynchronous Strategies

Asynchronous communication (queues, multicast, batch jobs) removes tight coupling between components, allowing each to scale and remain available independently. Even within a component, staged event‑driven architectures (SEDA) provide asynchronous processing while keeping the programming model understandable.
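The queue-based decoupling described above can be shown with Python's standard library (a minimal sketch; a production system would use a durable message broker rather than an in-process queue): the producer hands work to a queue and returns immediately, while an independent consumer drains it at its own pace.

```python
import queue
import threading

events = queue.Queue()  # stand-in for a durable message queue
processed = []

def producer(n):
    # Fire-and-forget: the producer never blocks on the consumer's speed.
    for i in range(n):
        events.put(i)

def consumer():
    while True:
        item = events.get()
        if item is None:       # sentinel value shuts the worker down
            break
        processed.append(item * 2)  # arbitrary stand-in for real work

worker = threading.Thread(target=consumer)
worker.start()
producer(3)
events.put(None)
worker.join()
```

Because the only coupling is the queue itself, the producer and consumer can be scaled, deployed, or taken down independently—the core availability benefit the text describes.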

Best Practice #5: Turn Processes into Asynchronous Flows

By moving non‑critical work into background pipelines, user-facing response latency is reduced. Work such as activity tracking, invoicing, settlement, and reporting is handled asynchronously, smoothing load spikes and lowering infrastructure costs.
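A toy sketch of this split (the function names and the list standing in for a durable queue are assumptions): the request handler does only the critical work before responding, and everything else is deferred to a batch pipeline.

```python
deferred = []  # stand-in for a durable queue feeding batch pipelines

def handle_checkout(order_id):
    # Critical path: only what the user must wait for.
    confirmation = f"order {order_id} confirmed"
    # Non-critical work is recorded, not executed, in the request path.
    deferred.append(("invoice", order_id))  # billed later, in batch
    deferred.append(("track", order_id))    # analytics, also later
    return confirmation

def run_batch():
    """Background pipeline: processes deferred work off-peak."""
    results = [f"{kind}:{oid}" for kind, oid in deferred]
    deferred.clear()
    return results
```

The latency win comes from the request path doing two list appends instead of two remote calls; the batch job can then run when load is low, which is how deferred processing smooths spikes.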

Best Practice #6: Virtualize Every Layer

Virtualization abstracts hardware and software layers—OS, JVM, ORM, load balancers, virtual IPs—enabling flexible reallocation of resources without code changes. eBay virtualizes databases and routing logic, allowing logical hosts to be moved across physical machines seamlessly, and virtualizes search by aggregating results from multiple partitions into a single logical index.
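The database-virtualization idea above boils down to one level of indirection. In this hypothetical sketch (host names invented), the application only ever addresses logical hosts; migrating data means updating the map, with no application code changes or redeploys.

```python
# Sketch of database virtualization: application code addresses logical
# hosts; an indirection table maps them to physical machines.

logical_to_physical = {
    "users-logical-1": "db-host-17",
    "items-logical-1": "db-host-03",
}

def connect(logical_name: str) -> str:
    """Application code calls this; it never sees physical host names."""
    return f"conn://{logical_to_physical[logical_name]}"

def migrate(logical_name: str, new_physical: str):
    # Operations remaps a logical host to new hardware; callers are
    # unaffected because they only hold the logical name.
    logical_to_physical[logical_name] = new_physical
```

The same indirection pattern underlies the search example in the text: callers query one logical index, and the layer behind the name fans out to however many physical partitions currently exist.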

Best Practice #7: Use Caching Judiciously

Caching is most effective for read‑heavy, rarely changing data such as metadata and static content. eBay employs both push and pull cache update strategies, but deliberately avoids caching mutable session data or shared business objects to preserve correctness and availability.

Over‑caching can create new bottlenecks; excessive memory allocated to cache reduces memory available for request processing, and reliance on cache health can jeopardize overall system availability.
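A minimal pull-style cache for the read-heavy, slowly changing data the text recommends caching (a sketch under assumed names; real deployments would also bound the cache's memory footprint, per the warning above): entries expire after a TTL and are re-fetched from the backing store on the next read.

```python
import time

class TTLCache:
    """Pull-based cache for read-heavy, rarely changing data such as
    metadata: entries expire after ttl_seconds and are re-fetched
    lazily on the next read."""

    def __init__(self, ttl_seconds, fetch):
        self.ttl = ttl_seconds
        self.fetch = fetch  # loader called on a miss or expired entry
        self.store = {}     # key -> (value, expiry_time)

    def get(self, key):
        value, expires = self.store.get(key, (None, 0.0))
        if time.monotonic() >= expires:
            value = self.fetch(key)
            self.store[key] = (value, time.monotonic() + self.ttl)
        return value
```

This is the "pull" half of the push/pull pair mentioned above; a push strategy would instead invalidate or refresh entries when the source data changes.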

Conclusion

Scalability is not a non‑functional afterthought; it is a prerequisite functional requirement with the highest priority. The seven practices described provide a comprehensive roadmap for building systems that can grow linearly with demand while maintaining high availability.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: distributed systems, architecture, sharding, asynchronous, caching, virtualization
Written by

Art of Distributed System Architecture Design

Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.
