
Scalable Architecture Best Practices: Lessons from eBay

This article outlines eBay's practical scalability best practices—including functional partitioning, horizontal sharding, avoiding distributed transactions, asynchronous design, virtualization, and judicious caching—to help large‑scale web systems achieve reliable, cost‑effective growth.

Art of Distributed System Architecture Design

At eBay, scalability is a daily architectural challenge; every design decision is evaluated for its impact on a system that serves billions of page views per day and stores data measured in petabytes.

In a scalable architecture, resource consumption should grow linearly (or better) with load, where load may be measured as traffic or as data volume. Scalability describes how resource usage changes as the number of work units grows, and so it shapes the overall price-performance curve.

Scalability has transactional, operational, and development dimensions. The following best practices capture the collective experience of eBay engineers and operators.

Best Practice #1: Partition by Function

Related functionality belongs together and unrelated functionality belongs apart, whether you call it SOA, functional decomposition, or simply good engineering. Loose coupling between unrelated functions lets each one scale independently.

At the code level, JAR files, packages, bundles, and similar mechanisms isolate functions from one another.

At the application level eBay splits functions into separate application pools: sales, bidding, search, etc., distributing roughly 16,000 servers across 220 pools, allowing each pool to be scaled independently.

At the database level eBay maintains multiple logical databases (user data, product data, purchase data, etc.) across 400 physical hosts, enabling independent scaling of each data class.
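As a rough illustration of partitioning by function at the application level, the sketch below routes each request to the pool that owns its function and grows one pool without touching the others. The pool names, host names, and URL layout are assumptions made up for the example; they are not eBay's actual configuration.

```python
# Hypothetical pools and hosts, for illustration only.
POOLS = {
    "selling": ["sell-01", "sell-02"],
    "bidding": ["bid-01", "bid-02", "bid-03"],
    "search": ["search-01"],
}


def pool_for(path: str) -> str:
    """Map a request path to the pool that owns that function.

    Assumes the first path segment names the function, e.g. /bidding/item/42.
    """
    function = path.strip("/").split("/")[0]
    if function not in POOLS:
        raise KeyError(f"no pool owns function {function!r}")
    return function


def add_capacity(pool: str, host: str) -> None:
    """Scale one functional pool; every other pool is left unchanged."""
    POOLS[pool].append(host)
```

Calling `add_capacity("search", "search-02")` grows only the search pool, which is the point of the practice: load on one function does not force a resize of the rest.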

Best Practice #2: Horizontal Sharding

Functional partitioning helps, but each function must still be broken into many small, manageable units. Horizontal sharding achieves this.

Because eBay designs interactions to be stateless, horizontal sharding is straightforward at the application layer—standard load balancers route traffic to identical, stateless servers, and adding capacity is as simple as provisioning more servers.

Database sharding is more challenging due to statefulness. eBay shards data by primary access paths (e.g., user data across 20 hosts, each holding 1/20 of users) and adds hosts as the user base grows. Different data sets use different sharding strategies such as modulo hashing, range partitioning, lookup tables, or hybrid approaches.
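The three named strategies can be sketched in a few lines each. This is a minimal illustration under stated assumptions, not eBay's routing code: the function names, the use of CRC32 as the hash, and the range boundaries are all choices made for the example.

```python
import bisect
import zlib


def modulo_shard(key: str, num_shards: int) -> int:
    """Modulo hashing: a stable hash of the key, modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def range_shard(user_id: int, upper_bounds: list) -> int:
    """Range partitioning: upper_bounds[i] is the exclusive upper bound of shard i."""
    index = bisect.bisect_right(upper_bounds, user_id)
    if index >= len(upper_bounds):
        raise ValueError(f"user_id {user_id} is beyond the last range")
    return index


class LookupShard:
    """Lookup table: an explicit key-to-shard map, so single keys can be moved.

    Keys without an entry fall back to modulo hashing, which makes this a
    simple hybrid of two strategies.
    """

    def __init__(self, default_shards: int):
        self.table = {}
        self.default_shards = default_shards

    def shard_for(self, key: str) -> int:
        if key in self.table:
            return self.table[key]
        return modulo_shard(key, self.default_shards)

    def move(self, key: str, shard: int) -> None:
        self.table[key] = shard
```

The trade-off is worth noting: modulo hashing needs no stored state but reshuffles most keys when the shard count changes, while a lookup table costs an extra read but lets operators relocate individual keys.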

Best Practice #3: Avoid Distributed Transactions

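Instead of paying for two-phase commits across resources, eBay relaxes cross-system transactional guarantees. The CAP theorem states that a distributed system can guarantee at most two of consistency, availability, and partition tolerance at the same time. For a large site, partition tolerance and availability are not negotiable, so eBay gives up immediate consistency where that is acceptable.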

Most operations are auto‑commit; only carefully defined cases bundle statements into a single transaction. Eventual consistency is achieved through ordered DB operations, asynchronous recovery, reconciliation, or batch settlement, chosen per use‑case.

Architects must recognize that consistency is not a binary choice; it can be tailored to the needs of each operation.
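One way to picture ordered operations combined with asynchronous recovery is the sketch below. It assumes two independent stores that each auto-commit a single write; the store class, the `bids` and `item_summary` names, and the in-memory recovery queue are hypothetical stand-ins, and a real system would need the pending work to be durable.

```python
from collections import deque


class FlakyStore:
    """Stand-in for an independent database that auto-commits each write."""

    def __init__(self):
        self.rows = {}
        self.available = True

    def put(self, key, value):
        if not self.available:
            raise ConnectionError("store unavailable")
        self.rows[key] = value


bids = FlakyStore()          # source of truth, always written first
item_summary = FlakyStore()  # derived data, allowed to lag behind
recovery_queue = deque()     # pending writes to replay later (in memory here)


def place_bid(bid_id, item_id, amount):
    """Write in a fixed order, with no transaction spanning both stores."""
    bids.put(bid_id, (item_id, amount))            # step 1: must succeed
    try:
        item_summary.put(item_id, amount)          # step 2: best effort
    except ConnectionError:
        recovery_queue.append((item_id, amount))   # repaired asynchronously


def run_recovery() -> int:
    """Replay pending writes in order; stop at the first failure."""
    replayed = 0
    while recovery_queue:
        item_id, amount = recovery_queue[0]
        try:
            item_summary.put(item_id, amount)
        except ConnectionError:
            break
        recovery_queue.popleft()
        replayed += 1
    return replayed
```

If the second store is down, the bid is still accepted and the summary catches up once `run_recovery()` succeeds, so the two stores are temporarily inconsistent but converge.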

Best Practice #4: Decouple with Asynchronous Strategies

When one component calls another synchronously, the two are tightly coupled: they must scale together, and a failure in one takes down the other. Asynchronous communication (queues, multicast messages, batch jobs, and so on) lets each component scale and fail independently.

The entire stack should embrace this principle, using techniques like SEDA (Staged Event‑Driven Architecture) to introduce asynchrony while keeping the programming model understandable.
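A minimal sketch of queue-based decoupling follows, using only the Python standard library. It shows a single producer and a single consumer stage joined by a bounded queue, roughly the building block that staged designs chain together; the event names and the queue size are assumptions for the example.

```python
import queue
import threading

events = queue.Queue(maxsize=1000)  # bounded, so a slow consumer applies backpressure
processed = []


def consumer():
    """Consumer stage: drains the queue at its own pace."""
    while True:
        event = events.get()
        if event is None:            # sentinel value: shut down
            events.task_done()
            break
        processed.append(event.upper())
        events.task_done()


def publish(event):
    """Producer side: enqueue and return (blocks only if the queue is full)."""
    events.put(event)


worker = threading.Thread(target=consumer, daemon=True)
worker.start()
for name in ("bid_placed", "item_listed"):
    publish(name)
events.join()    # demo only: wait until the consumer has caught up
publish(None)
worker.join()
```

The producer never calls the consumer directly, so either side can be slowed, restarted, or given more workers without changing the other.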

Best Practice #5: Turn Processes into Asynchronous Flows

By moving non‑critical work to background processing, response latency for users is reduced. Activities such as tracking, invoicing, settlement, and reporting are good candidates for asynchronous handling.

Asynchrony also lowers infrastructure cost because capacity only needs to meet average load rather than peak load; queues smooth out spikes.
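The claim about average versus peak load can be checked with a small worked simulation. The arrival numbers below are invented for the example: ten ticks with an average of 4 work units per tick and a single spike of 20.

```python
def simulate(arrivals, capacity_per_tick):
    """Return (max_backlog, ticks_to_drain) for a queue drained at a fixed rate."""
    if capacity_per_tick <= 0:
        raise ValueError("capacity_per_tick must be positive")
    pending = list(arrivals)
    backlog = 0
    max_backlog = 0
    ticks = 0
    while pending or backlog:
        if pending:
            backlog += pending.pop(0)               # new work arrives
        backlog -= min(backlog, capacity_per_tick)  # workers drain the queue
        max_backlog = max(max_backlog, backlog)
        ticks += 1
    return max_backlog, ticks


# Hypothetical load: 40 units over 10 ticks (average 4), with a peak of 20.
arrivals = [2, 2, 20, 2, 2, 2, 2, 2, 2, 4]

sized_for_average = simulate(arrivals, 4)   # (16, 11)
sized_for_peak = simulate(arrivals, 20)     # (0, 10)
```

Capacity sized for the average finishes all the work one tick later than capacity sized for the peak, with a backlog that never exceeds 16 units, while needing one fifth of the processing capacity. The cost is latency on the queued work, which is why this suits background activities rather than the user-facing path.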

Best Practice #6: Virtualize All Layers

Adding abstraction layers—operating‑system virtualization, JVMs, ORM layers, virtual IPs, etc.—provides flexibility. eBay virtualizes databases, mapping logical databases to physical hosts, and abstracts routing logic that assigns records to partitions.

Search is similarly virtualized: a query aggregator runs parallel searches across partitions but presents a single logical index to users.

Virtualization enables operators to reallocate resources without touching application code, improving resilience and manageability.
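Both ideas in this section can be sketched briefly: a logical-to-physical database mapping that operators can change without touching callers, and a query aggregator that fans a search out to partitions and merges the results. The host names, the toy index contents, and the substring-matching rule are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping; operators could change it without changing callers.
LOGICAL_TO_PHYSICAL = {"user": "dbhost-07", "item": "dbhost-12"}


def host_for(logical_db: str) -> str:
    """Application code asks for a logical database, never a physical host."""
    return LOGICAL_TO_PHYSICAL[logical_db]


# Each partition holds a slice of the index: document -> relevance score.
PARTITIONS = [
    {"red bike": 0.9, "blue bike": 0.4},
    {"bike helmet": 0.7},
    {"red car": 0.2},
]


def search_partition(partition: dict, term: str) -> list:
    """Search one partition; a toy substring match stands in for real ranking."""
    return [(doc, score) for doc, score in partition.items() if term in doc]


def search(term: str, limit: int = 10) -> list:
    """Query every partition in parallel and merge into one ranked list."""
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        results = list(pool.map(lambda p: search_partition(p, term), PARTITIONS))
    merged = [hit for hits in results for hit in hits]
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return [doc for doc, _ in merged[:limit]]
```

Callers of `search("bike")` see one ranked list and never learn how many partitions exist, so partitions can be added or rebalanced behind the aggregator.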

Best Practice #7: Use Caching Judiciously

Effective caching depends on workload characteristics. Read‑heavy, rarely changing data (metadata, configuration, static content) are ideal for caching; eBay combines push and pull techniques to keep caches reasonably fresh.

Highly mutable or read‑write intensive data are harder to cache; eBay deliberately avoids caching session data or shared business objects to preserve availability and correctness.

Over‑caching can create new bottlenecks: if the cache fails, the entire system may become unavailable. Proper balance is essential.
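The combination of pull and push freshness can be illustrated with a small read-through cache. This is a sketch under stated assumptions, not eBay's caching layer: the class name, the time-to-live approach to pull refresh, and the injectable clock are all choices made for the example.

```python
import time


class ReadThroughCache:
    """Cache for slow-changing data, kept fresh by both pull and push."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader      # function that reads from the source of truth
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so expiry can be tested
        self.entries = {}         # key -> (value, expires_at)

    def get(self, key):
        """Pull: serve from cache, reloading from the source after the TTL."""
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]
        value = self.loader(key)
        self.entries[key] = (value, now + self.ttl)
        return value

    def push(self, key, value):
        """Push: the owner of the data publishes a new value when it changes."""
        self.entries[key] = (value, self.clock() + self.ttl)
```

Because every miss falls through to the loader, the source of truth must still be able to carry the load if the cache is empty or lost, which is the over-caching risk described above.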

Summary

Scalability is often labeled a non-functional requirement, which suggests that it matters less than features. In practice it is a prerequisite for functionality: a system that cannot scale cannot deliver its features, so scalability deserves the highest priority.

These best practices aim to give you fresh perspectives on building scalable systems, regardless of their size.

(Article originally from InfoQ)
