eBay Scalability Best Practices: Functional Partitioning, Horizontal Sharding, Async Decoupling, and More
This article outlines eBay's key scalability best practices—including functional decomposition, horizontal sharding, avoiding distributed transactions, asynchronous decoupling, virtualization, and intelligent caching—to demonstrate how large‑scale web systems can achieve linear resource growth and high availability.
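At eBay, scalability is a core architectural driver, required to support hundreds of millions of users, over 2 billion daily page views, and petabytes of data.
In a scalable architecture, resource usage increases linearly (or better) with load. Performance is about the resources one unit of work consumes; scalability is about how that consumption changes as the units of work grow in number or size. In other words, scalability is the shape of the price-performance curve, not its value at a single point.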
Scalability touches transactional, operational, and development work; the following best practices are distilled from eBay’s collective experience.
Best Practice #1: Functional Partitioning
Group related functionality together and isolate unrelated parts, allowing each to scale independently. eBay organizes roughly 16,000 application servers into 220 pools (sales, bidding, search, etc.) and similarly separates databases by data type across 1,000 logical databases on 400 physical hosts.
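The idea of one pool per functional area can be sketched in a few lines of Python. This is only an illustration: the pool names and host names below are hypothetical, and the text does not describe how eBay actually represents or manages its pools.

```python
# Hypothetical pools: each functional area owns its own set of servers.
POOLS = {
    "selling": ["sell-app-1", "sell-app-2"],
    "bidding": ["bid-app-1", "bid-app-2", "bid-app-3"],
    "search": ["search-app-1"],
}


def pool_for(function):
    """Return the servers dedicated to one functional area."""
    return POOLS[function]


def scale_out(function, new_host):
    """Add capacity to one pool; the other pools are not touched."""
    POOLS[function].append(new_host)
```

Calling scale_out("search", "search-app-2") grows only the search pool, which is the point of the practice: each area is sized for its own load.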
Best Practice #2: Horizontal Partitioning (Sharding)
Beyond functional separation, split each workload into manageable units that can be handled independently. The application layer is stateless, so standard load balancers can send a request to any server in a pool. Databases are sharded: user data, for example, is spread across 20 hosts, and more hosts can be added as data and traffic grow.
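A minimal sketch of routing a user to one of 20 shards follows. The text does not say how eBay assigns users to hosts, so the hash-and-modulo scheme here is an assumption chosen for brevity, and shard_for_user is a hypothetical helper name.

```python
import hashlib

NUM_USER_SHARDS = 20  # the text mentions user data spread over 20 hosts


def shard_for_user(user_id, num_shards=NUM_USER_SHARDS):
    """Map a user ID to a shard index in a stable, evenly spread way.

    Assumption: hash + modulo. A stable hash is used instead of Python's
    built-in hash(), which is randomized per process for strings.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

One caveat of this particular scheme: changing num_shards remaps most users, so systems that add hosts often use a lookup table or consistent hashing instead.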
Best Practice #3: Avoid Distributed Transactions
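Two-phase commit is costly, so eBay does not allow client-side or distributed transactions. Instead it relies on eventual-consistency techniques such as careful ordering of database operations, asynchronous recovery events, and reconciliation or settlement batches. The CAP theorem frames the trade-off: a system cannot guarantee consistency, availability, and partition tolerance all at once, so each use case decides how much immediate consistency it can give up.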
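The combination of ordered writes and recovery events can be illustrated with a small sketch. Everything here is hypothetical: two dictionaries stand in for separate databases, a deque stands in for a durable recovery queue, and the bid example is invented for illustration rather than taken from eBay's code.

```python
from collections import deque

item_db = {}               # stands in for the item database
user_db = {}               # stands in for the user database
recovery_events = deque()  # stands in for a durable recovery-event queue


def _link_user_to_item(user_id, item_id):
    """Idempotent write: applying it twice leaves the same state."""
    user_db.setdefault(user_id, set()).add(item_id)


def record_bid(item_id, user_id, amount, user_db_available=True):
    """Write to two databases without a distributed transaction."""
    # Step 1: write the critical record first.
    item_db[item_id] = {"high_bidder": user_id, "amount": amount}
    # Step 2: the dependent write. If it cannot be done now, queue a
    # recovery event instead of rolling back step 1.
    if user_db_available:
        _link_user_to_item(user_id, item_id)
    else:
        recovery_events.append((user_id, item_id))


def replay_recovery_events():
    """Apply queued events later; return how many were replayed."""
    replayed = 0
    while recovery_events:
        user_id, item_id = recovery_events.popleft()
        _link_user_to_item(user_id, item_id)
        replayed += 1
    return replayed
```

Between the failed second write and the replay, the two stores disagree. That temporary inconsistency is the price paid for not holding a transaction open across both.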
Best Practice #4: Asynchronous Decoupling
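Use asynchronous communication (queues, multicast messaging, batch processes) between components. When component A calls component B synchronously, the two are tightly coupled and share a single scalability and availability characteristic: to scale A you must also scale B, and if B is down, so is A. With an asynchronous link, each can scale and remain available independently.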
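A queue between two components might look like the following sketch, which uses Python's standard queue and threading modules in a single process. A real deployment would use a message broker between separate services; the event strings and function names here are hypothetical.

```python
import queue
import threading

events = queue.Queue()  # the only thing the two sides share
processed = []


def publish(event):
    """Producer side: enqueue and return without waiting for the consumer."""
    events.put(event)


def consumer():
    """Consumer side: drain the queue at its own pace."""
    while True:
        event = events.get()
        if event is None:  # sentinel value that tells the worker to stop
            break
        processed.append(event.upper())


def run_demo():
    worker = threading.Thread(target=consumer)
    worker.start()
    for event in ("item listed", "bid placed"):
        publish(event)
    events.put(None)
    worker.join()
    return processed
```

The producer never calls the consumer directly, so a slow or briefly stopped consumer only lengthens the queue instead of blocking the producer.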
Best Practice #5: Shift Processing to Asynchronous Flows
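Move as much work as possible out of the request path and into background processing. This reduces the latency the requester sees, and it lowers infrastructure cost: synchronous work must be provisioned for peak load, while asynchronous work can be spread over time and provisioned closer to average load.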
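The cost argument can be made concrete with a small simulation. The arrival numbers below are invented for illustration; the point is only that a background worker with capacity near the average rate can absorb a spike that a synchronous system would have to be sized for.

```python
def simulate_backlog(arrivals, capacity_per_tick):
    """Return the queue length after each tick of background processing."""
    backlog = 0
    history = []
    for arriving in arrivals:
        backlog += arriving
        backlog -= min(backlog, capacity_per_tick)
        history.append(backlog)
    return history


# Hypothetical load: a spike of 80 work items in the second tick.
arrivals = [10, 80, 10, 5, 5, 10]

peak_capacity_needed = max(arrivals)          # synchronous sizing: 80 per tick
backlog = simulate_backlog(arrivals, 25)      # asynchronous sizing: 25 per tick
```

With a capacity of 25 per tick the backlog peaks at 55 items and is fully drained by the fifth tick, at the cost of delayed completion for the queued work.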
Best Practice #6: Virtualize at All Levels
Leverage virtualization and abstraction—from OS and VM layers to ORM, load balancers, virtual IPs, and routing logic—to enable flexible rebalancing and migration without touching application code.
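One form of this indirection, a logical database name resolved to a physical host at run time, can be sketched as follows. The class, the logical name user_db_07, and the host names are all hypothetical and are not taken from eBay's systems.

```python
class LogicalHostMap:
    """Maps logical database names to the physical hosts serving them."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def resolve(self, logical_name):
        return self._mapping[logical_name]

    def move(self, logical_name, new_physical_host):
        """Operational change only: re-point a logical name to a new host."""
        self._mapping[logical_name] = new_physical_host


def describe_connection(host_map):
    # Application code names only the logical database, never the host.
    return f"connect to {host_map.resolve('user_db_07')}"
```

After host_map.move("user_db_07", "dbhost-b"), the same describe_connection call targets the new host with no change to the application function.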
Best Practice #7: Proper Caching
Employ caching wisely, balancing hit rate, freshness, and memory constraints. Cache static metadata aggressively, avoid caching rapidly changing data unless justified, and ensure cache strategies align with availability and correctness requirements.
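The trade-off between freshness and memory can be illustrated with a small time-to-live cache. This is a generic sketch, not eBay's caching layer: the TTLCache class, its parameters, and the simple eviction rule are assumptions made for the example.

```python
import time


class TTLCache:
    """Cache with a freshness bound (TTL) and a memory bound (max entries)."""

    def __init__(self, ttl_seconds, max_entries, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max_entries = max_entries
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: the backing store is not touched
        value = loader(key)  # miss or stale entry: reload from the source
        if key not in self._entries and len(self._entries) >= self._max_entries:
            # Evict the earliest-inserted key (dicts keep insertion order).
            self._entries.pop(next(iter(self._entries)))
        self._entries[key] = (now + self._ttl, value)
        return value
```

A long TTL suits slow-changing data such as metadata or configuration. For rapidly changing data, a TTL short enough to stay correct may leave too few hits to justify the cache.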
Conclusion
Scalability is often labeled a non‑functional requirement, but in reality it is a prerequisite for functionality—a "priority‑0" demand that must be addressed in any large‑scale system.
Architects Research Society
