eBay Scalability Best Practices: Functional Partitioning, Horizontal Sharding, Asynchronous Decoupling, and More

The article outlines eBay's key scalability best practices—including functional partitioning, horizontal sharding, avoiding distributed transactions, aggressive asynchronous decoupling, moving work to async pipelines, pervasive virtualization, and intelligent caching—to achieve linear or better resource usage as load grows.

Architects Research Society

Nov 25, 2018

At eBay, scalability is a primary architectural driver because the platform serves billions of page views daily and stores petabytes of data, making linear resource growth with load a necessity rather than a choice.

Scalability means that resource usage grows linearly (or better) with load. Whereas performance is a single point on the price‑performance curve, scalability is the shape of that curve.
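One way to make the "shape of the curve" idea concrete is to compare resource usage per unit of load at several load levels. The following is a minimal sketch; the function names and the sample figures are hypothetical illustrations, not eBay measurements.

```python
def cost_per_unit(measurements):
    """Return resource usage per unit of load for each (load, resources) pair."""
    return [resources / load for load, resources in measurements]


def scales_linearly_or_better(measurements, tolerance=1e-9):
    """Return True if the cost per unit of load never rises as load grows.

    `measurements` is a list of (load, resources) pairs in hypothetical units,
    for example requests per second and server count.
    """
    ordered = sorted(measurements)
    costs = cost_per_unit(ordered)
    return all(later <= earlier + tolerance
               for earlier, later in zip(costs, costs[1:]))


# Hypothetical numbers, for illustration only.
linear = [(1000, 10), (2000, 20), (4000, 40)]        # cost per unit stays flat
superlinear = [(1000, 10), (2000, 25), (4000, 70)]   # cost per unit rises
```

In this sketch the first data set would count as scalable and the second would not, even though both look identical at the lowest load level.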

Best Practice #1: Partition by Function

Group related features together and keep unrelated features apart, so that each group can be deployed as its own service or application pool and scaled independently. eBay runs about 16,000 servers organized into 220 pools (e.g., sales, bidding, search), which lets each pool scale on its own and isolates its resource usage from the others.

At the database layer, eBay uses multiple logical databases (e.g., user data, item data, purchase data) spread across 400 physical hosts, enabling independent scaling per data type.
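The idea of one pool per functional area can be sketched as a small router. This is an illustrative assumption, not eBay's implementation: the pool names follow the examples in the text, while the host names and the `FunctionalRouter` class are hypothetical.

```python
import itertools

# Hypothetical pool layout: each functional area owns its own set of hosts.
POOLS = {
    "sales": ["sales-01", "sales-02"],
    "bidding": ["bid-01", "bid-02", "bid-03"],
    "search": ["search-01", "search-02", "search-03", "search-04"],
}


class FunctionalRouter:
    """Routes a request to a host in the pool that owns its function."""

    def __init__(self, pools):
        self._pools = {name: list(hosts) for name, hosts in pools.items()}
        # One round-robin cursor per pool, so pools never share capacity.
        self._cursors = {name: itertools.cycle(hosts)
                         for name, hosts in self._pools.items()}

    def route(self, function):
        if function not in self._cursors:
            raise KeyError(f"no pool serves function {function!r}")
        return next(self._cursors[function])

    def add_host(self, function, host):
        """Scale one pool without touching any other pool."""
        self._pools[function].append(host)
        # Rebuilding the cursor restarts round-robin for this pool only.
        self._cursors[function] = itertools.cycle(self._pools[function])
```

Adding a host to the search pool, for example, leaves the bidding and sales pools untouched, which is the isolation property the practice is after.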

Best Practice #2: Horizontal Sharding

Within each functional area, break work into manageable units that can be scaled independently. Application servers are stateless and load‑balanced, so adding more servers increases capacity.

Databases are sharded along their primary access paths (e.g., user data is split across 20 hosts, each holding 1/20 of the users). Similar schemes apply to items, purchases, accounts, etc. As data grows, more hosts are added and the data is subdivided further, so the infrastructure is built to support re‑partitioning.
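A minimal sketch of routing by user ID is shown below. The text does not say which partitioning scheme eBay uses, so hash-plus-modulo is an assumption made here for illustration; the host names and function names are hypothetical.

```python
import hashlib

NUM_USER_SHARDS = 20  # matches the 20-host example in the text


def shard_for_user(user_id, num_shards=NUM_USER_SHARDS):
    """Map a user ID to a shard index in [0, num_shards)."""
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def host_for_user(user_id, hosts):
    """Pick the physical host that holds this user's shard."""
    return hosts[shard_for_user(user_id, len(hosts))]


# Hypothetical host names, one per shard.
USER_DB_HOSTS = [f"userdb-{i:02d}" for i in range(NUM_USER_SHARDS)]
```

One design caveat: with plain modulo, changing the number of shards moves most keys. A lookup table or consistent hashing is a common alternative when frequent re-partitioning is expected.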

Best Practice #3: Avoid Distributed Transactions

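Instead of using two‑phase commit, eBay relaxes transactional guarantees across unrelated systems. By the CAP theorem, a distributed system can guarantee at most two of consistency, availability, and partition tolerance; eBay favors availability and partition tolerance and accepts weaker immediate consistency. Most database operations are auto‑commit; when needed, multiple statements against a single database are combined into one transaction.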

Techniques such as careful ordering of DB operations, asynchronous event replay, and coordinated batch processing are used to achieve eventual consistency where required.
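The combination of careful ordering and asynchronous event replay can be sketched with two independent databases. This is an illustration under stated assumptions, not eBay's code: in-memory `sqlite3` databases stand in for the real systems, and the schema and function names are hypothetical.

```python
import sqlite3


def make_purchase_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE purchases (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER)")
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER, amount INTEGER)")
    return db


def make_user_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE totals (user_id INTEGER PRIMARY KEY, spent INTEGER NOT NULL)")
    db.execute("CREATE TABLE applied (event_id INTEGER PRIMARY KEY)")
    return db


def record_purchase(purchase_db, purchase_id, user_id, amount):
    """Step 1: one local transaction on a single database, no 2PC."""
    with purchase_db:  # commits on success, rolls back on error
        purchase_db.execute("INSERT INTO purchases VALUES (?, ?, ?)",
                            (purchase_id, user_id, amount))
        purchase_db.execute("INSERT INTO events (user_id, amount) VALUES (?, ?)",
                            (user_id, amount))


def replay_events(purchase_db, user_db):
    """Step 2, run later: apply each logged event to the user DB exactly once."""
    rows = purchase_db.execute(
        "SELECT id, user_id, amount FROM events ORDER BY id").fetchall()
    applied = 0
    for event_id, user_id, amount in rows:
        seen = user_db.execute(
            "SELECT 1 FROM applied WHERE event_id = ?", (event_id,)).fetchone()
        if seen:
            continue  # already applied by an earlier replay, so skip it
        with user_db:  # marker and update commit together on the user DB
            user_db.execute("INSERT INTO applied VALUES (?)", (event_id,))
            user_db.execute("INSERT OR IGNORE INTO totals VALUES (?, 0)", (user_id,))
            user_db.execute("UPDATE totals SET spent = spent + ? WHERE user_id = ?",
                            (amount, user_id))
        applied += 1
    return applied
```

Because the replay is idempotent, it can be rerun after a crash without double counting; the user database is briefly stale but converges, which is the eventual consistency the text describes.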

Best Practice #4: Aggressive Asynchronous Decoupling

Components interact asynchronously (via queues, multicast messaging, batch jobs, etc.) so that each can scale and fail independently. With a synchronous call from A to B, B must scale in step with A, and A is unavailable whenever B is; asynchronous interaction removes both constraints.

Frameworks like SEDA (Staged Event‑Driven Architecture) enable internal asynchronous processing while keeping the programming model understandable.
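A staged, queue-connected design in the spirit of SEDA can be sketched with the standard library. This is a simplified illustration, not the SEDA framework itself: the two stage handlers (parsing and doubling a number) are hypothetical placeholders for real work.

```python
import queue
import threading

_STOP = object()  # sentinel telling a stage to shut down


def run_stage(handler, inbox, outbox=None):
    """Start a worker thread that applies `handler` to each item in `inbox`."""
    def loop():
        while True:
            item = inbox.get()
            if item is _STOP:
                if outbox is not None:
                    outbox.put(_STOP)  # propagate shutdown downstream
                break
            result = handler(item)
            if outbox is not None:
                outbox.put(result)

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


def run_pipeline(items):
    """Push items through two stages that communicate only via queues."""
    parse_q = queue.Queue(maxsize=100)   # bounded: applies back-pressure
    price_q = queue.Queue(maxsize=100)
    done_q = queue.Queue()               # unbounded so the last stage never blocks
    workers = [
        run_stage(lambda raw: int(raw), parse_q, price_q),      # stage 1
        run_stage(lambda cents: cents * 2, price_q, done_q),    # stage 2
    ]
    for raw in items:
        parse_q.put(raw)
    parse_q.put(_STOP)
    for worker in workers:
        worker.join()
    results = []
    while True:
        item = done_q.get()
        if item is _STOP:
            break
        results.append(item)
    return results
```

Each stage only knows its inbox and outbox, so a slow stage could be given more worker threads, or moved to another process, without changing its neighbors.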

Best Practice #5: Move Work to Asynchronous Pipelines

Shift as much processing as possible to background asynchronous flows (e.g., activity tracking, billing, settlement, reporting) to reduce request‑side latency and allow infrastructure to scale based on average load rather than peak spikes.
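As a rough sketch of taking work off the request path, the handler below only enqueues an activity event and returns, while a separate batch job aggregates the events later. The in-process `queue.Queue` is a hypothetical stand-in for a durable queue, and the handler and event names are assumptions for illustration.

```python
import queue
from collections import Counter

# Hypothetical in-process stand-in for a durable message queue.
activity_log = queue.Queue()


def handle_view_item(user_id, item_id):
    """Request path: do the minimum synchronously and defer the rest."""
    activity_log.put_nowait(("view", user_id, item_id))
    return {"status": 200, "item_id": item_id}


def run_tracking_batch(max_events=1000):
    """Background path: drain queued events and aggregate them in one pass."""
    views_per_item = Counter()
    for _ in range(max_events):
        try:
            kind, _user_id, item_id = activity_log.get_nowait()
        except queue.Empty:
            break  # nothing left to process in this batch
        if kind == "view":
            views_per_item[item_id] += 1
    return views_per_item
```

The batch job can run on its own schedule and hardware, so a traffic spike lengthens the queue rather than the response time.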

Best Practice #6: Virtualize at All Levels

Use abstraction layers—virtual machines, ORM layers, load balancers, virtual IPs—to separate logical services from physical resources, enabling flexible re‑balancing, migration, and fault tolerance without code changes.

eBay virtualizes databases and search engines, mapping logical representations to physical machines, which simplifies operational changes and supports scalable growth.
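The logical-to-physical mapping can be illustrated with a small lookup layer. This is a sketch under assumptions: the `LogicalHostMap` class, the logical name `user_db`, and the host names are all hypothetical, and a real deployment would load the mapping from configuration.

```python
class LogicalHostMap:
    """Maps logical database names to physical hosts."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def resolve(self, logical_name):
        return self._mapping[logical_name]

    def migrate(self, logical_name, new_host):
        """Repoint a logical database; callers keep using the same name."""
        if logical_name not in self._mapping:
            raise KeyError(logical_name)
        self._mapping[logical_name] = new_host


def load_user(host_map, user_id):
    """Application code names only the logical database, never a machine."""
    host = host_map.resolve("user_db")
    # Returns what would be sent, instead of opening a real connection.
    return (host, "SELECT * FROM users WHERE id = ?", (user_id,))
```

Moving `user_db` to new hardware then becomes a call to `migrate`, with no change to `load_user` or any other application code.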

Best Practice #7: Cache Wisely

Effective caching maximizes hit rates within memory limits while balancing freshness and availability. Static metadata and configuration are aggressively cached, whereas rapidly changing data is often left uncached to avoid consistency challenges.

Proper caching can bend the scaling curve below linear, but over‑reliance on cache can create a single point of failure; cache strategies must be tailored to the specific workload.
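The trade-off between hit rate, memory, and freshness can be sketched as a bounded cache with a time-to-live. This is a minimal illustration rather than a description of eBay's caches: the `TTLCache` class, its defaults, and the loader callback are hypothetical.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Bounded read-through cache: LRU eviction plus per-entry expiry."""

    def __init__(self, loader, max_entries=1000, ttl_seconds=300.0,
                 clock=time.monotonic):
        self._loader = loader          # called on a miss to fetch the value
        self._max = max_entries        # memory bound
        self._ttl = ttl_seconds        # freshness bound
        self._clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return entry[1]
        # Miss or expired: fall back to the source of truth.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
        return value
```

Because every miss falls through to the loader, the system keeps working, more slowly, if the cache is emptied. A long TTL suits static metadata, while rapidly changing data is often better left uncached, as noted above.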

Summary

Scalability is often labeled a non‑functional requirement, but it is in fact a prerequisite for functionality: a "priority‑0" requirement. The best practices described here aim to help engineers design systems that scale predictably, remain available, and can be operated efficiently at any size.
