How Alibaba’s Table Store Auto‑Solves Hotspot Issues with Real‑Time Load Balancing
This article explains the architecture and mechanisms of Alibaba Cloud's Table Store load‑balancing system, detailing how it collects metrics, detects user‑access and machine hotspots, and automatically applies actions such as partition moves, splits, merges, and isolation to maintain high availability and performance.
Background
Table Store (formerly OTS) is Alibaba Cloud's self‑developed multi‑tenant distributed NoSQL database. This article focuses on how its load‑balancing system detects and resolves hotspot problems.
Table Store Architecture
The system is built on the Feitian (Apsara) kernel, which provides distributed shared storage, lock services, and communication components. Above the kernel sits the engine, composed of a small number of masters and many workers; masters manage worker state and schedule partitions. The front end exposes unified HTTP services that forward requests to the appropriate workers.
Load‑Balancing Context
Data is sharded by partition key into partitions, the basic unit of scheduling. Operations on partitions include move, split, merge, and group, all of which complete in seconds.
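As a rough mental model (hypothetical Python, not Table Store's actual API), a partition can be pictured as a key range pinned to a worker, with the four operations expressed as action types:

```python
from dataclasses import dataclass
from enum import Enum

class ActionType(Enum):
    MOVE = "move"    # relocate a partition to another worker
    SPLIT = "split"  # split one partition into two at a chosen key
    MERGE = "merge"  # merge two adjacent partitions into one
    GROUP = "group"  # isolate partitions onto dedicated workers

@dataclass
class Partition:
    partition_id: str
    begin_key: bytes   # inclusive lower bound of the partition-key range
    end_key: bytes     # exclusive upper bound
    worker: str        # worker currently serving this range

def route(partitions: list[Partition], key: bytes) -> Partition:
    """Return the partition whose key range contains the given partition key."""
    for p in partitions:
        if p.begin_key <= key < p.end_key:
            return p
    raise KeyError("no partition covers this key")
```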
Hotspot Issues
Hotspots in a multi‑tenant NoSQL system fall into two categories:
User‑access hotspots: legitimate traffic bursts (e.g., promotions) or unreasonable bursts caused by poor table design (for example, a partition key that concentrates traffic on a single partition).
Machine hotspots: sudden spikes in CPU or network usage that turn a single machine into a bottleneck.
These problems are hard to locate and resolve due to incomplete statistics, limited mitigation mechanisms, and slow manual handling.
Load‑Balancing System Overview
The system consists of LBAgent (deployed with each worker) and LBMaster. LBAgent collects all local metrics, keeps recent data in memory, and asynchronously persists it to an external MetricStore. LBMaster aggregates data, provides an HTTP command layer (OpsServer), and hosts two analysis modules:
OnlineAnalyzer: real‑time analysis of in‑memory data to detect and act on hotspots.
OfflineAnalyzer: batch analysis of long‑term data from MetricStore to discover potential issues.
Both modules generate actions that are sent to workers or masters via the Executor. Results and actions are stored in ResultDataStore for further inspection.
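The control flow this section describes can be summarized in a few lines. This is a sketch with hypothetical names; `detect`, `dispatch`, and `save` are assumptions, not real interfaces:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str    # "move" | "split" | "merge" | "group"
    target: str  # the partition or worker the action applies to

def balancing_round(analyzer, executor, result_store) -> None:
    """One pass of the loop: detect hotspots, dispatch actions, record outcomes."""
    for action in analyzer.detect():         # OnlineAnalyzer or OfflineAnalyzer
        result = executor.dispatch(action)   # sent to a worker or a master
        result_store.save(action, result)    # kept in ResultDataStore for review
```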
Information Collection Module
The module aims for comprehensive metrics while minimally impacting the main request path. Every module records its own statistics, which are streamed through RPC layers and asynchronously pushed to a backend aggregation component, then pulled by LBAgent.
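A common way to get comprehensive metrics without touching the hot path is a bounded, lossy queue drained by a background thread. The following is a minimal sketch of that pattern, not Table Store's implementation:

```python
import queue
import threading

class MetricSink:
    """Keeps metric recording off the request path: serving threads enqueue
    without blocking, and a background thread drains batches downstream."""

    def __init__(self, forward, maxsize: int = 10_000):
        self._q: queue.Queue = queue.Queue(maxsize)
        self._forward = forward  # callable that ships a batch to the aggregator
        threading.Thread(target=self._drain, daemon=True).start()

    def record(self, metric: dict) -> None:
        try:
            self._q.put_nowait(metric)  # never block a serving thread
        except queue.Full:
            pass  # drop under pressure: metrics are best-effort by design

    def _drain(self) -> None:
        while True:
            batch = [self._q.get()]
            while len(batch) < 100 and not self._q.empty():
                batch.append(self._q.get_nowait())
            self._forward(batch)
```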
LBAgent Functions
Collect all local metrics.
Pre‑aggregate the data (see the sketch after this list).
Persist metrics asynchronously to MetricStore.
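The pre‑aggregation step might look like the following; the field names are illustrative assumptions:

```python
from collections import defaultdict

def pre_aggregate(samples: list[dict]) -> dict:
    """Collapse raw per-request samples into compact per-partition rows
    so that only aggregates are persisted to MetricStore."""
    agg = defaultdict(lambda: {"requests": 0, "errors": 0, "max_latency_ms": 0.0})
    for s in samples:
        row = agg[s["partition_id"]]
        row["requests"] += 1
        row["errors"] += 1 if s.get("error") else 0
        row["max_latency_ms"] = max(row["max_latency_ms"], s["latency_ms"])
    return dict(agg)
```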
LBMaster Core Functions
Gather cluster‑wide information.
Provide multi‑dimensional top‑N queries (error rate, latency, QPS, etc.; see the sketch below).
Analyze data, generate actions, and execute them.
Self‑evaluate action effectiveness.
LBMaster also offers a graphical (web‑based) control panel for real‑time queries and manual operation commands, and supports dynamic, hot‑reloadable configuration.
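The top‑N queries mentioned above reduce to ranking aggregated rows on a chosen dimension. A sketch, assuming a hypothetical row schema:

```python
import heapq

def top_n(rows: list[dict], dimension: str, n: int = 10) -> list[dict]:
    """Rank partitions (or workers) on one dimension, e.g. 'error_rate',
    'latency_ms', or 'qps', and return the n highest."""
    return heapq.nlargest(n, rows, key=lambda r: r[dimension])

# Usage: top_n(cluster_rows, "qps") surfaces the hottest partitions cluster-wide.
```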
Online Analysis Path
Online analysis processes short‑term data entirely in memory, ensuring sub‑second detection and remediation without relying on external systems. Example actions:
When a worker’s queue is full, the system detects a read‑write bottleneck on a partition and issues a split action, creating two new partitions and redistributing load.
If a single partition saturates a machine's resources, the system isolates that partition onto a dedicated machine via a group action.
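Both examples fit a simple rule shape: compare an in‑memory metric against a limit and emit an action. A sketch under assumed field names and thresholds:

```python
def analyze_worker(worker: dict) -> list[tuple[str, str]]:
    """Hypothetical rule sketch for the two remediations described above;
    the fields and the 0.8 threshold are illustrative assumptions."""
    actions = []
    for part in worker["partitions"]:
        if part["queue_len"] >= part["queue_cap"]:
            # A saturated queue signals a read/write bottleneck on the
            # partition: split it so load spreads over two new partitions.
            actions.append(("split", part["id"]))
        elif part["cpu_share"] > 0.8:
            # One partition hogging machine resources gets isolated onto
            # a dedicated machine via a group action.
            actions.append(("group", part["id"]))
    return actions
```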
Offline Analysis Path
Offline analysis works on long‑term data stored in MetricStore, using external analytics to identify low‑utilization partitions for auto‑merge or to forecast future traffic peaks for proactive resource adjustment.
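For instance, identifying auto‑merge candidates from long‑term data can be as simple as flagging adjacent partitions whose traffic stays below a floor. In this sketch the threshold and the key‑ordering assumption are mine:

```python
def merge_candidates(daily_qps: dict[str, float], floor: float = 10.0) -> list:
    """Scan long-term per-partition traffic for adjacent low-utilization
    partitions worth auto-merging."""
    ids = sorted(daily_qps)  # assumes ids sort in partition-key order
    return [
        (a, b)
        for a, b in zip(ids, ids[1:])
        if daily_qps[a] < floor and daily_qps[b] < floor
    ]
```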
Effectiveness
Post‑deployment metrics show significant reductions in read/write error rates and latency, alongside increased throughput, especially for workloads that previously exhibited clear hotspots.
Key Takeaways
Comprehensive per‑module statistics are essential for accurate hotspot detection.
Automating manual mitigation resolves the majority of online issues without complex machine‑learning models.
Rich, configurable strategies allow rapid iteration and tailored responses to diverse workloads.
Q&A
Q1: What statistics are collected, and why worry about hotspots despite having worker queues?
A1: Some queues are shared across all partitions; a hotspot on one partition can monopolize resources, affecting all partitions on that machine.
Q2: Why prefer partition‑key based distribution over hash‑ring sharding?
A2: Hash sharding is hard to adjust dynamically; partition‑key based schemes allow easy operations like split to rebalance.
Q3: Does the solution rely on RabbitMQ, and were alternative approaches considered?
A3: If a system is built on a queue service such as RabbitMQ, that service's own load‑balancing capabilities may limit what can be done; the approach presented here is useful precisely when the underlying system itself suffers from hotspots.
