Scaling Vipshop’s Big Data Platform: Monitoring, Multi‑HDFS, Yarn Optimization & Capping
In 2017, Vipshop’s senior big‑data architect shared how the company grew its Hadoop‑based platform from zero to a thousand‑node cluster, covering cluster health monitoring, multi‑HDFS deployment via Hive, Yarn container allocation improvements, and a hook‑driven Capping resource‑control system that improves stability and efficiency.
1. Cluster Monitoring
Vipshop’s offline data platform ingests raw data through Flume and Sqoop, processes it with Hive and Spark SQL, and serves ad‑hoc queries via Presto. All jobs run on a self‑developed data‑development platform called "数坊", which also provides self‑service analytics and data products such as price‑comparison and cube tools.
The Hadoop stack evolved from CDH 5.3.2 to CDH 5.13.x and comprises two clusters: a main 900‑node cluster and a 50‑node SSD cluster for real‑time‑offline fusion. Hive was upgraded from 0.13.1 to 2.1.1, and Spark from 2.1.1 to 2.2. The platform now runs about 100 k daily jobs and more than 500 k Yarn applications.
Monitoring is built on three layers—machine, log, and service—requiring full coverage, real‑time alerts, and easy‑to‑configure rules. Historically the stack used Elasticsearch for log monitoring and Zabbix for machine metrics. In 2017 a new service‑layer solution was introduced: Prometheus pulls metrics from exporters, and Grafana visualises them on department‑wide dashboards. Prometheus integrates with existing Zabbix and ES data sources, and also monitors Kafka, Hadoop, Cassandra via JMX. Alerts are routed through email, SMS, and phone escalation.
Example: an HTTP server exposes Kafka lag offset as plain‑text metrics for Prometheus to scrape.
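A minimal sketch of such an endpoint, using only the Python standard library, might look like the following. The metric name kafka_consumer_lag, the port 9108, and the hard-coded lag values are assumptions for illustration; a real exporter would query Kafka for the actual offsets, and the article does not say how Vipshop's exporter was written.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical lag values keyed by (topic, partition); a real exporter
# would compute these from Kafka consumer-group offsets.
LAG_BY_PARTITION = {("orders", 0): 120, ("orders", 1): 35}


def render_metrics(lags):
    """Render lag values in the Prometheus plain-text exposition format."""
    lines = [
        "# HELP kafka_consumer_lag Messages the consumer is behind the log end.",
        "# TYPE kafka_consumer_lag gauge",
    ]
    for (topic, partition), lag in sorted(lags.items()):
        lines.append(
            f'kafka_consumer_lag{{topic="{topic}",partition="{partition}"}} {lag}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes /metrics by default; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(LAG_BY_PARTITION).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9108):
    """Start the exporter; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint, after which Prometheus can be pointed at that host and port as a scrape target.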
Grafana dashboards are built by adding a data source, selecting metrics, and configuring alerts via webhooks. The resulting big‑screen view lets operators see the health of the entire platform at a glance.
2. Hive on Multi‑HDFS
Rapid metadata growth (from 100 M to 200 M entries within a year) put pressure on the NameNode and made multiple HDFS clusters necessary. The community's existing Federation solutions were rejected because they required extensive code changes (ViewFS incompatibility) and imposed heavy client dependencies.
The adopted solution keeps a single Yarn cluster while adding two or more HDFS clusters. Hive acts as a transparent layer: tables can have locations on different HDFS clusters, and the Hive metastore presents a unified view to downstream engines (Presto, Kylin, etc.). An internal property internal.dataservices defines the default cluster, eliminating the need for mount‑table configuration.
Key benefits:
One Yarn cluster, multiple HDFS clusters.
Preserves the original HDFS scheme for maximum compatibility.
Hive‑level migration of specific databases/tables to new clusters.
Query engines read/write new‑cluster data through Hive metadata.
Lightweight federation without strong client dependencies.
After six months in production, the only change users see is that they can omit the explicit HDFS cluster from a table location; the system then falls back to the default cluster automatically.
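The fallback behaviour can be illustrated with a short sketch. This is not Vipshop's Hive patch: the function resolve_location and the cluster name hdfs-main are hypothetical, and the sketch only assumes that a location without a cluster (authority) is completed with the default that internal.dataservices would supply.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical default cluster name; in the article the default comes from
# the internal property `internal.dataservices`.
DEFAULT_CLUSTER = "hdfs-main"


def resolve_location(location, default_cluster=DEFAULT_CLUSTER):
    """Return a fully qualified HDFS URI for a table location.

    Locations that already name a cluster are returned unchanged, which
    preserves the original hdfs:// scheme for existing tables.
    """
    parts = urlsplit(location)
    if parts.scheme and parts.scheme != "hdfs":
        return location  # not an HDFS path; leave it alone
    if parts.netloc:
        return location  # explicit cluster given by the table owner
    # No cluster in the location: fall back to the default one.
    return urlunsplit(("hdfs", default_cluster, parts.path, "", ""))


print(resolve_location("/warehouse/sales.db/orders"))
print(resolve_location("hdfs://hdfs-ssd/warehouse/rt.db/events"))
```

Because the resolution happens where the location is read, downstream engines that take locations from the metastore would need no mount table of their own, which is the lightweight property the section describes.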
3. Yarn Container Allocation Optimization
Before optimization, allocating a single container took ~0.8 ms. With ~70 k containers per batch, total allocation time approached one minute (0.8 ms × 70,000 ≈ 56 s), which made allocation a bottleneck.
Vipshop’s Yarn uses the Fair Scheduler, which repeatedly sorts child queues and attempts allocation based on the largest resource deficit. Metrics showed that sorting and failed allocation attempts consumed roughly half of the total time.
Optimization strategy: skip the sorting step and allocate containers continuously, starting each new allocation from the index of the previous successful allocation. This heuristic reduces the chance of repeated failures and cuts allocation time nearly in half.
The result is almost a 2× increase in container allocation efficiency, allowing the platform to schedule jobs much faster under heavy load.
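The idea of resuming from the last successful index can be sketched as below. This is a simplified model, not the Fair Scheduler code: queues are reduced to a list of pending container counts, and fairness deficits, node locality, and resource sizes are all ignored. The function name allocate_from_cursor is hypothetical.

```python
def allocate_from_cursor(pending, slots):
    """Allocate up to `slots` containers without re-sorting the queues.

    `pending[i]` is the number of containers queue i still wants. Each
    search starts at the queue that received the previous container, so
    queues already known to be empty are not retried at the start of
    every allocation.
    """
    pending = list(pending)  # work on a copy; do not mutate the caller's list
    n = len(pending)
    cursor = 0
    allocated = []  # queue index chosen for each container, in order
    for _ in range(slots):
        for step in range(n):
            i = (cursor + step) % n
            if pending[i] > 0:
                pending[i] -= 1
                allocated.append(i)
                cursor = i  # the next search resumes here
                break
        else:
            break  # a full pass found no queue with demand
    return allocated


print(allocate_from_cursor([2, 0, 1], slots=5))  # [0, 0, 2]
```

In the printed example, the third allocation skips the two exhausted queues once and the cursor then stays on the last queue, so later searches do not start over from the front of the list.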
4. Hook‑Based Capping Resource Control
When the cluster is busy, two extremes can occur: either low‑priority jobs are rejected (favoring high‑priority work) or all jobs are accepted, risking overload. To balance this, Vipshop added hooks into Hive and Spark that check current system usage against per‑queue Capping thresholds before submitting a job.
The hook compares a global usage metric (e.g., root.usage) with the Capping value of the target queue. If usage exceeds the threshold, the job is rejected; otherwise it proceeds. A retry mechanism attempts submission up to six times before marking the job as failed.
Queue matching is performed automatically based on project, job characteristics, and a three‑level queue hierarchy. Additionally, a daily quota is enforced: if a team exceeds its quota, the Capping value for its queues is automatically lowered.
Example: two third‑level queues, root.bigdata_traffic.critical (Capping = 1) and root.bigdata_traffic.online (Capping = 0.9). When overall usage reaches 0.95, the critical queue can still submit jobs while the online queue is blocked until usage drops.
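The check and retry described in this section can be sketched as follows, using the two queues from the example. The function names, the wait between attempts, and the callables get_usage and submit are assumptions for illustration; the article gives only the comparison rule and the limit of six attempts.

```python
import time

# Capping values taken from the example above.
CAPPING = {
    "root.bigdata_traffic.critical": 1.0,
    "root.bigdata_traffic.online": 0.9,
}

MAX_ATTEMPTS = 6  # the article mentions up to six submission attempts


def may_submit(queue, usage, capping=CAPPING):
    """Allow submission unless cluster usage exceeds the queue's Capping value."""
    return usage <= capping[queue]


def submit_with_retry(queue, get_usage, submit, wait_seconds=30.0, sleep=time.sleep):
    """Try to submit a job, re-checking usage before each attempt.

    `get_usage` returns the current global usage (e.g. root.usage) and
    `submit` launches the job; both are hypothetical callables supplied
    by the caller. Returns True if the job was submitted, False if every
    attempt was blocked by Capping.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if may_submit(queue, get_usage()):
            submit()
            return True
        if attempt < MAX_ATTEMPTS:
            sleep(wait_seconds)  # assumed back-off; the article gives no interval
    return False


print(may_submit("root.bigdata_traffic.critical", 0.95))  # True
print(may_submit("root.bigdata_traffic.online", 0.95))    # False
```

Passing sleep as a parameter keeps the retry loop testable without real waiting, and lowering a team's entry in CAPPING is one way the daily-quota rule could be expressed in the same model.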
These mechanisms together provide fine‑grained, traffic‑light‑style control over resource consumption, improving platform stability during peak periods.
dbaplus Community
