
Scaling Vipshop’s Big Data Platform: Monitoring, Multi‑HDFS, Yarn Optimization & Capping

In 2017, Vipshop’s senior big‑data architect shared how the company grew its Hadoop‑based platform from zero to a thousand‑node cluster. The account covers cluster health monitoring, multi‑HDFS deployment via Hive, Yarn container allocation improvements, and a hook‑driven Capping resource‑control system that improves stability and efficiency.

dbaplus Community

1. Cluster Monitoring

Vipshop’s offline data platform ingests raw data through Flume and Sqoop, processes it with Hive and Spark SQL, and serves ad‑hoc queries via Presto. All jobs run on a self‑developed data‑development platform called "数坊" (Shufang), which also provides self‑service analytics and data products such as price‑comparison and cube tools.

The Hadoop stack has evolved from CDH 5.3.2 to CDH 5.13.x and comprises two clusters: a main 900‑node cluster and a 50‑node SSD cluster for real‑time‑offline fusion. Hive was upgraded from 0.13.1 to 2.1.1, and Spark from 2.1.1 to 2.2. The platform now runs about 100,000 jobs and more than 500,000 Yarn applications per day.

Monitoring is built on three layers (machine, log, and service) and has three requirements: full coverage, real‑time alerts, and easy‑to‑configure rules. Historically the stack used Elasticsearch for log monitoring and Zabbix for machine metrics. In 2017 a new service‑layer solution was introduced: Prometheus pulls metrics from exporters, and Grafana visualises them on department‑wide dashboards. Prometheus integrates with the existing Zabbix and ES data sources, and also monitors Kafka, Hadoop, and Cassandra via JMX. Alerts are routed through email and SMS, with escalation to phone calls.

Example: an HTTP server exposes Kafka consumer lag (the offset gap between producer and consumer) as plain‑text metrics for Prometheus to scrape.
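As a rough illustration of such an endpoint, the sketch below serves lag values in the Prometheus text exposition format using only the Python standard library. It is not Vipshop's exporter: the metric name `kafka_consumer_lag`, the label names, the `read_lags` helper and its sample values are all assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def read_lags():
    """Hypothetical lag source; a real exporter would query Kafka here."""
    return {("report-group", "orders", 0): 120, ("report-group", "orders", 1): 0}


def render_metrics(lags):
    """Render {(group, topic, partition): lag} in Prometheus text format."""
    lines = [
        "# HELP kafka_consumer_lag Messages the consumer group is behind.",
        "# TYPE kafka_consumer_lag gauge",
    ]
    for (group, topic, partition), lag in sorted(lags.items()):
        lines.append(
            f'kafka_consumer_lag{{group="{group}",topic="{topic}",'
            f'partition="{partition}"}} {lag}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes /metrics by default.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(read_lags()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port):
    """Block and serve metrics on the given port (the port is the caller's choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve(9108)` (an arbitrary port) would start the endpoint; Prometheus then needs a scrape target pointing at that host and port.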

Prometheus HTTP metrics server

Grafana dashboards are built by adding a data source, selecting metrics, and configuring alerts via webhooks. The resulting big‑screen view lets operators see the health of the entire platform at a glance.

Grafana department dashboard

2. Hive on Multi‑HDFS

Rapid metadata growth (from 100 million to 200 million entries within a year) put pressure on the NameNode, prompting the need for multiple HDFS clusters. The community's HDFS Federation approach was rejected for two reasons: ViewFS was incompatible with existing code and would have required extensive changes, and it imposed heavy client‑side dependencies.

The adopted solution keeps a single Yarn cluster while adding two or more HDFS clusters. Hive acts as a transparent layer: tables can have locations on different HDFS clusters, and the Hive metastore presents a unified view to downstream engines (Presto, Kylin, etc.). An internal property internal.dataservices defines the default cluster, eliminating the need for mount‑table configuration.

Multi‑HDFS architecture diagram

Key benefits:

One Yarn cluster, multiple HDFS clusters.

Preserves the original HDFS scheme for maximum compatibility.

Hive‑level migration of specific databases/tables to new clusters.

Query engines read/write new‑cluster data through Hive metadata.

Lightweight federation without strong client dependencies.

After six months in production, the only thing users need to know is that they can omit the HDFS cluster from a table location; the system then falls back to the default cluster automatically.
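The fallback rule can be pictured with a small sketch. The source names the `internal.dataservices` property but does not show how Hive applies it, so the function below, its name, and the cluster URI `hdfs://cluster-a` are assumptions used only to illustrate the idea: a location that names a cluster is kept as is, and one that does not is resolved against the default.

```python
from urllib.parse import urlparse

# Hypothetical value of the internal.dataservices property.
DEFAULT_CLUSTER = "hdfs://cluster-a"


def resolve_location(location, default_cluster=DEFAULT_CLUSTER):
    """Return a fully qualified table location.

    Locations that already name an HDFS cluster are returned unchanged;
    locations without one fall back to the default cluster.
    """
    parsed = urlparse(location)
    if parsed.scheme and parsed.netloc:
        return location  # explicit cluster, e.g. hdfs://cluster-b/...
    if not parsed.path.startswith("/"):
        raise ValueError(f"expected an absolute path, got {location!r}")
    return default_cluster.rstrip("/") + parsed.path
```

Under this sketch, a table created with location `/warehouse/sales.db/orders` lands on the default cluster, while one created with `hdfs://cluster-b/warehouse/sales.db/orders` stays on the cluster it names.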

3. Yarn Container Allocation Optimization

Before optimization, allocating a single container took ~0.8 ms. With ~70,000 containers per batch, total allocation time came to roughly 56 seconds (0.8 ms × 70,000), close to one minute, and became a bottleneck.

Vipshop’s Yarn uses the Fair Scheduler, which repeatedly sorts child queues and attempts allocation based on the largest resource deficit. Metrics showed that sorting and failed allocation attempts consumed roughly half of the total time.

Optimization strategy: skip the sorting step and allocate containers continuously, starting each new allocation from the index of the previous successful allocation. This heuristic reduces the chance of repeated failures and cuts allocation time nearly in half.
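To make the difference concrete, here is a toy model of the two strategies. It is a simplification and not the Fair Scheduler's code: the `Queue` fields, the deficit formula (`fair_share - used`), and the one-container-per-attempt model are assumptions. It also ignores how fairness is preserved once sorting is skipped, which the source does not detail.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    fair_share: int  # containers the queue is entitled to
    used: int        # containers currently held
    pending: int     # containers still requested


def allocate_with_sort(queues, free):
    """Baseline: re-sort queues by deficit before every single container."""
    order, attempts, sorts = [], 0, 0
    while free > 0:
        sorts += 1
        ranked = sorted(queues, key=lambda q: q.fair_share - q.used, reverse=True)
        for q in ranked:
            attempts += 1
            if q.pending > 0:  # success: hand out one container
                q.pending -= 1
                q.used += 1
                free -= 1
                order.append(q.name)
                break
        else:
            break  # no queue has pending requests
    return order, attempts, sorts


def allocate_from_cursor(queues, free):
    """Optimized: no sorting; resume from the last queue that succeeded."""
    order, attempts, cursor = [], 0, 0
    n = len(queues)
    while free > 0:
        for step in range(n):
            idx = (cursor + step) % n
            q = queues[idx]
            attempts += 1
            if q.pending > 0:
                q.pending -= 1
                q.used += 1
                free -= 1
                order.append(q.name)
                cursor = idx  # next allocation starts here
                break
        else:
            break  # no queue has pending requests
    return order, attempts
```

In the baseline, a queue with a large deficit but no pending requests is retried, and fails, on every pass. The cursor version stops revisiting it once a later queue has succeeded, which is the effect the heuristic relies on.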

Container allocation optimization flow

The result is almost a 2× increase in container allocation efficiency, allowing the platform to schedule jobs much faster under heavy load.

Allocation performance before and after

4. Hook‑Based Capping Resource Control

When the cluster is busy, two extremes can occur: either low‑priority jobs are rejected (favoring high‑priority work) or all jobs are accepted, risking overload. To balance this, Vipshop added hooks into Hive and Spark that check current system usage against per‑queue Capping thresholds before submitting a job.

The hook compares a global usage metric (e.g., root.usage) with the Capping value of the target queue. If usage exceeds the threshold, the job is rejected; otherwise it proceeds. A retry mechanism attempts submission up to six times before marking the job as failed.
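A minimal sketch of this check-and-retry loop follows. It is not the actual Hive or Spark hook: the function names, the injected `read_usage` and `submit` callables, and the 60-second wait between attempts are assumptions, and "up to six times" is read here as six attempts in total.

```python
import time

MAX_ATTEMPTS = 6  # from the text; interpreted as six attempts in total


def can_submit(usage, capping):
    """A job is rejected only when usage exceeds the queue's Capping value."""
    return usage <= capping


def submit_with_capping(queue, cappings, read_usage, submit,
                        wait_seconds=60.0, sleep=time.sleep):
    """Try to submit a job to `queue`, retrying while the cluster is too busy.

    read_usage: callable returning current global usage (e.g. root.usage).
    submit:     callable that actually submits the job.
    Returns True if the job was submitted, False if it was marked as failed.
    """
    capping = cappings[queue]
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if can_submit(read_usage(), capping):
            submit()
            return True
        if attempt < MAX_ATTEMPTS:
            sleep(wait_seconds)  # assumed back-off; the source gives no interval
    return False
```

Passing `read_usage` and `sleep` as parameters keeps the sketch testable without a live cluster; a real hook would read usage from the ResourceManager.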

Queue matching is performed automatically based on project, job characteristics, and a three‑level queue hierarchy. Additionally, a daily quota is enforced: if a team exceeds its quota, the Capping value for its queues is automatically lowered.

Capping control module diagram

Example: two third‑level queues, root.bigdata_traffic.critical (Capping = 1) and root.bigdata_traffic.online (Capping = 0.9). When overall usage reaches 0.95, the critical queue can still submit jobs while the online queue is blocked until usage drops.
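The same example can be written as a lookup over the two queues. The queue names and Capping values come from the text; the `blocked_queues` helper is a hypothetical name, and it assumes usage is expressed as a fraction between 0 and 1.

```python
# Capping values for the two third-level queues from the example.
CAPPINGS = {
    "root.bigdata_traffic.critical": 1.0,
    "root.bigdata_traffic.online": 0.9,
}


def blocked_queues(usage, cappings=CAPPINGS):
    """Return the queues that may not submit jobs at the given usage."""
    return sorted(name for name, cap in cappings.items() if usage > cap)
```

With these values, `blocked_queues(0.95)` contains only the online queue, and `blocked_queues(0.85)` is empty. Because usage cannot exceed 1, a Capping of 1 means the critical queue is never blocked by this check.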

Quota‑based Capping example

These mechanisms together provide fine‑grained, traffic‑light‑style control over resource consumption, improving platform stability during peak periods.

