
Ensuring Stability of the Double‑11 Supply Chain Dashboard: Full‑Chain Process, Risk Points, and Technical Safeguard Strategies

This article details how the supply‑chain big‑screen dashboard for Double‑11 maintains high stability: mapping the full data flow, identifying risk points across the ingestion, processing, storage, and service layers, and applying technical safeguards such as high‑availability design, fault tolerance, monitoring, and coordinated operational procedures.

JD Tech Talk

Background

The supply‑chain dashboard is a core logistics report used for major promotions, featuring over 170 metrics, more than 30 dependent interfaces, a long data chain, and strict stability requirements.

1. Full‑Chain Process Diagram

The first step is to draw a complete flow diagram, then drill down into the processing details of each metric to discover issues and devise targeted safeguards.

2. Risk Point Identification

The dashboard’s pipeline is divided into four layers — data ingestion, metric processing, metric storage, and metric service — plus cross‑cutting monitoring management. Key risks include:

Data Ingestion Layer: long processing chain (Hive, JSF, HTTP, JDQ, Flink, DTS, CK, EasyData), many dependent parties, multiple ingestion types.

Metric Processing Layer: multi‑dimensional metrics with ordered calculations, external dependencies requiring recomputation, and flexible promotion‑strategy adjustments.

Metric Storage Layer: cross‑business impact and the need for rapid anomaly localization.

Metric Service Layer: interface stability, degradation and fallback mechanisms, business isolation.

Monitoring Management: monitoring of metric processing and rapid fault localization.

3. Technical Safeguard Strategies

3.1 Data Ingestion Layer

3.1.1 Long Processing Chain

Define clear boundaries and assign ownership across four zones: the Hive team, the real‑time processing team, interface providers, and the SCM team.

Ensure high availability for each dependent component (Hive, Flink, interfaces) and add pre‑emptive monitoring.

3.1.2 Multiple Dependent Parties

Document all dependent interfaces and negotiate SLAs; a sample interface matrix appears in the original article’s diagram.

3.1.3 Multiple Ingestion Types

Offline Hive: dedicated promotion‑heavy tasks, monitoring, and fire‑watch tables.

Business Import: validation and mock data import for Double‑11.

External JSF/HTTP: monitoring, retry, degradation, and fallback.
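The retry‑then‑degrade pattern for external JSF/HTTP calls can be sketched as follows. This is an illustrative Python sketch, not the article's actual implementation; `fetch` and `fallback` are hypothetical callables standing in for an RPC/HTTP client and a cached‑result lookup.

```python
import time

def call_with_retry(fetch, fallback, retries=3, backoff=0.5):
    """Call an external interface with retries; if all attempts fail,
    degrade to a fallback source instead of failing the dashboard."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt < retries - 1:
                # Exponential backoff between attempts.
                time.sleep(backoff * (2 ** attempt))
    # All retries exhausted: serve last known-good data.
    return fallback()
```

In practice the fallback would read the most recent successful result from a cache, so a single flaky upstream interface never blanks out a metric on screen.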

3.2 Metric Processing Layer

3.2.1 Multi‑Dimensional Metrics

Separate tables by dimension (warehouse, region) and by granularity (minute, hour, cache, history).

3.2.2 Re‑computation, Fault Tolerance, Fast Recovery

Implement generic degradation for external interfaces, fallback to the latest successful result within 30 minutes, and design fast recomputation paths.
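The "latest successful result within 30 minutes" rule can be sketched as a staleness‑bounded cache. This is a minimal Python illustration under assumptions (class name, injectable clock) not drawn from the article: on failure it serves the last good value only while that value is fresher than the 30‑minute window, otherwise it surfaces the error so recomputation is triggered.

```python
import time

FALLBACK_WINDOW_SECONDS = 30 * 60  # 30-minute bound from the text

class LastGoodCache:
    """Remember the most recent successful computation and serve it on
    failure, but only while it is fresher than the fallback window."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable for testing
        self._value = None
        self._stamp = None

    def compute(self, fn):
        try:
            result = fn()
            self._value, self._stamp = result, self._clock()
            return result
        except Exception:
            fresh = (self._stamp is not None and
                     self._clock() - self._stamp <= FALLBACK_WINDOW_SECONDS)
            if fresh:
                return self._value   # degrade to the last good result
            raise                    # too stale: surface the failure
```

Bounding the fallback window matters: without it, a silently failing dependency could keep serving hours‑old numbers on a live promotion dashboard.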

3.2.3 Flexible Promotion Strategy

Expose strategy configuration via DUCC; example JSON configuration:

{
    "sTime": "2024-11-xx 00:00:00",
    "eTime": "2024-11-xx 19:59:59",
    "tbSTime": "2023-11-xx 00:00:00",
    "tbETime": "2023-11-xx 19:59:59",
    "hbSTime": "2024-06-xx 00:00:00",
    "hbETime": "2024-06-xx 19:59:59",
    "showType": "24h",
    "special24hCompDateStr": "2024-11-xx",
    "specialCompDateStr": ""
}
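Loading such a DUCC configuration into a typed object might look like the sketch below. The dataclass is illustrative; the field names mirror the JSON above, where `tb`/`hb` appear to abbreviate year‑over‑year (同比) and period‑over‑period (环比) comparison windows — that reading is an inference, not stated in the article.

```python
import json
from dataclasses import dataclass

@dataclass
class PromoStrategy:
    sTime: str                  # promotion window start
    eTime: str                  # promotion window end
    tbSTime: str                # year-over-year comparison start (assumed)
    tbETime: str                # year-over-year comparison end (assumed)
    hbSTime: str                # period-over-period comparison start (assumed)
    hbETime: str                # period-over-period comparison end (assumed)
    showType: str               # display mode, e.g. "24h"
    special24hCompDateStr: str
    specialCompDateStr: str

def load_strategy(raw: str) -> PromoStrategy:
    """Parse the DUCC JSON payload into a typed strategy object."""
    return PromoStrategy(**json.loads(raw))
```

Keeping the strategy in a config center rather than in code is what lets the comparison windows be adjusted mid‑promotion without a redeploy.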

3.3 Metric Storage Layer

MySQL is deployed in a one‑primary, three‑replica topology, with separate databases for the main screen, the core board, and other reports. Doris receives asynchronous binlog replication for long‑term storage.

Metrics are stored with JSON tagging to enable fast filtering; SQL queries extract needed fields directly from JSON.
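The JSON‑tag filtering idea can be sketched as below. The article's store is MySQL; this sketch uses SQLite (so it runs self‑contained), whose `json_extract` has the same shape as MySQL's `JSON_EXTRACT`. Table and tag names are hypothetical.

```python
import sqlite3

# In-memory stand-in for the metric store described in the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metric (name TEXT, tags TEXT, value REAL)")
conn.executemany(
    "INSERT INTO metric VALUES (?, ?, ?)",
    [
        ("outbound_orders", '{"dim": "warehouse", "gran": "minute"}', 120.0),
        ("outbound_orders", '{"dim": "region", "gran": "hour"}', 7300.0),
    ],
)
# Filter rows by a field stored inside the JSON tag column,
# without needing a separate column per dimension.
rows = conn.execute(
    "SELECT value FROM metric "
    "WHERE json_extract(tags, '$.dim') = 'warehouse'"
).fetchall()
```

Tagging with JSON keeps the schema stable as new dimensions are added; the trade‑off is that heavily filtered tags may warrant functional indexes to stay fast.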

3.4 Metric Service Layer

Interface stability is ensured through load testing and business isolation.

Degradation: on a single‑interface failure, fall back to the most recent successful data within 30 minutes.

Fallback: predefined strategies handle abnormal categories during prediction.

3.5 Monitoring Management

Two principles: pre‑emptive monitoring to detect upstream issues early, and comprehensive coverage across processing, querying, data pushing, and accuracy checks. Dashboards display interface availability, internal method health, and data correctness.
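A pre‑emptive freshness check, the simplest form of "detect upstream issues early", might be sketched like this. The function name and threshold are illustrative, not from the article.

```python
import time

def freshness_alerts(last_update, max_lag_seconds, now=None):
    """Flag metrics whose latest update is older than the allowed lag,
    so upstream delays surface before the dashboard visibly goes stale.
    `last_update` maps metric name -> unix timestamp of its last refresh."""
    now = time.time() if now is None else now
    return sorted(
        name
        for name, stamp in last_update.items()
        if now - stamp > max_lag_seconds
    )
```

Running such a check on a short interval, with per‑metric lag budgets, turns "the screen looks wrong" incidents into alerts raised minutes earlier at the ingestion boundary.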

4. Additional Process Safeguards

4.1 Communication & Collaboration

Establish a dedicated promotion‑support chat group to streamline coordination among many stakeholders.

4.2 Full‑Chain Rehearsals

Conduct bi‑annual end‑to‑end drills to familiarize teams with configurations and validate special‑promotion strategies.

4.3 Business‑Linkage & Pre‑Validation

Collaborate with the business to verify historical year‑over‑year and period‑over‑period data, and mock promotion dates in pre‑release environments to ensure data accuracy.

4.4 Result‑First Mindset

Prioritize dashboard stability and data correctness over blame‑shifting, driving proactive issue resolution.

4.5 Team Effort

Success relies on collective effort across development, operations, and upstream partners.

Tags: Monitoring, Big Data, Data Pipeline, Supply Chain, Dashboard, Stability
Written by JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.