How to Ensure Double‑11 Supply‑Chain Dashboard Stability: End‑to‑End Strategies

This article details the end‑to‑end technical and operational measures used to guarantee the stability and accuracy of the supply‑chain dashboard during the Double‑11 promotion: full‑chain flow mapping, risk point analysis, layered mitigation tactics, monitoring, and team coordination.

JD Cloud Developers

1. Full Chain Flow Diagram

The first step is to draw the full‑chain flow diagram of the supply‑chain dashboard. With that overview, we drill into how each metric is processed, identify issues, and devise targeted assurance measures.

Below is the complete process diagram, with risk points and corresponding mitigation measures marked.

2. Risk Point Identification

With the full chain in hand, we analyze weak points from left to right and top to bottom. Dashboard processing is layered: data access layer, metric processing layer, metric storage layer, and metric service layer.

2.1 Data Access Layer

Long processing chain: from data generation to metric completion, data passes through the report DB, multi‑layer Hive tasks, JDQ, Flink tasks, DTS, ClickHouse, and Easydata.

Many dependent parties: large‑item, service, reverse, B2B, LDC, cold‑chain, real‑time data team.

Various access types: offline Hive, JSF interface, HTTP interface, business import, ClickHouse.

2.2 Metric Processing Layer

Metrics have multiple dimensions and must be calculated in a specific order.

External dependencies cannot guarantee 100 % availability; processing needs recomputation, fault tolerance, and automatic recovery.

Promotional strategies must be flexible (influenced by competitors, need real‑time adjustments).

2.3 Metric Storage Layer

Inter‑business impact: workloads from different businesses can affect one another on shared storage.

Rapid identification of metric anomalies.

2.4 Metric Service Layer

Ensure interface stability.

External interfaces must support degradation.

External interfaces need fallback mechanisms.

Isolation of multiple business services.

2.5 Monitoring Management

How to monitor metric processing.

How to quickly locate chain anomalies.

3. Technical Assurance Strategies

3.1 Data Access Layer

3.1.1 Long Processing Chain

Define clear boundaries for the full chain and split responsibilities into four zones:

Hive tasks – guaranteed by the Hive team.

Real‑time processing – guaranteed by the real‑time data team.

Various interface providers – guaranteed by each provider.

Dashboard processing and query – guaranteed by SCM (our own team).

3.1.2 High Availability Requirements

Require high availability from all dependent parties and interfaces:

Hive: high availability, alerts, monitoring, stress testing.

Real‑time tasks: dual‑stream processing, Flink stress testing.

Interface providers: define SLA and coverage.

3.1.3 Pre‑emptive Monitoring

SCM R&D also monitors the Hive and real‑time Flink tasks. This gives the downstream team early awareness of data backlogs and supplements upstream monitoring.

3.1.4 Multiple Dependency Parties

Identify all dependent interfaces and agree on SLAs; examples are shown below.

3.1.5 Various Access Types

Apply specific measures per type:

Offline Hive: define major‑event protection tasks, monitor offline push tables.

Business import: assist business with data validation and simulate Double‑11 import in pre‑release environment.

External JSF/HTTP interfaces: monitoring, retry, degradation, fallback.
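The retry and degradation measures for external interfaces can be sketched as a small wrapper. This is a minimal illustration, not the team's implementation: `call_with_retry`, its parameters, and the backoff values are all hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fetch: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch up to `attempts` times, then degrade to fallback.

    All names and defaults here are assumptions made for illustration.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            # Exponential backoff between attempts; no sleep after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed: return the degraded value instead of raising.
    return fallback()
```

Passing `sleep` as a parameter keeps the wrapper testable without real delays. In practice the fallback would return whatever degraded value the interface owner and the dashboard team have agreed on.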

3.2 Metric Processing Layer

3.2.1 Multi‑Dimensional Metrics with Ordered Calculation

Metrics are partitioned by dimension (single‑warehouse, regional) and by function (minute, hour, cache, historical tables).
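One way to make the calculation order explicit is to declare each metric's inputs and derive the order with a topological sort. The sketch below uses Python's standard `graphlib`. The metric names and the dependencies between them are hypothetical, since the source does not list the actual dependency graph.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map: each metric lists the metrics it is computed from.
METRIC_DEPENDENCIES: Dict[str, Set[str]] = {
    "warehouse_minute": set(),
    "warehouse_hour": {"warehouse_minute"},
    "region_minute": {"warehouse_minute"},
    "region_hour": {"region_minute", "warehouse_hour"},
    "cache": {"region_hour"},
}


def calculation_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every metric follows all of its inputs.

    Raises graphlib.CycleError if the declared dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

Declaring dependencies as data also makes recomputation easier to reason about: when one input is repaired, only the metrics downstream of it need to be rerun.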

3.2.2 Recalculation, Fault Tolerance, and Rapid Recovery

External dependencies are not 100 % reliable; processing must support recomputation, fault tolerance, and fast recovery.

Fault tolerance: each external interface has a generic degradation path that falls back to its most recent successful result from the last 30 minutes.

Rapid recovery: design fault‑tolerance plans so that when one or more external interfaces fail, metrics can be quickly recomputed and restored.
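The 30‑minute degradation rule can be sketched as a small "last good result" cache. This is an illustration under assumptions: `LastGoodCache` and its method names are hypothetical, and the store is an in‑memory dictionary rather than the Redis cache a production system would more likely use.

```python
import time
from typing import Any, Callable, Dict, Tuple

MAX_STALENESS_SECONDS = 30 * 60  # the 30-minute window described above


class LastGoodCache:
    """Remember the last successful result per interface (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def call(self, name: str, fetch: Callable[[], Any]) -> Tuple[Any, bool]:
        """Return (value, degraded). degraded is True when a cached value is served."""
        try:
            value = fetch()
        except Exception:
            cached = self._store.get(name)
            if cached is not None and self._clock() - cached[0] <= MAX_STALENESS_SECONDS:
                return cached[1], True
            # Nothing recent enough to fall back on: surface the failure.
            raise
        self._store[name] = (self._clock(), value)
        return value, False
```

Returning the `degraded` flag alongside the value lets the caller mark the resulting metric, which helps when locating anomalies later.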

3.2.3 Flexible Promotional Strategy Configuration

Abstract business strategies and adjust them flexibly via DUCC configuration.

{
    "sTime": "2024-11-xx 00:00:00", // start time
    "eTime": "2024-11-xx 19:59:59", // end time
    "tbSTime": "2023-11-xx 00:00:00", // year‑over‑year start
    "tbETime": "2023-11-xx 19:59:59", // year‑over‑year end
    "hbSTime": "2024-06-xx 00:00:00", // month‑over‑month start
    "hbETime": "2024-06-xx 19:59:59", // month‑over‑month end
    "showType": "24h", // type, 24h or 20h
    "special24hCompDateStr": "2024-11-xx", // special comparison date for 24h
    "specialCompDateStr": "" // comparison days for 4h/28h
}
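Because a wrong window silently skews every comparison on the dashboard, it can help to validate a strategy before publishing it. The sketch below reuses the key names from the configuration above. Everything else is an assumption: the concrete days are placeholders (the source elides them as "xx"), the helper names are hypothetical, and the rule that comparison windows must match the current window in length is a guess at a sensible check, not a documented requirement.

```python
from datetime import datetime
from typing import Dict, List, Tuple

FMT = "%Y-%m-%d %H:%M:%S"

# Same keys as the configuration above; the days are illustrative placeholders.
EXAMPLE_CONFIG: Dict[str, str] = {
    "sTime": "2024-11-01 00:00:00",
    "eTime": "2024-11-01 19:59:59",
    "tbSTime": "2023-11-01 00:00:00",
    "tbETime": "2023-11-01 19:59:59",
    "hbSTime": "2024-06-01 00:00:00",
    "hbETime": "2024-06-01 19:59:59",
}


def window(cfg: Dict[str, str], start_key: str, end_key: str) -> Tuple[datetime, datetime]:
    return datetime.strptime(cfg[start_key], FMT), datetime.strptime(cfg[end_key], FMT)


def validate(cfg: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the windows are consistent."""
    problems = []
    cur_start, cur_end = window(cfg, "sTime", "eTime")
    if cur_start >= cur_end:
        problems.append("sTime must be before eTime")
    for label, s_key, e_key in (("tb", "tbSTime", "tbETime"), ("hb", "hbSTime", "hbETime")):
        start, end = window(cfg, s_key, e_key)
        if end - start != cur_end - cur_start:
            problems.append(f"{label} window length differs from current window")
    return problems


def comparison_time(cfg: Dict[str, str], now: datetime, start_key: str) -> datetime:
    """Map a moment in the current window to the same offset in a comparison window."""
    cur_start, _ = window(cfg, "sTime", "eTime")
    cmp_start = datetime.strptime(cfg[start_key], FMT)
    return cmp_start + (now - cur_start)
```

A check like `validate` could run as part of a review step before a strategy change is pushed to the configuration center.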

Metric caching uses Redis. Interfaces are stress‑tested with force‑bot against the estimated load. Dual‑stream processing allows a quick switch to the standby ClickHouse when alerts fire.

3.3 Metric Storage Layer

3.3.1 Multi‑Business Impact

MySQL is deployed as one primary and three replicas, with the instances split by workload: main DB, dashboard queries, core board queries, and other reports.

Dashboard and core reports use MySQL; EasyBI reports use Doris, with asynchronous MySQL binlog replication.

3.3.2 Rapid Anomaly Localization

Marking fields are stored as JSON, enabling SQL filtering.

MySQL data is asynchronously stored in Doris for longer‑term retention.
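Filtering on JSON marking fields might look like the sketch below. It uses an in‑memory SQLite database as a stand‑in so the example is self‑contained; it assumes a SQLite build with the JSON functions available. MySQL provides its own `JSON_EXTRACT` function for the same purpose. The table name, column names, and marker keys (`source`, `strategy`) are hypothetical.

```python
import json
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metric_snapshot (metric TEXT, value REAL, marks TEXT)")

# Hypothetical rows: `marks` records how each value was produced.
rows = [
    ("orders_region_hour", 1200.0, {"source": "live", "strategy": "24h"}),
    ("orders_warehouse_hour", 300.0, {"source": "fallback", "strategy": "24h"}),
    ("forecast_region_hour", 950.0, {"source": "fallback", "strategy": "20h"}),
]
conn.executemany(
    "INSERT INTO metric_snapshot VALUES (?, ?, ?)",
    [(metric, value, json.dumps(marks)) for metric, value, marks in rows],
)


def metrics_marked(db: sqlite3.Connection, key: str, expected: str) -> List[str]:
    """Return metric names whose JSON marks contain key == expected."""
    if not key.isidentifier():
        raise ValueError("marker key must be a simple identifier")
    sql = (
        "SELECT metric FROM metric_snapshot "
        "WHERE json_extract(marks, ?) = ? ORDER BY metric"
    )
    return [row[0] for row in db.execute(sql, (f"$.{key}", expected))]
```

For example, `metrics_marked(conn, "source", "fallback")` lists the metrics that were served from degraded data, which narrows down where an anomaly entered the chain.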

3.4 Metric Service Layer

3.4.1 Interface Stability

Stress testing.

Isolation of underlying storage.

Business isolation.

3.4.2 Degradation Capability

When an external interface fails, use the most recent successful data from the last 30 minutes within the current promotional strategy.

3.4.3 Fallback Mechanism

Provide fallback strategies for categories with prediction anomalies to ensure data remains consistent.

3.5 Monitoring Management

Two key points:

Pre‑emptive monitoring: detect upstream anomalies early and take preventive actions.

Comprehensive coverage: monitor all stages—processing, querying, data pushing, and data accuracy.

3.5.1 Hive Alerts

SCM R&D was added as a recipient of Hive alerts for pre‑emptive monitoring.

3.5.2 Real‑time Processing Alerts

SCM R&D was added as a recipient of JDQ real‑time alerts.

Maintained a JDQ hierarchy list to quickly locate bottlenecks. Diagram 1 shows the real‑time processing chain with upstream arrows; Diagram 2 lists quick‑access monitoring URLs for each upstream table.

3.5.3 SCM Processing Monitoring Dashboard

Shows interface availability, method availability, and call status.

3.5.4 SCM Interface Query Monitoring Dashboard

3.5.5 SCM Metric Data Accuracy Monitoring

Uses fire‑watch to monitor duplicate writes, data volume reasonableness, and dependency health.
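Two of the checks named above, duplicate writes and data volume reasonableness, can be sketched in a few lines. This is not how fire‑watch is configured; the function names, the `(metric, time_slot)` key, and the 50% tolerance are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Key = Tuple[str, str]  # (metric, time_slot), a hypothetical uniqueness key


def duplicate_keys(rows: Iterable[Key]) -> List[Key]:
    """Return keys that were written more than once, sorted for stable output."""
    counts = Counter(rows)
    return sorted(key for key, n in counts.items() if n > 1)


def volume_is_reasonable(current: int, baseline: int, tolerance: float = 0.5) -> bool:
    """True when the current row count is within +/- tolerance of the baseline."""
    if baseline <= 0:
        # With no baseline, only an empty result counts as expected.
        return current == 0
    return abs(current - baseline) / baseline <= tolerance
```

The baseline could come from the same time slot on a comparable day; a failed check would then raise an alert rather than block processing.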

4. Additional Process Assurance Strategies

4.1 Communication and Coordination

Established a dedicated communication group for dashboard promotion assurance to enable fast, efficient coordination among many dependent parties.

4.2 Full‑Chain Drills with Dependent Parties

Bi‑annual pre‑promotion drills ensure all system owners are familiar with configurations and verify special‑strategy settings before the actual event.

4.3 Business Collaboration and Pre‑Import Simulation

4.3.1 Historical Year‑over‑Year/Month‑over‑Month Data Validation

Work with business to confirm data accuracy across dimensions.

4.3.2 Strategy Configuration Double‑Check

Implement a double‑check SOP for strategy settings.

4.4 Result‑First Philosophy

Our goal is stable and accurate Double‑11 dashboards; we proactively drive data import and resolution rather than reacting to issues after they arise.

4.5 Team Strength

Dashboard stability is achieved through collective effort across teams and upstream/downstream collaborators.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
