How to Ensure Double‑11 Supply‑Chain Dashboard Stability: End‑to‑End Strategies
This article details the end‑to‑end technical and operational measures—including full‑chain flow mapping, risk point analysis, layered mitigation tactics, monitoring, and team coordination—used to guarantee the stability and accuracy of the supply‑chain dashboard during the Double‑11 promotion.
1. Full Chain Flow Diagram
The first step is to draw the full‑chain flow diagram of the supply‑chain dashboard. With that overview in hand, we drill into the processing details of each metric to identify issues and then devise targeted assurance measures.
Below is the complete process diagram, with risk points and corresponding mitigation measures marked.
2. Risk Point Identification
With the full chain in hand, we analyze weak points from left to right and top to bottom. Dashboard processing follows a layered design: data access layer, metric processing layer, metric storage layer, and metric service layer.
2.1 Data Access Layer
Long processing chain: between data generation and the finished metric, data passes through the report DB, multi‑layer Hive tasks, JDQ, Flink tasks, DTS, ClickHouse, and Easydata.
Many dependent parties: large‑item, service, reverse, B2B, LDC, cold‑chain, real‑time data team.
Various access types: offline Hive, JSF interface, HTTP interface, business import, ClickHouse.
2.2 Metric Processing Layer
Metrics have multiple dimensions and must be calculated in a specific order.
External dependencies cannot guarantee 100 % availability; processing needs recomputation, fault tolerance, and automatic recovery.
Promotional strategies must be flexible (influenced by competitors, need real‑time adjustments).
2.3 Metric Storage Layer
Inter‑business impact: workloads from different businesses that share storage must not affect one another.
Rapid identification of metric anomalies.
2.4 Metric Service Layer
Ensure interface stability.
External interfaces must support degradation.
External interfaces need fallback mechanisms.
Isolation of multiple business services.
2.5 Monitoring Management
How to monitor metric processing.
How to quickly locate chain anomalies.
3. Technical Assurance Strategies
3.1 Data Access Layer
3.1.1 Long Processing Chain
Define clear boundaries for the full chain and split responsibilities into four zones:
Hive tasks – guaranteed by the Hive team.
Real‑time processing – guaranteed by the real‑time data team.
Various interface providers – guaranteed by each provider.
Dashboard processing and query – guaranteed by SCM (our own team).
3.1.2 High Availability of Dependencies
Require high availability from all dependent parties and interfaces:
Hive: high availability, alerts, monitoring, stress testing.
Real‑time tasks: dual‑stream processing, Flink stress testing.
Interface providers: define SLA and coverage.
3.1.3 Pre‑emptive Monitoring
SCM R&D also monitors the Hive and real‑time Flink tasks. This makes data backlogs visible to the downstream team early and supplements the upstream teams' own monitoring.
3.1.4 Many Dependent Parties
Identify all dependent interfaces and agree on SLAs; examples are shown below.
3.1.5 Various Access Types
Apply specific measures per type:
Offline Hive: define major‑event protection tasks, monitor offline push tables.
Business import: assist business with data validation and simulate Double‑11 import in pre‑release environment.
External JSF/HTTP interfaces: monitoring, retry, degradation, fallback.
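The retry and degradation measures for external interfaces can be sketched roughly as follows. This is a minimal illustration, not the team's actual implementation: the function name `call_with_retry`, the attempt count, and the backoff values are all assumptions made for the example.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def call_with_retry(
    fetch: Callable[[], T],
    fallback: T,
    attempts: int = 3,          # assumed value, tune per interface SLA
    base_delay: float = 0.1,    # assumed value, seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[T, bool]:
    """Call an external interface with bounded retries.

    Returns (value, degraded). When every attempt fails, the caller
    receives the fallback value and degraded=True so that monitoring
    can record the degradation.
    """
    for attempt in range(attempts):
        try:
            return fetch(), False
        except Exception:
            # Back off exponentially between attempts, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return fallback, True
```

Returning the `degraded` flag alongside the value keeps the degradation observable, so a dashboard metric built on a fallback value can be marked and recomputed later.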
3.2 Metric Processing Layer
3.2.1 Multi‑Dimensional Metrics with Ordered Calculation
Metrics are partitioned by dimension (single‑warehouse, regional) and by function (minute, hour, cache, historical tables).
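One way to enforce an ordered calculation is to declare each metric's dependencies and derive the order from them. The sketch below uses Python's standard `graphlib`; the metric names and the dependency graph are hypothetical and only loosely mirror the dimension and function partitions described above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical metric names. Each key depends on the metrics in its set.
DEPENDENCIES: Dict[str, Set[str]] = {
    "warehouse_minute": set(),
    "warehouse_hour": {"warehouse_minute"},
    "region_hour": {"warehouse_hour"},
    "dashboard_cache": {"region_hour", "warehouse_hour"},
}


def calculation_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return metrics in an order where every dependency comes first.

    Raises graphlib.CycleError if the declared dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())
```

Declaring dependencies as data rather than hard-coding the sequence means a recomputation can restart from any failed metric and rerun only what depends on it.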
3.2.2 Recalculation, Fault Tolerance, and Rapid Recovery
External dependencies are not 100 % reliable; processing must support recomputation, fault tolerance, and fast recovery.
Fault tolerance: generic degradation for each external interface, using the most recent successful result within the last 30 minutes.
Rapid recovery: design fault‑tolerance plans so that when one or more external interfaces fail, metrics can be quickly recomputed and restored.
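The generic degradation rule (reuse the most recent successful result if it is no older than 30 minutes) might look like the following sketch. The class name `LastSuccessFallback` and the in-memory dictionary are assumptions for illustration; a production version would more likely keep the last result in a shared cache.

```python
import time
from typing import Any, Callable, Dict, Tuple


class LastSuccessFallback:
    """Serve the last successful result of an interface when a call fails."""

    def __init__(
        self,
        max_age_seconds: float = 30 * 60,  # the 30-minute window from the text
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._max_age = max_age_seconds
        self._clock = clock
        self._last: Dict[str, Tuple[float, Any]] = {}

    def call(self, name: str, fetch: Callable[[], Any]) -> Any:
        try:
            value = fetch()
        except Exception:
            cached = self._last.get(name)
            # Degrade only if a recent enough success exists.
            if cached is not None and self._clock() - cached[0] <= self._max_age:
                return cached[1]
            raise  # nothing usable: surface the failure to the caller
        self._last[name] = (self._clock(), value)
        return value
```

Re-raising once the cached result is too old is a deliberate choice in this sketch: it forces the failure into alerting instead of silently showing stale data on the dashboard.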
3.2.3 Flexible Promotional Strategy Configuration
Abstract business strategies and adjust them flexibly via DUCC configuration.
{
  "sTime": "2024-11-xx 00:00:00", // start time
  "eTime": "2024-11-xx 19:59:59", // end time
  "tbSTime": "2023-11-xx 00:00:00", // year‑over‑year start
  "tbETime": "2023-11-xx 19:59:59", // year‑over‑year end
  "hbSTime": "2024-06-xx 00:00:00", // month‑over‑month start
  "hbETime": "2024-06-xx 19:59:59", // month‑over‑month end
  "showType": "24h", // type, 24h or 20h
  "special24hCompDateStr": "2024-11-xx", // special comparison date for 24h
  "specialCompDateStr": "" // comparison days for 4h/28h
}
Metric caching uses Redis. Interface stress testing uses force‑bot, with load based on traffic estimates. Dual‑stream processing allows a quick switch to the standby ClickHouse when alerts fire.
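Because a wrong time window in the strategy configuration above would distort every comparison on the dashboard, a small pre-publish check can help. The following sketch is an assumption about how such a check could be written, not part of DUCC itself; it validates only the field names shown in the sample configuration, and the `xx` placeholders must be replaced with real dates before it passes.

```python
from datetime import datetime
from typing import Dict, List

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
# (start, end) key pairs taken from the sample configuration.
WINDOWS = [("sTime", "eTime"), ("tbSTime", "tbETime"), ("hbSTime", "hbETime")]
ALLOWED_SHOW_TYPES = {"24h", "20h"}


def validate_strategy(config: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors: List[str] = []
    for start_key, end_key in WINDOWS:
        try:
            start = datetime.strptime(config[start_key], TIME_FORMAT)
            end = datetime.strptime(config[end_key], TIME_FORMAT)
        except (KeyError, TypeError, ValueError) as exc:
            errors.append(f"{start_key}/{end_key}: {exc!r}")
            continue
        if start >= end:
            errors.append(f"{start_key} must be earlier than {end_key}")
    if config.get("showType") not in ALLOWED_SHOW_TYPES:
        errors.append("showType must be '24h' or '20h'")
    return errors
```

Collecting all problems instead of stopping at the first one suits a double-check workflow, since the reviewer sees every issue in a single pass.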
3.3 Metric Storage Layer
3.3.1 Multi‑Business Impact
MySQL is deployed as one primary and three replicas, with traffic split by workload: main DB, dashboard queries, core board queries, and other reports.
Dashboard and core reports read from MySQL; EasyBI reports read from Doris, which is fed by asynchronous replication of the MySQL binlog.
3.3.2 Rapid Anomaly Localization
Marking fields are stored as JSON, enabling SQL filtering.
MySQL data is asynchronously stored in Doris for longer‑term retention.
3.4 Metric Service Layer
3.4.1 Interface Stability
Stress testing.
Isolation of underlying storage.
Business isolation.
3.4.2 Degradation Capability
When an external interface fails, use the most recent successful data from the last 30 minutes within the current promotional strategy.
3.4.3 Fallback Mechanism
Provide fallback strategies for categories with prediction anomalies to ensure data remains consistent.
3.5 Monitoring Management
Two key points:
Pre‑emptive monitoring: detect upstream anomalies early and take preventive actions.
Comprehensive coverage: monitor all stages—processing, querying, data pushing, and data accuracy.
3.5.1 Hive Alerts
Added SCM R&D to Hive alerts for pre‑emptive monitoring.
3.5.2 Real‑time Processing Alerts
Added SCM R&D to JDQ real‑time alerts.
Maintained a JDQ hierarchy list to quickly locate bottlenecks. Diagram 1 shows the real‑time processing chain with upstream arrows; Diagram 2 lists quick‑access monitoring URLs for each upstream table.
3.5.3 SCM Processing Monitoring Dashboard
Shows interface availability, method availability, and call status.
3.5.4 SCM Interface Query Monitoring Dashboard
3.5.5 SCM Metric Data Accuracy Monitoring
Uses fire‑watch to monitor duplicate writes, data volume reasonableness, and dependency health.
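Two of the checks named above, duplicate writes and data volume reasonableness, reduce to simple rules. The sketch below is an independent illustration and does not reflect how fire‑watch is configured; the function names, the key fields, and the 50% tolerance are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def find_duplicate_keys(
    rows: Iterable[Dict[str, str]], key_fields: Sequence[str]
) -> List[Tuple[str, ...]]:
    """Return business keys that were written more than once."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def volume_is_reasonable(actual: int, baseline: int, tolerance: float = 0.5) -> bool:
    """Check that the row count deviates from a baseline by at most `tolerance`.

    The baseline could be, for example, the same time slot on a previous day.
    """
    if baseline <= 0:
        return actual == 0
    return abs(actual - baseline) / baseline <= tolerance
```

For example, `find_duplicate_keys(rows, ["metric", "warehouse", "minute"])` would list every metric point that was written twice for the same warehouse and minute.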
4. Additional Process Assurance Strategies
4.1 Communication and Coordination
Established a dedicated communication group for dashboard promotion assurance to enable fast, efficient coordination among many dependent parties.
4.2 Full‑Chain Drills with Dependent Parties
Bi‑annual pre‑promotion drills ensure all system owners are familiar with configurations and verify special‑strategy settings before the actual event.
4.3 Business Collaboration and Pre‑Import Simulation
4.3.1 Historical Year‑over‑Year/Month‑over‑Month Data Validation
Work with business to confirm data accuracy across dimensions.
4.3.2 Strategy Configuration Double‑Check
Implement a double‑check SOP for strategy settings.
4.4 Result‑First Philosophy
Our goal is stable and accurate Double‑11 dashboards; we proactively drive data import and resolution rather than reacting to issues after they arise.
4.5 Team Strength
Dashboard stability is achieved through collective effort across teams and upstream/downstream collaborators.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
