Designing Cost‑Effective Disaster Recovery Data Backup for LBS‑Based SOA Services
This article details a comprehensive disaster-recovery strategy for LBS-driven SOA services: the challenges of backing up massive POI data, cost reduction via H3 grid indexing, selective caching, compression, diff validation, client-side fallback, and the deployment process that together achieve reliable, low-cost data availability.
Background
In service‑oriented architecture (SOA) systems, disaster‑recovery capability is crucial for stability. Multi‑data‑center deployment, automated failover, and data backup improve resilience, but LBS (location‑based service) scenarios introduce new difficulties due to massive POI data and fine‑grained resource zoning.
Problem Statement
With the growth of the 秒送 (instant-delivery) business, traffic for LBS-based user transactions has surged. The system must distinguish strong-real-time from weak-real-time data, understand the data production chain, and ensure that backups are authentic while preserving the user experience.
Pain Points / Challenges
How to cache POI latitude‑longitude data at national scale (hundreds of billions of points).
How to reduce the enormous storage cost of disaster‑recovery data (initially >5 million RMB/month).
How to guarantee the effectiveness of cached resources while handling consistency and load pressure.
Industry Research
Investigation of a peer's architecture revealed no explicit data backup at the SOA layer; they rely on lower-level data redundancy, and the details of their backup solution remain unclear.
Solution Ideation
We identified critical entry points (home page, channel page, store detail page) that block transaction flow and require backup. For home and channel pages, POI‑driven recommendation results are complex; we therefore propose using a grid‑based approach to reduce data volume.
Grid Construction
Adopt H3 hexagonal grid indexing (resolution 7, cells ≈ 1.4 km across) to approximate GIS coverage. This reduces the POI count from billions to hundreds of thousands, cutting storage cost by >99%.
Cost comparison: resolution 7 costs ≈ 2,973 RMB/month and resolution 8 ≈ 14,133 RMB/month, versus the original >5 million RMB/month.
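The core of the savings is mapping every POI coordinate into a coarse grid cell so that all POIs in a cell share one backup entry. The sketch below illustrates the bucketing idea with a simple ~1.4 km lat/lng grid as a stand-in; the production path described here would instead call the h3 library (e.g. `h3.latlng_to_cell(lat, lng, 7)`), and the coordinates below are arbitrary examples.

```python
import math

def grid_key(lat: float, lng: float, cell_km: float = 1.4) -> tuple:
    """Map a POI coordinate to a coarse grid cell.

    Stand-in for an H3 resolution-7 cell (~1.4 km across); real code
    would use the h3 library instead of this square approximation.
    """
    km_per_deg_lat = 111.0                                   # rough conversion
    km_per_deg_lng = 111.0 * math.cos(math.radians(lat))     # shrinks with latitude
    return (int(lat * km_per_deg_lat / cell_km),
            int(lng * km_per_deg_lng / cell_km))

# Two nearby Beijing points collapse into one cell; Shanghai stays separate.
pois = [(39.9042, 116.4074), (39.9050, 116.4080), (31.2304, 121.4737)]
cells = {grid_key(lat, lng) for lat, lng in pois}
```

Backing up one representative result per cell, rather than per POI, is what turns billions of keys into hundreds of thousands.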
Hotspot POI Selection
Select the POI with the highest user request density within each grid as the backup representative.
Backup Frequency
Maintain two cached data sets (daytime and nighttime), prioritizing the daytime set since nighttime traffic is much lower.
Store Detail Page Strategy
Cache store classification data (a few hundred entries per store) and the first two pages of product listings, excluding closed stores, reducing cost to ≈ 900 RMB/month.
Compression
Apply GZIP compression to all stringified data, achieving ~60% space savings.
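Because the backup payloads are stringified (repetitive JSON), GZIP compresses them well. A minimal sketch using Python's standard library, with a hypothetical payload shape:

```python
import gzip
import json

# Hypothetical backup payload: stringified JSON, highly repetitive structure.
payload = json.dumps(
    {"cell": "example_cell", "stores": [{"id": i, "name": f"store-{i}"} for i in range(200)]}
).encode("utf-8")

blob = gzip.compress(payload)            # what gets written to the backup store
restored = gzip.decompress(blob)         # what the fallback path reads back
print(f"{len(payload)} -> {len(blob)} bytes")
```

The actual savings depend on payload structure; the ~60% figure above comes from the production data.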
Cost Summary
Page            Pre-compression   Post-compression   Final Cost
Home            1087 GB           652 GB             1,956 RMB/month
Channel         3624 GB           2174 GB            6,522 RMB/month
Store Detail    300 GB            180 GB             540 RMB/month
Total monthly cost drops from >5 million RMB to ≈ 9,018 RMB.
Diff Validation
Validate backup accuracy by comparing, for each hexagon, seven probe POIs (the six vertices plus the center) against live online results across 39 cities, targeting ≥90% consistency.
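The validation metric itself is a simple match rate over the probe points. A minimal sketch, where each sample is a hypothetical `(backup_result, online_result)` pair for one of the seven probes:

```python
def consistency_rate(samples):
    """Fraction of probe points where the backup result matches the live result.

    samples: list of (backup_result, online_result) pairs collected from the
    7 probe points (6 vertices + center) of each hexagon.
    """
    matches = sum(1 for backup, online in samples if backup == online)
    return matches / len(samples)

# Hypothetical probe run: 19 of 20 samples agree -> 95% consistency.
samples = [("poi_1", "poi_1")] * 19 + [("poi_1", "poi_3")]
rate = consistency_rate(samples)
assert rate >= 0.90  # the release gate described above
```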
Implementation Process
Divide the solution into five modules: client interaction, grid service, task orchestration, gray‑release, and disaster‑recovery switch.
The client caches data in localStorage (~5 MB) for quick fallback; the server decides whether to use the client cache or the Redis backup based on cache freshness.
The grid module uses the JMF component to generate resolution-7 and resolution-8 grids and stores hotspot POI coordinates in MySQL for the workers.
Task module handles asynchronous data generation, retries, alerts, and monitoring.
Gray‑release includes machine‑level, PIN‑level, store‑level, and city‑level rollouts.
Switching logic: if client cache is fresh, use it; otherwise fetch from Redis backup; if both missing, trigger fallback.
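The switching order above can be sketched as a small decision function. The freshness window, field names, and inputs below are assumptions for illustration; the article does not specify the actual TTL:

```python
import time

CACHE_TTL_S = 24 * 3600  # assumed freshness window; the real threshold isn't stated

def pick_source(client_cache, redis_backup, now=None):
    """Fallback order: fresh client cache -> Redis backup -> degrade.

    client_cache / redis_backup are hypothetical dicts with 'data' (and,
    for the client cache, a 'ts' write timestamp); None means missing.
    """
    now = now if now is not None else time.time()
    if client_cache and now - client_cache["ts"] < CACHE_TTL_S:
        return "client", client_cache["data"]
    if redis_backup:
        return "redis", redis_backup["data"]
    return "fallback", None
```

A quick walk-through: a cache written a minute ago is served locally; a day-old cache falls through to Redis; with neither available, the degraded-fallback path fires.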
Results
Demonstrated successful home/channel page and store detail page rendering using the backup data, achieving the targeted cost reduction and reliability.
Reflection
Beyond data backup, future work will explore fault‑tolerance mechanisms such as automatic failover, multi‑active deployment, and comprehensive BCP/DRP practices.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.