Business Reconciliation Platform Architecture Design for Distributed Systems
The article describes Youzan's business reconciliation platform for distributed systems, which detects and quantifies data inconsistencies. The platform offers easy plugin-based integration, a four-step orchestrated workflow, high-throughput offline processing with Spark, second-level real-time event handling, a three-layer architecture, and health monitoring for transaction chains.
According to the CAP theorem, a distributed system cannot simultaneously guarantee consistency (C), availability (A), and partition tolerance (P); when a network partition occurs, it must give up either consistency or availability. Because network call failures are unavoidable in practice, systems will inevitably pass through inconsistent states. This article introduces Youzan's business reconciliation platform, which is designed to detect and quantify data inconsistencies in distributed environments.
Background: In transaction scenarios, various inconsistency issues can occur: orders showing "pending payment" after successful payment, orders showing "pending shipment" after logistics pickup, orders showing "refund in progress" after bank refund completion, and orders failing to auto-complete after 7 days. The core purpose of the reconciliation platform is to detect such issues proactively before user complaints arise.
Challenges: The platform must meet three core requirements: easy integration with business systems, high throughput for massive data volumes, and high real-time performance.
Architecture - Easy Integration: Every reconciliation process can be decomposed into four steps: Data Loading, Transformation/Parsing, Comparison, and Result Processing. Each step must be orchestratable from configurable components. To support this, the reconciliation engine provides workflow orchestration, rule engine capabilities, and plugin-based integration. The key interfaces are ResourceLoader (loads data from DB, FILE, RPC, and REST sources), Parser (models the data, with Groovy scripting support), Checker (compares fields using findFirst or full strategies), and ResultHandler (persistence, alerting, and data repair).
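The four-step workflow can be sketched in Java as four small plugin interfaces wired into one flow. This is a minimal sketch, not Youzan's code: the method signatures, the "key" field used to pair records from the two sources, the FieldChecker, ReconciliationFlow and FlowDemo classes, and the reading of findFirst as "stop at the first mismatched field" versus full as "report every mismatched field" are all assumptions. The demo reproduces the "pending payment after successful payment" case from the background.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical signatures: the article names these four plugin types but not their methods.
interface ResourceLoader { List<Map<String, Object>> load(); }            // step 1: DB, FILE, RPC, REST...
interface Parser { Map<String, Object> parse(Map<String, Object> raw); }  // step 2: raw row -> common model
interface Checker { List<String> check(Map<String, Object> l, Map<String, Object> r); } // step 3
interface ResultHandler { void handle(Object key, List<String> diffs); }  // step 4: persist, alert, repair

// Compares configured fields; findFirst stops at the first mismatch, full collects all of them.
final class FieldChecker implements Checker {
    private final List<String> fields;
    private final boolean findFirst;

    FieldChecker(List<String> fields, boolean findFirst) {
        this.fields = fields;
        this.findFirst = findFirst;
    }

    @Override
    public List<String> check(Map<String, Object> l, Map<String, Object> r) {
        List<String> diffs = new ArrayList<>();
        for (String f : fields) {
            if (!Objects.equals(l.get(f), r.get(f))) {
                diffs.add(f + ": " + l.get(f) + " != " + r.get(f));
                if (findFirst) break;
            }
        }
        return diffs;
    }
}

// Runs load -> parse -> check -> handle and returns the number of inconsistent records.
// Both parsers must emit a "key" field (an assumption) used to pair records from the two sources.
final class ReconciliationFlow {
    static int run(ResourceLoader leftSrc, Parser leftParser,
                   ResourceLoader rightSrc, Parser rightParser,
                   Checker checker, ResultHandler handler) {
        Map<Object, Map<String, Object>> rightByKey = new HashMap<>();
        for (Map<String, Object> raw : rightSrc.load()) {
            Map<String, Object> row = rightParser.parse(raw);
            rightByKey.put(row.get("key"), row);
        }
        int mismatches = 0;
        for (Map<String, Object> raw : leftSrc.load()) {
            Map<String, Object> left = leftParser.parse(raw);
            // A record missing on the right is compared against an empty row, so it is reported too.
            Map<String, Object> right = rightByKey.getOrDefault(left.get("key"), Map.of());
            List<String> diffs = checker.check(left, right);
            if (!diffs.isEmpty()) {
                mismatches++;
                handler.handle(left.get("key"), diffs);
            }
        }
        return mismatches;
    }
}

class FlowDemo {
    public static void main(String[] args) {
        // The order service says A2 is unpaid, while the payment service says it succeeded.
        ResourceLoader orders = () -> List.of(
                Map.of("orderNo", "A1", "state", "PAID"),
                Map.of("orderNo", "A2", "state", "PENDING_PAYMENT"));
        ResourceLoader payments = () -> List.of(
                Map.of("orderNo", "A1", "payStatus", "SUCCESS"),
                Map.of("orderNo", "A2", "payStatus", "SUCCESS"));
        Parser orderParser = raw -> Map.of("key", raw.get("orderNo"), "paid", "PAID".equals(raw.get("state")));
        Parser payParser = raw -> Map.of("key", raw.get("orderNo"), "paid", "SUCCESS".equals(raw.get("payStatus")));
        int n = ReconciliationFlow.run(orders, orderParser, payments, payParser,
                new FieldChecker(List.of("paid"), true),
                (key, diffs) -> System.out.println(key + " -> " + diffs));
        System.out.println(n + " inconsistent record(s)");
    }
}
```

Because each step sits behind an interface, a Groovy-backed Parser or an RPC-backed ResourceLoader could be swapped in without touching the flow itself.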
Architecture - High Throughput: For offline scheduled reconciliation over millions of records, the platform relies on distributed task splitting and big data tools. It supports two modes: a conventional mode that uses the data platform with sharding and pagination, and a Spark mode for 10M+ records that uses Hive and the NSQ message queue.
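The conventional mode's sharding and pagination can be sketched as range sharding over a numeric primary key plus keyset pagination inside each shard. The key-range split, the PageFetcher interface, the ShardedScan class and the page size are assumptions for illustration; the Spark mode (Hive plus NSQ) is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.LongStream;

// Hypothetical page source: returns up to `limit` sorted ids with afterId < id < toExclusive.
// In the conventional mode this would be a paged query against the data platform.
interface PageFetcher {
    List<Long> fetch(long afterIdExclusive, long toExclusive, int limit);
}

final class ShardedScan {
    record Shard(long fromInclusive, long toExclusive) {}

    // Splits [minId, maxId] into at most shardCount contiguous ranges of (nearly) equal width.
    static List<Shard> split(long minId, long maxId, int shardCount) {
        long width = (maxId - minId + 1 + shardCount - 1) / shardCount; // ceiling division
        List<Shard> shards = new ArrayList<>();
        for (long start = minId; start <= maxId; start += width) {
            shards.add(new Shard(start, Math.min(start + width, maxId + 1)));
        }
        return shards;
    }

    // Walks one shard page by page using keyset pagination (id > last seen id), which avoids
    // the growing cost of large OFFSETs. Assumes pageSize > 0. Returns the records visited.
    static long scan(Shard shard, PageFetcher fetcher, int pageSize) {
        long visited = 0;
        long cursor = shard.fromInclusive() - 1;
        while (true) {
            List<Long> page = fetcher.fetch(cursor, shard.toExclusive(), pageSize);
            visited += page.size(); // each record of the page would be reconciled here
            if (page.size() < pageSize) return visited;
            cursor = page.get(page.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Sparse in-memory "table": only even ids from 2 to 1000 exist.
        List<Long> ids = LongStream.rangeClosed(1, 1000).filter(i -> i % 2 == 0).boxed().toList();
        PageFetcher fetcher = (after, to, limit) ->
                ids.stream().filter(id -> id > after && id < to).limit(limit).toList();
        long total = 0;
        for (Shard s : split(1, 1000, 4)) {
            total += scan(s, fetcher, 50); // in production each shard would go to a different worker
        }
        System.out.println(total + " records scanned");
    }
}
```

Splitting by key range keeps shards independent, so they can be handed to separate workers, and keyset pagination keeps each page query cheap even deep into a large table.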
Architecture - High Real-time Performance: For second-level reconciliation triggered by business events, the platform uses an EventPool that provides high-concurrency buffering, rate limiting, sampling, routing, and pipeline processing. A blocking queue prevents thread resource exhaustion, and delayed blocking queues enable delayed reconciliation.
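The EventPool behaviour can be approximated with java.util.concurrent primitives: an ArrayBlockingQueue as the bounded buffer and a DelayQueue for delayed reconciliation. The Event, DelayedEvent, EventPool and EventPoolDemo classes, the fixed-window rate limiter, the random sampling rule and the type-based routing map are hypothetical stand-ins; the article does not describe the EventPool's internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

record Event(String type, String key) {}

// Wraps an event so that DelayQueue only releases it after its delay has elapsed.
final class DelayedEvent implements Delayed {
    final Event event;
    private final long readyAtNanos;

    DelayedEvent(Event event, long delayMillis) {
        this.event = event;
        this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

// Hypothetical EventPool: sampling, a fixed-window rate limit, a bounded buffer and routing.
final class EventPool {
    private final BlockingQueue<Event> buffer;           // bounded, so bursts cannot pile up without limit
    private final DelayQueue<DelayedEvent> delayed = new DelayQueue<>();
    private final Map<String, Consumer<Event>> routes;   // event type -> reconciliation pipeline
    private final double sampleRate;                     // 1.0 = reconcile every event
    private final int maxPerSecond;
    private final Random random = new Random();
    private long windowStart = System.nanoTime();
    private int windowCount;

    EventPool(int capacity, double sampleRate, int maxPerSecond, Map<String, Consumer<Event>> routes) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.sampleRate = sampleRate;
        this.maxPerSecond = maxPerSecond;
        this.routes = routes;
    }

    // Returns false when the event is sampled out, rate limited, or the buffer is full.
    synchronized boolean submit(Event e) {
        if (random.nextDouble() >= sampleRate) return false;
        long now = System.nanoTime();
        if (now - windowStart >= TimeUnit.SECONDS.toNanos(1)) {
            windowStart = now;
            windowCount = 0;
        }
        if (windowCount >= maxPerSecond) return false;
        // offer() rejects when full; put() would block the producer instead.
        if (!buffer.offer(e)) return false;
        windowCount++;
        return true;
    }

    // Schedules a later re-check, e.g. when a downstream system is expected to catch up.
    void submitDelayed(Event e, long delayMillis) {
        delayed.put(new DelayedEvent(e, delayMillis));
    }

    // Drains buffered events plus expired delayed events and routes each one by type.
    // A fixed-size worker pool would call this in a loop instead of starting a thread per event.
    int dispatch() {
        List<Event> ready = new ArrayList<>();
        buffer.drainTo(ready);
        DelayedEvent d;
        while ((d = delayed.poll()) != null) ready.add(d.event);
        for (Event e : ready) routes.getOrDefault(e.type(), ev -> { }).accept(e);
        return ready.size();
    }
}

class EventPoolDemo {
    public static void main(String[] args) {
        EventPool pool = new EventPool(1000, 1.0, 500,
                Map.of("PAYMENT_SUCCESS", e -> System.out.println("reconcile order " + e.key())));
        pool.submit(new Event("PAYMENT_SUCCESS", "A1"));
        pool.submitDelayed(new Event("PAYMENT_SUCCESS", "A2"), 0);
        System.out.println(pool.dispatch() + " event(s) dispatched");
    }
}
```

Rejecting at submit time keeps the number of worker threads fixed however bursty the business events are; whether to reject or block the producer is a design choice the article leaves open.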
Overall Design: The platform has a three-layer architecture: a scheduling layer (task triggering, splitting, and scheduling), the reconciliation engine (orchestrated workflow), and an infrastructure layer (rule engine, process engine, generic invocation, and monitoring).
Health Monitoring: The platform provides health feedback for business systems and entire transaction chains based on data consistency information.
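One hypothetical way to turn consistency data into health feedback is to compute a consistency rate per business system and treat a transaction chain as only as healthy as its least consistent system. The article does not define its health metric, so the ChainHealth class, the rate formula, the min aggregation and the 99.99% and 99% thresholds below are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ChainHealth {
    enum Level { HEALTHY, DEGRADED, UNHEALTHY }

    // Reconciliation counters for one business system (one node of a transaction chain).
    record Stats(long checked, long inconsistent) {
        double consistencyRate() {
            return checked == 0 ? 1.0 : 1.0 - (double) inconsistent / checked;
        }
    }

    // Hypothetical thresholds; a real platform would likely make these configurable per system.
    static Level level(double rate) {
        if (rate >= 0.9999) return Level.HEALTHY;
        if (rate >= 0.99) return Level.DEGRADED;
        return Level.UNHEALTHY;
    }

    // Assumption: a chain is rated by its least consistent system.
    static double chainRate(Map<String, Stats> systems) {
        return systems.values().stream().mapToDouble(Stats::consistencyRate).min().orElse(1.0);
    }

    public static void main(String[] args) {
        Map<String, Stats> chain = new LinkedHashMap<>();
        chain.put("order", new Stats(10_000, 0));
        chain.put("payment", new Stats(10_000, 50));
        chain.put("logistics", new Stats(1_000, 20));
        chain.forEach((name, s) -> System.out.println(name + ": " + level(s.consistencyRate())));
        System.out.println("chain: " + level(chainRate(chain)));
    }
}
```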
Collaboration: Through open API jars and SPI mechanisms, business teams can implement custom plugins for the reconciliation platform.
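Since the article mentions open API jars and SPI, a plausible plugin path in Java is java.util.ServiceLoader. In this sketch the ReconPlugin interface, the RefundRepairPlugin provider and the PluginRegistry class are hypothetical names; only ServiceLoader itself is a real JDK API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical SPI contract that the platform would publish in its open API jar.
interface ReconPlugin {
    String name();                    // key used in reconciliation configs to select the plugin
    String handle(String recordKey);  // e.g. custom result handling for one inconsistent record
}

// A business team's plugin. Shipped in the team's own jar, a provider like this must be a
// public class with a public no-arg constructor, listed by its fully qualified name in
// META-INF/services/<fully qualified name of ReconPlugin>.
final class RefundRepairPlugin implements ReconPlugin {
    public String name() { return "refund-repair"; }
    public String handle(String recordKey) { return "repair task queued for " + recordKey; }
}

// Registers built-in plugins first, then lets providers found by ServiceLoader override them by name.
final class PluginRegistry {
    private final Map<String, ReconPlugin> plugins = new HashMap<>();

    PluginRegistry(List<ReconPlugin> builtIns) {
        for (ReconPlugin p : builtIns) plugins.put(p.name(), p);
        for (ReconPlugin p : ServiceLoader.load(ReconPlugin.class)) plugins.put(p.name(), p);
    }

    ReconPlugin get(String name) {
        ReconPlugin p = plugins.get(name);
        if (p == null) throw new IllegalArgumentException("no plugin named " + name);
        return p;
    }

    public static void main(String[] args) {
        // No META-INF/services file exists in this single-file demo, so only the built-in is found.
        PluginRegistry registry = new PluginRegistry(List.of(new RefundRepairPlugin()));
        System.out.println(registry.get("refund-repair").handle("A2"));
    }
}
```

With this split, the platform only depends on the interface in the API jar, and a business team's jar on the classpath is enough to make a new plugin selectable by name.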
Youzan Coder
Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.