Operations 8 min read

How to Build Real‑Time and Offline Data Reconciliation for System Consistency

This article explains why cross‑system data inconsistencies occur, defines key reconciliation metrics—completeness, timeliness, and automatic repair—and provides step‑by‑step designs for both real‑time (seconds‑to‑minutes) and offline (hour‑to‑day) reconciliation, including message‑driven triggers, batch processing, and SQL examples for detecting and fixing mismatches.

Java Baker
Java Baker
Java Baker
How to Build Real‑Time and Offline Data Reconciliation for System Consistency

Background

In cross‑system data write scenarios, upstream and downstream systems may become inconsistent due to network timeouts, jitter, or the inability to simultaneously succeed in writing to a local DB and calling an external interface, so timely detection and automatic repair are required. The following summarizes reconciliation based on the author's experience.

Note: This reconciliation is not limited to financial reconciliation; it refers to field‑level reconciliation between B‑side and C‑side systems.

Reconciliation Metrics

Key indicators for good reconciliation are:

Completeness : Ensure every field is reconciled.

Timeliness : Higher is better; seconds > minutes > hours > days.

Automatic Repair : After detecting inconsistencies, automatically fix them, then reconcile again to confirm final consistency, forming a closed loop.

Both real‑time and offline reconciliation are recommended.

1. Real‑time Reconciliation (seconds‑to‑minutes)

Real‑time reconciliation quickly discovers inconsistencies; typically the data writer initiates reconciliation, and the data receiver provides a query interface (e.g., a replica DB).

Trigger methods:

(Not recommended) Database change events: listen to binlog changes of the main business table.

(Recommended) Business message events: listen to business message changes.

Message‑driven triggers are preferred because database change events have limitations:

If a write operation does not update the main table (e.g., only updates an extension table), you need to consume binlog events of the extension table.

If there are intermediate states, you must wait until the record reaches its final state before reconciling, requiring filtering many invalid messages.

Implementation steps:

Listen to business message changes. Business messages must be transactional: they are emitted only when the business operation succeeds. Messages can carry information from multiple tables, reducing DB queries.

Delay and batch‑consume change messages, then batch‑query local DB data and upstream/downstream interfaces.

Compare each field individually.

The writer initiates reconciliation because the writer’s call to the receiver’s interface may fail, and the receiver cannot trigger reconciliation when a write fails.

Delaying consumption (e.g., 15 seconds) avoids false alarms caused by short‑term inconsistencies such as replication lag or interface timeout retries.

Batching messages reduces query QPS, preventing high load on the service and upstream/downstream systems.

2. Offline Reconciliation (hours‑to‑days)

Why is offline reconciliation needed despite having online reconciliation?

It serves as a safety net, periodically processing historical data.

It does not affect the stability of online services.

External third‑party systems often only provide offline data (e.g., payment settlement files) and lack query interfaces.

Implementation steps:

Offline collection: Export data from each system to Hive tables, which may be MySQL tables or reconciliation files.

Data normalization: Parse all reconciliation fields and generate unified wide tables A and B.

Offline comparison:

Check whether record counts match.

Check whether data content matches: use left join to find rows only in the left table, right join for rows only in the right table, and inner join with field‑by‑field comparison to locate differences.

SQL examples:

-- (1) Compare row counts of two tables
select count(1) from table_a; -- left table count
select count(1) from table_b; -- right table count

-- (2) Compare data content
-- (2.1) Rows only in left table
select * from table_a
left outer join table_b
on table_a.biz_field = table_b.biz_field
where table_b.biz_field is null;

-- (2.2) Rows only in right table
select * from table_a
right outer join table_b
on table_a.biz_field = table_b.biz_field
where table_a.biz_field is null;

-- (2.3) Rows with differences
select * from table_a
inner join table_b
on table_a.biz_field = table_b.biz_field
where (table_a.field_1 <> table_b.field_1
or table_a.field_2 <> table_b.field_2
or table_a.field_3 <> table_b.field_3);

3. Automatic Repair

Typically, after the writer detects inconsistencies, it performs automatic repair following these steps:

Record the differences for later troubleshooting.

Execute automatic repair.

Re‑reconcile after repair to ensure final consistency.

Conclusion

Real‑time and offline reconciliation complement each other and are both essential. New requirements must ensure functional correctness and also implement reconciliation to avoid undiscovered or delayed detection of inconsistencies.

Automatic repair after detecting mismatches guarantees eventual system consistency.

real-timeSQLoperationsofflineData Reconciliationautomatic repair
Java Baker
Written by

Java Baker

Java architect and Raspberry Pi enthusiast, dedicated to writing high-quality technical articles; the same name is used across major platforms.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.