Fixing Event Merge Bugs in Enhanced Singleflight: A Practical Compensation Approach
This article examines a bug in an enhanced singleflight event-merging scheme, walks through scenarios in which overlapping events leave the wrong final state, and presents a timestamp-based compensation mechanism that detects inconsistent pushes and re-pushes the latest data so that the final state is correct.
Ideal State
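To keep things simple, events are represented by letters: A denotes the occurrence of event A, A' the start of its execution, and A'' the state after its execution finishes. The same notation applies to the other events; for example, D'' is the state produced by D's execution.
In the ideal scenario, event A fires and starts executing. Before it finishes, events B, C, and D arrive and are held back. When A completes, B, C, and D enter the singleflight group together and compete, so only one merged execution runs, and it produces D'', the state of the latest event. This is exactly the desired outcome.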
Case 1
A reader asked what happens if a new event E arrives while the merged B/C/D execution is running. E would follow the same path as A and start executing right away. If E finishes before the B/C/D execution does, D'' is written last and overwrites E'', so the final state is D'' instead of the expected E''. That is incorrect.
Case 2
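Now suppose that, while E is executing, events F and G accumulate; after E completes, they are merged into a single execution that produces G''. If that execution also finishes before the B/C/D execution does, the expected final state is G'', but the actual state is still D''.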
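The orderings above can be reproduced with a small deterministic simulation. This is an illustrative sketch, not the author's code: each execution is reduced to the state it writes plus a finish time, the downstream store is assumed to be last-writer-wins, and the finish times are made-up values chosen only to match the scenarios described. The `run` type and `finalState` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// run models one execution: the state it writes and when it finishes.
// Times are arbitrary illustrative units, not measurements.
type run struct {
	result string
	finish int
}

// finalState applies writes in finish order (last writer wins) and returns
// the state left behind once every execution has completed.
func finalState(runs []run) string {
	sorted := append([]run(nil), runs...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].finish < sorted[j].finish })
	state := ""
	for _, r := range sorted {
		state = r.result
	}
	return state
}

func main() {
	// Ideal: A runs alone, then the merged B/C/D execution writes D''.
	ideal := []run{{"A''", 10}, {"D''", 30}}
	// Case 1: E starts while B/C/D is running and finishes first.
	case1 := []run{{"A''", 10}, {"D''", 30}, {"E''", 25}}
	// Case 2: F and G merge into G'' after E, and both finish before D''.
	case2 := []run{{"A''", 10}, {"D''", 40}, {"E''", 20}, {"G''", 30}}

	fmt.Println(finalState(ideal)) // D'' (expected D'')
	fmt.Println(finalState(case1)) // D'' (expected E'')
	fmt.Println(finalState(case2)) // D'' (expected G'')
}
```

In all three runs the final state is D'', because the merged B/C/D execution is the last to finish; only in the ideal case does that match the expectation.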
Is There a Problem in Production?
These interleavings are hard to reproduce in tests, and hitting them in production would leave the wrong final state. Our system already guards against this with a protection mechanism that periodically detects inconsistent pushes and compensates for them.
Before each push, we create (or update) a record for the key that holds two timestamps, t1 and t2. The push start time tn (nanosecond precision) is written to t1 before the push; after the push completes, the same tn is written to t2. In pseudocode:
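tn := time.Now().UnixNano() // start time of this push, nanosecond precision
markT1(key, tn)             // before the push: t1 = tn
push(key)
markT2(key, tn)             // after the push completes: t2 = tn
If t1 == t2, the most recently started push is also the last one to complete, so the key is consistent. If t1 != t2, either a push has not completed yet, or an older push finished after a newer one and overwrote its result; the key then needs compensation. Every 10 seconds a background job scans for records with t1 != t2 and re-pushes the latest data for those keys.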
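For illustration, the record and the periodic scan might look like the Go sketch below. It is not the production implementation: `Tracker`, `Push`, `Stale`, `Compensate`, and `RunCompensator` are hypothetical names, the records live in an in-memory map guarded by a mutex (a real system would presumably keep them in shared storage), and the `push` callback is assumed to load and send the latest data for the key.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// record holds, for one key, the start time of the push that started most
// recently (t1) and of the push that completed most recently (t2).
type record struct{ t1, t2 int64 }

// Tracker is a hypothetical in-memory stand-in for the per-key records.
type Tracker struct {
	mu   sync.Mutex
	recs map[string]record
}

func NewTracker() *Tracker { return &Tracker{recs: make(map[string]record)} }

// MarkT1 stores the push start time in t1 before the push.
func (t *Tracker) MarkT1(key string, tn int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	r := t.recs[key]
	r.t1 = tn
	t.recs[key] = r
}

// MarkT2 stores the same start time in t2 after the push completes.
func (t *Tracker) MarkT2(key string, tn int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	r := t.recs[key]
	r.t2 = tn
	t.recs[key] = r
}

// Push wraps one push with the t1/t2 bookkeeping.
func (t *Tracker) Push(key string, push func(string)) {
	tn := time.Now().UnixNano()
	t.MarkT1(key, tn)
	push(key)
	t.MarkT2(key, tn)
}

// Stale returns the keys whose t1 != t2, sorted for stable output.
func (t *Tracker) Stale() []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	var keys []string
	for k, r := range t.recs {
		if r.t1 != r.t2 {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// Compensate re-pushes every stale key; push is assumed to load the latest data.
func (t *Tracker) Compensate(push func(string)) {
	for _, key := range t.Stale() {
		t.Push(key, push)
	}
}

// RunCompensator calls Compensate every interval until stop is closed.
func (t *Tracker) RunCompensator(interval time.Duration, push func(string), stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			t.Compensate(push)
		case <-stop:
			return
		}
	}
}

func main() {
	tr := NewTracker()
	push := func(key string) { fmt.Println("push", key) } // stand-in for the real push
	tr.Push("k", push)
	fmt.Println(tr.Stale()) // []
	tr.MarkT1("k", 42)      // simulate a push that started but never recorded t2
	fmt.Println(tr.Stale()) // [k]
	tr.Compensate(push)
	fmt.Println(tr.Stale()) // []
}
```

In a service, something like `go tr.RunCompensator(10*time.Second, push, stop)` would run the 10-second scan in the background.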
Compensation Example (Case 1)
A completes: t1 = ta, t2 = ta
D (the merged B/C/D execution) starts: t1 = td
E starts and finishes: t1 = te, t2 = te
D finishes: t2 = td, so now t1 = te and t2 = td
At the next scan (within 10 s), t1 != t2 triggers a re-push of the latest data, which delivers the correct E''.
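The Case 1 timeline can be replayed field by field to see why the scan fires. In this sketch `ta`, `td`, `te`, and the re-push time `tr` are arbitrary increasing numbers standing in for nanosecond start times, and `record` and `needsRepush` are hypothetical names; only the order of the writes matters.

```go
package main

import "fmt"

// record mirrors the per-key t1/t2 pair (illustrative, in-memory only).
type record struct{ t1, t2 int64 }

// needsRepush reports whether the compensation scan would pick up the key.
func needsRepush(r record) bool { return r.t1 != r.t2 }

func main() {
	const ta, td, te int64 = 100, 200, 300 // hypothetical start times, ta < td < te
	var r record

	r.t1, r.t2 = ta, ta // A starts and completes
	r.t1 = td           // D (merged B/C/D execution) starts
	r.t1 = te           // E starts...
	r.t2 = te           // ...and finishes before D
	r.t2 = td           // D finishes last and records its own start time

	fmt.Println(r, needsRepush(r)) // {300 200} true

	// Next scan: re-push the latest data (E'') with a fresh start time.
	const tr int64 = 400
	r.t1, r.t2 = tr, tr
	fmt.Println(r, needsRepush(r)) // {400 400} false
}
```

After D's late markT2, t2 holds D's start time while t1 still holds E's, which is exactly the mismatch the scan looks for.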
Conclusion
In production, this compensation layer detects and repairs such inconsistencies at the next scan, but a fix at the singleflight level itself is still an open question. Readers are welcome to share better approaches.
Xiao Lou's Tech Notes
Backend technology sharing, architecture design, performance optimization, source code reading, troubleshooting, and pitfall practices
