How to Solve Data Races with Lightweight Distributed Locks in High‑Concurrency Systems
This article explains how to avoid data races in high‑traffic distributed systems by comparing pessimistic and optimistic concurrency control, demonstrating RDBMS row locks and a Redis‑based distributed lock implementation with Python code examples.
Background
To prevent data race problems, locking is commonly used to restrict resource access. In single‑process environments, language‑provided concurrency APIs suffice, but large websites handling high traffic adopt distributed and clustered deployments, requiring distributed locks for concurrency control.
The following example, based on the author’s experience at Baixing.com, shows a lightweight distributed‑lock solution for high‑concurrency data races.
TL;DR
Data races can be addressed with two concurrency‑control strategies:
Pessimistic concurrency control: assumes data will be modified concurrently, so a program instance acquires a lock before operating on the data and releases it after committing.
Optimistic concurrency control: assumes data is unlikely to be modified concurrently, performs operations directly, and checks for conflicts at commit time, rolling back if necessary.
If the operation involves external service calls, pessimistic control may block unpredictably, while optimistic control avoids blocking but requires rollback logic, and frequent rollbacks waste work.
Thus, besides minimizing lock granularity, choosing an appropriate concurrency‑control strategy is crucial.
Lock implementations include simple RDBMS transaction locks (with limited granularity) and Redis‑based distributed locks (more flexible but requiring handling of expiration and duplicate releases).
A Simple Example
A data race occurs when multiple program instances concurrently access and modify the same data without an exclusive lock; that is, when all of the following hold:
Concurrent access to the same data location
Modification of the data
No exclusive lock or other access‑control mechanism
Consider an order‑purchasing service: users submit purchase requests, the backend stores each order in a database with status PENDING_CREATE, and a periodic worker scans pending orders and sends purchase requests to a supplier.
Database schema (simplified):
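A plausible simplified schema, inferred from the columns the example code touches (the types and constraints are assumptions for illustration):

```python
import sqlite3

# A plausible simplified tb_order schema; column names are taken from the
# queries in this article, while the types are assumptions for illustration.
SCHEMA = """
CREATE TABLE tb_order (
    id               INTEGER PRIMARY KEY,
    status           TEXT NOT NULL,  -- PENDING_CREATE, FINISHED, ...
    product          TEXT,
    spec             TEXT,
    foreign_order_id TEXT            -- order id returned by the supplier
)
"""

db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
```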
At time t0, two orders (id 1 and 2) have status PENDING_CREATE and are both dispatched to workers via purchase(). If the queue backs up or the external service is unstable, the same orders may be sent again after one minute, leading to duplicate purchases.
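The duplicate-send hazard can be made concrete with a short simulation of the worker's scan step (sqlite3 stands in for the database; scan_pending_orders is an assumed helper name, not from the original code):

```python
import sqlite3

# Simulation of the worker's scan: any order still PENDING_CREATE is
# picked up on every pass, so a slow supplier means the same ids are
# dispatched again a minute later. sqlite3 stands in for the database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tb_order (id INTEGER PRIMARY KEY, status TEXT)")
db.executemany("INSERT INTO tb_order (id, status) VALUES (?, ?)",
               [(1, 'PENDING_CREATE'), (2, 'PENDING_CREATE'), (3, 'FINISHED')])

def scan_pending_orders():
    return [row[0] for row in
            db.execute("SELECT id FROM tb_order WHERE status = 'PENDING_CREATE'")]

first_scan = scan_pending_orders()   # ids 1 and 2 are sent to purchase()
# If neither order reached FINISHED before the next one-minute pass,
# the second scan returns the same ids and the purchase is duplicated.
second_scan = scan_pending_orders()
```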
Two possible solutions:
Ensure each task is sent exactly once.
Make the task idempotent (multiple executions have no side effects).
Because guaranteeing strong consistency (effectively exactly-once delivery) across distributed systems is often impractical, the article focuses on the second solution, using the database's native locks as a starting point.
Solution 1: Use RDBMS Locks
Modify purchase() to wrap the operation in a transaction and acquire a row-level lock with FOR UPDATE:
<code>def purchase(order_id):
    db.execute('BEGIN')
    order = db.execute('SELECT * FROM tb_order WHERE id = {} FOR UPDATE'.format(order_id))
    if order.status == PENDING_CREATE:
        try:
            foid = foreign_service.purchase(order.product, order.spec)
        except ForeignServiceException:
            db.execute('ABORT')
            return
        else:
            db.execute('UPDATE tb_order SET status = FINISHED, foreign_order_id = {} WHERE id = {}'.format(foid, order_id))
    db.execute('COMMIT')
</code>The FOR UPDATE clause acquires an exclusive lock on the selected row, preventing other concurrent transactions from updating it: a classic example of pessimistic concurrency control.
Pessimistic Concurrency Control prevents a transaction from modifying data that other users might be accessing simultaneously; the lock is held until the transaction releases it.
While safe, this approach adds overhead, can cause deadlocks, and reduces parallelism. Moreover, if the external foreign_service.purchase() call blocks for a long time, the database's concurrency is severely impacted.
Therefore, the article proposes using optimistic concurrency control for the external call, assuming no conflict and checking for changes only at commit time.
Optimistic Concurrency Control assumes transactions do not interfere; at commit, each transaction verifies that the data read has not been modified by others, rolling back if a conflict is detected.
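Applied to the order example, the commit-time check can be a conditional UPDATE whose affected-row count reveals a conflict. The sketch below uses sqlite3 as a stand-in database; table and column names follow the article's example, and try_finish is an assumed helper name:

```python
import sqlite3

# Optimistic concurrency control sketch: commit with a conditional UPDATE
# that re-checks the previously read status; an affected-row count of 0
# means another writer changed the row first. sqlite3 stands in for the
# real database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tb_order (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("INSERT INTO tb_order (id, status) VALUES (1, 'PENDING_CREATE')")

def try_finish(order_id):
    cur = db.execute(
        "UPDATE tb_order SET status = 'FINISHED' "
        "WHERE id = ? AND status = 'PENDING_CREATE'",
        (order_id,),
    )
    db.commit()
    return cur.rowcount == 1  # False: conflict detected, caller must roll back

first = try_finish(1)   # the row was still PENDING_CREATE
second = try_finish(1)  # conflict: the status has already changed
```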
Optimistic control avoids deadlocks but may cause many rollbacks under high write frequency, and requires explicit undo logic for external operations.
<code>def cancel_purchase(foreign_order_id):
    try:
        foreign_service.cancel_purchase(foreign_order_id)
    except ForeignServiceException:
        # retry somehow
        pass
</code>Persisting undo information is necessary for eventual consistency but adds system complexity.
Solution 2: Redis‑Based Distributed Lock
The desired lock must prevent multiple workers from processing the same order while not blocking user updates. This calls for a hybrid approach: use a pessimistic lock for the worker side and an optimistic lock for user updates.
A distributed lock ensures that different program instances can exclusively operate on a shared resource.
Redis provides an atomic SETNX command along with key expiration, making it suitable for implementing such locks.
A simple lock function:
<code>def lock(key):
    if atomic_test_and_set(key) == 1:
        return 1
    else:
        return 0
</code>Using the LuaLock implementation from redis-py, a context manager can acquire and release the lock safely:
<code>@contextmanager
def redis_lock(key, timeout=15):
    _lock = LuaLock(REDIS, key, timeout)
    try:
        _lock.acquire(blocking=True)
        yield _lock
    finally:
        _lock.release()

def purchase(order_id):
    with redis_lock('tb_order:{}:status'.format(order_id)):
        order = db.execute('SELECT * FROM tb_order WHERE id = {}'.format(order_id))
        if order.status == PENDING_CREATE:
            try:
                foid = foreign_service.purchase(order.product, order.spec)
            except ForeignServiceException:
                # retry or give up
                pass
            else:
                db.execute('UPDATE tb_order SET status = FINISHED, foreign_order_id = {} WHERE id = {}'.format(foid, order_id))
</code>This guarantees that only one instance holds the exclusive lock for a given order, while other instances block until the lock is released. After acquiring the lock, the worker checks the order status again to ensure it has not been modified by a user, achieving an optimistic‑check‑before‑commit pattern.
Lock keys can be constructed by concatenating table name, id, and column, e.g., tb_order:1:status.
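A further detail the TL;DR mentions is guarding against duplicate releases: if a worker's lock expires and another instance re-acquires the key, a blind DEL would drop the new owner's lock. A common remedy, and the approach redis-py's lock takes, is to store a per-acquirer token and release with an atomic compare-and-delete, typically the Lua script shown below. In this sketch a plain dict stands in for Redis so the logic runs standalone:

```python
import uuid

# In real Redis the compare-and-delete must be atomic, e.g. via this
# Lua script (GET the key, DEL only if the stored token matches ours).
RELEASE_SCRIPT = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
else
    return 0
end
"""

store = {}  # stand-in for Redis

def acquire(key):
    # Mimics SET key token NX: succeed only if the key is absent.
    token = uuid.uuid4().hex
    if store.setdefault(key, token) == token:
        return token
    return None

def release(key, token):
    # Compare-and-delete: refuse to remove a lock we no longer own
    # (e.g. ours expired and another worker re-acquired the key).
    if store.get(key) == token:
        del store[key]
        return True
    return False

t1 = acquire('tb_order:1:status')                    # first acquire succeeds
t2 = acquire('tb_order:1:status')                    # second acquire fails: None
released = release('tb_order:1:status', 'stale-tok') # wrong token: refused
```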
Even with this approach, edge cases remain: if a user updates the order while the worker holds the lock, the worker can detect the change and either mark the order as PENDING_UPDATE and trigger an asynchronous update, or simply skip processing.
The overall workflow, illustrated in the original diagram, runs through lock acquisition, the external service call, the status update, and lock release.
Conclusion
The article presented pessimistic and optimistic concurrency controls and demonstrated how Redis can provide finer‑grained locks than native RDBMS locks to improve concurrency performance. Neither strategy is universally superior; the choice depends on write frequency and performance requirements, so developers should select the appropriate method based on their specific scenario.
Baixing.com Technical Team
A collection of the Baixing.com tech team's insights and learnings, featuring one weekly technical article worth following.