Fundamentals 17 min read

Jepsen Uncovered: A Practical Guide to Linearizability Testing

This article explains the fundamentals of Jepsen testing, compares it with TLA+, describes its architecture and workflow, illustrates how to apply Jepsen for linearizability verification of distributed systems such as locks, and offers practical guidance on integrating Jepsen or building custom testing frameworks.

Alibaba Cloud Developer

Oct 26, 2021

Jepsen Uncovered: A Practical Guide to Linearizability Testing

Essence – What Makes Jepsen Unique

In distributed‑system testing, Jepsen is the black‑box counterpart to the white‑box tool TLA+. While TLA+ requires a formal model of the system, Jepsen builds the actual system, injects faults, records operations, and checks whether the observed history violates the expected consistency rules, especially linearizability.

Structure – How Jepsen Works

Jepsen consists of three core modules:

DB : the system under test.

Generator : creates a sequence of client operations and error injections.

Checker : consumes the generated History and validates it against a chosen consistency model.

These modules can be used together or independently, allowing users to generate History with custom tools or to reuse Jepsen’s Checker with external History data.

Applying Jepsen – The Nvwa Example

The Nvwa team used Jepsen to verify a distributed lock implementation. They wrapped the lock’s RESTful API (create‑lease, touch‑lease, keep‑lease‑alive, create‑lock, delete‑lock, etc.) into a Jepsen Client, generated History records for each operation, and then used the built‑in mutex model to check linearizability.

For lock mutual‑exclusion, the default mutex model suffices. To test lock availability after lease expiration, they extended the model and Checker to simulate a client that attempts to acquire the lock only after the lease TTL has passed, ensuring the lock cannot be stolen before the TTL and can be reacquired afterward.

(defrecord Mutex [locked?]
  Model
  (step [r op]
    (condp = (:f op)
      :acquire (if locked?
                 (inconsistent "already held")
                 (Mutex. true))
      :release (if locked?
                 (Mutex. false)
                 (inconsistent "not held"))))
  Object
  (toString [this] (if locked? "locked" "free")))

(defn mutex [] (Mutex. false))

The code above shows the minimal Clojure model used by Jepsen to represent a single mutex.

Choosing Between Jepsen and a Custom Framework

When deciding whether to adopt Jepsen directly or build a bespoke testing framework, consider:

Language : Jepsen is written in Clojure, which may have a learning curve; custom frameworks can use mainstream languages like Go, Java, or Python.

Framework functionality : Jepsen offers mature models and checkers out‑of‑the‑box; a custom framework requires implementing needed features yourself.

Community adoption : Jepsen is widely used and trusted for consistency testing; a home‑grown solution needs internal promotion.

History conversion for lock availability

Conclusion

Jepsen provides a relatively easy‑to‑apply black‑box testing approach for verifying linearizability and other consistency properties of distributed systems. By abstracting a system’s state machine into a Jepsen Model and feeding it a History of operations, developers can quickly detect subtle bugs that are hard to catch with white‑box methods like TLA+. For scenarios requiring custom checks or different consistency guarantees, Jepsen’s modular design allows selective reuse of its components or integration with bespoke testing pipelines.

distributed-systems Testing Consistency Linearizability model checking Jepsen

Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.