Kafka End-to-End Auditing: Overview of Chaperone, Confluent Control Center, and Kafka Monitor

This article explains Kafka end‑to‑end auditing, compares three products (Chaperone, Confluent Control Center, Kafka Monitor), describes timestamp and index embedding techniques, and outlines their architectures, metrics, and implementation details for detecting data loss, duplication, and latency.

Architecture Digest

Oct 1, 2017

Kafka end‑to‑end auditing refers to tracking the number of messages and latency from the moment a producer writes a message to a broker until a consumer reads it, enabling detection of data loss, duplication, and latency.

The survey covers three main products:

Chaperone (Uber)

Confluent Control Center (commercial)

Kafka Monitor (LinkedIn)

Auditing is typically performed by embedding a timestamp, a global index, or both into the message payload. The timestamp method assigns each message to a time bucket whose start is timestamp - timestamp % time_bucket_interval (equivalently, floor(timestamp / 15) * 15 for an interval of 15) and counts the messages in each bucket. The index method assigns a unique sequential index to each message, which allows precise detection of missing or duplicate messages; computing latency still requires a timestamp.
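The bucketing step can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the three products: the function names and the sample timestamps are made up, and the interval of 15 is taken from the formula in the text.

```python
from collections import Counter

BUCKET_INTERVAL = 15  # assumed bucket width, taken from the "15" in the formula


def bucket_start(timestamp: int, interval: int = BUCKET_INTERVAL) -> int:
    """Return the start of the time bucket that contains `timestamp`."""
    return timestamp - timestamp % interval


def count_per_bucket(timestamps, interval: int = BUCKET_INTERVAL) -> Counter:
    """Count messages per bucket, using each message's embedded timestamp."""
    return Counter(bucket_start(ts, interval) for ts in timestamps)


# Hypothetical embedded timestamps from five messages.
counts = count_per_bucket([100, 104, 105, 119, 120])
print(dict(counts))  # {90: 2, 105: 2, 120: 1}
```

Because the bucket is derived from the timestamp carried inside the message rather than from the time of observation, every tier that sees the same message places it in the same bucket, which is what makes per-bucket counts comparable across tiers.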

Chaperone

Chaperone uses an embedded timestamp to bucket messages, counts the messages in each bucket, and stores the audit results as an auditMessage with fields such as topicName, time_bucket_start, metrics_count, various latency metrics, tier, hostname, datacenter, and uuid. Latency is computed as currentTimeMillis - (timestamp * 1000), which implies that the embedded timestamp is expressed in seconds.
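A rough sketch of such an audit record and the latency formula follows. The field names topic_name, time_bucket_start, metrics_count, tier, hostname, datacenter, and uuid mirror the list above; latency_sum_ms and latency_max_ms are hypothetical stand-ins for the unspecified "various latency metrics", and the record function is an assumption about how a counter might be updated, not Chaperone's code.

```python
import time
from dataclasses import dataclass, field
from typing import Optional
from uuid import uuid4


@dataclass
class AuditMessage:
    topic_name: str
    time_bucket_start: int        # start of the bucket, in epoch seconds
    metrics_count: int = 0        # messages counted in this bucket
    latency_sum_ms: float = 0.0   # hypothetical latency metric
    latency_max_ms: float = 0.0   # hypothetical latency metric
    tier: str = "producer"        # placeholder values below are illustrative
    hostname: str = "host-1"
    datacenter: str = "dc-1"
    uuid: str = field(default_factory=lambda: str(uuid4()))


def latency_ms(timestamp_s: float, now_ms: Optional[float] = None) -> float:
    """currentTimeMillis - (timestamp * 1000), with the timestamp in seconds."""
    if now_ms is None:
        now_ms = time.time() * 1000
    return now_ms - timestamp_s * 1000


def record(audit: AuditMessage, timestamp_s: float, now_ms: Optional[float] = None) -> None:
    """Fold one observed message into the audit record for its bucket."""
    delay = latency_ms(timestamp_s, now_ms)
    audit.metrics_count += 1
    audit.latency_sum_ms += delay
    audit.latency_max_ms = max(audit.latency_max_ms, delay)


audit = AuditMessage(topic_name="orders", time_bucket_start=990)
record(audit, timestamp_s=1000, now_ms=1_000_250)
print(audit.metrics_count, audit.latency_max_ms)  # 1 250.0
```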

The architecture consists of AuditLibrary (ChaperoneClient), ChaperoneService, ChaperoneCollector, and a WebService. The client library can be embedded in producer or consumer code, sending audit messages to a dedicated Kafka topic. ChaperoneService uses a write‑ahead log (WAL) with UUIDs to guarantee exactly‑once audit processing, and ChaperoneCollector persists audit data to MySQL (or optionally Redis).
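The combination of a write-ahead log and UUIDs can be illustrated with a small sketch. The source does not describe how Chaperone divides this work between its components, so the classes AuditWal and DedupingCollector below are hypothetical: the log makes audit messages survive a crash so they can be re-sent, and the UUID check makes a re-sent message harmless.

```python
import json
import os
import tempfile


class AuditWal:
    """Append-only log of audit messages, written before they are sent on."""

    def __init__(self, path):
        self.path = path

    def append(self, audit):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(audit) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the entry durable before continuing

    def replay(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


class DedupingCollector:
    """Applies each audit message at most once, keyed by its UUID."""

    def __init__(self):
        self.seen = set()
        self.counts = {}

    def collect(self, audit):
        if audit["uuid"] in self.seen:
            return False  # already applied, e.g. re-sent after a restart
        self.seen.add(audit["uuid"])
        key = (audit["topicName"], audit["time_bucket_start"])
        self.counts[key] = self.counts.get(key, 0) + audit["metrics_count"]
        return True


wal = AuditWal(os.path.join(tempfile.mkdtemp(), "audit.wal"))
wal.append({"uuid": "a", "topicName": "orders", "time_bucket_start": 90, "metrics_count": 2})
wal.append({"uuid": "b", "topicName": "orders", "time_bucket_start": 90, "metrics_count": 3})

collector = DedupingCollector()
for audit in wal.replay():
    collector.collect(audit)
for audit in wal.replay():  # simulate a crash followed by a full replay
    collector.collect(audit)

print(collector.counts)  # {('orders', 90): 5}
```

Replaying the log gives at-least-once delivery; dropping already-seen UUIDs turns that into an effectively exactly-once result for the stored counts.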

Confluent Control Center

This commercial product also relies on timestamps embedded in the payload and uses the same bucket algorithm (floor(timestamp / 15) * 15). An audit library is embedded in producers and consumers, audit messages are stored in Kafka, and a web UI consumes those messages directly for display.
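Once producers and consumers both report per-bucket counts, detecting anomalies comes down to comparing the two sides bucket by bucket. The sketch below shows that comparison under stated assumptions: compare_buckets and the sample counts are hypothetical, and this is not Control Center's implementation.

```python
def compare_buckets(produced: dict, consumed: dict) -> dict:
    """Flag buckets whose producer-side and consumer-side counts differ."""
    report = {}
    for bucket in sorted(set(produced) | set(consumed)):
        p = produced.get(bucket, 0)
        c = consumed.get(bucket, 0)
        if c < p:
            report[bucket] = f"possible loss: {p - c} missing"
        elif c > p:
            report[bucket] = f"possible duplication: {c - p} extra"
    return report


produced = {90: 2, 105: 2}
consumed = {90: 2, 105: 1, 120: 1}
print(compare_buckets(produced, consumed))
# {105: 'possible loss: 1 missing', 120: 'possible duplication: 1 extra'}
```

Counts alone cannot tell apart a bucket in which one message was lost and another was duplicated, since the two cancel out. That limitation is what the index method addresses.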

Kafka Monitor

Kafka Monitor embeds both an index and a timestamp in the payload, which enables detection of lost and duplicated messages as well as measurement of end‑to‑end latency. It provides a web UI and exposes a set of metrics such as produce/consume availability, total records produced/consumed, records lost, records duplicated, and various latency percentiles.
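The index technique can be sketched as follows. This simplified version assumes the producer numbers messages 0, 1, 2, ... within a single partition and that the whole consumed sequence is available at once; audit_indices is a hypothetical helper, not Kafka Monitor's code, which has to track this state incrementally.

```python
def audit_indices(indices):
    """Classify consumed indices, assuming the producer emitted 0, 1, 2, ..."""
    seen = set()
    duplicated = 0
    for index in indices:
        if index in seen:
            duplicated += 1  # same index delivered more than once
        else:
            seen.add(index)
    # Every index up to the highest one seen should have arrived.
    expected = max(seen) + 1 if seen else 0
    lost = expected - len(seen)
    return {"consumed": len(indices), "lost": lost, "duplicated": duplicated}


print(audit_indices([0, 1, 1, 3, 4]))
# {'consumed': 5, 'lost': 1, 'duplicated': 1}
```

In this example index 2 never arrives and index 1 arrives twice. A per-bucket count would show five messages produced and five consumed, so only the index reveals both problems.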

| name | description |
| --- | --- |
| produce-availability-avg | The average produce availability |
| consume-availability-avg | The average consume availability |
| records-produced-total | The total number of records that are produced |
| records-consumed-total | The total number of records that are consumed |
| records-lost-total | The total number of records that are lost |
| records-duplicated-total | The total number of records that are duplicated |
| records-delay-ms-avg | The average latency of records from producer to consumer |
| records-produced-rate | The average number of records per second that are produced |
| produce-error-rate | The average number of produce errors per second |
| consume-error-rate | The average number of consume errors per second |
| records-delay-ms-99th | The 99th percentile latency of records from producer to consumer |
| records-delay-ms-999th | The 99.9th percentile latency of records from producer to consumer |
| records-delay-ms-max | The maximum latency of records from producer to consumer |
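To make the latency metrics concrete, the sketch below derives the average, 99th percentile, 99.9th percentile, and maximum from a list of per-record delays. It uses the nearest-rank percentile definition over a complete in-memory sample, which is an assumption made for illustration; Kafka Monitor's own computation may differ, and delay_metrics is a hypothetical helper.

```python
def percentile(sorted_values, per_mille):
    """Nearest-rank percentile; per_mille=990 means the 99th percentile."""
    n = len(sorted_values)
    rank = (n * per_mille + 999) // 1000  # integer ceiling of n * per_mille / 1000
    return sorted_values[max(rank, 1) - 1]


def delay_metrics(delays_ms):
    """Summarize producer-to-consumer delays under the metric names above."""
    values = sorted(delays_ms)
    return {
        "records-delay-ms-avg": sum(values) / len(values),
        "records-delay-ms-99th": percentile(values, 990),
        "records-delay-ms-999th": percentile(values, 999),
        "records-delay-ms-max": values[-1],
    }


# Hypothetical sample: 1000 records with delays of 1 ms through 1000 ms.
metrics = delay_metrics(range(1, 1001))
print(metrics)
```

Integer arithmetic is used for the rank so that the result does not depend on floating-point rounding at bucket boundaries.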

All three tools aim to provide reliable audit metrics for detecting anomalies in Kafka pipelines, relying on timestamps that stay consistent across layers. Chaperone additionally uses a WAL and UUID deduplication so that each audit message is processed exactly once.
