Kafka End-to-End Auditing: Overview of Chaperone, Confluent Control Center, and Kafka Monitor
This article explains Kafka end‑to‑end auditing, compares three products (Chaperone, Confluent Control Center, Kafka Monitor), describes timestamp and index embedding techniques, and outlines their architectures, metrics, and implementation details for detecting data loss, duplication, and latency.
Kafka end-to-end auditing means tracking message counts and latency from the moment a producer writes a message to a broker until a consumer reads it, so that data loss, duplication, and excessive latency can be detected.
The survey covers three main products:
Chaperone (Uber)
Confluent Control Center (commercial)
Kafka Monitor (LinkedIn)
Auditing is typically performed by embedding a timestamp, a global index, or both in the message payload. The timestamp method assigns each message to a time bucket whose start is timestamp - timestamp % time_bucket_interval, which equals floor(timestamp / time_bucket_interval) * time_bucket_interval (for example floor(timestamp / 15) * 15 with an interval of 15), and counts the messages in each bucket. The index method assigns a unique sequential index to each message, which allows precise detection of missing or duplicate messages; calculating latency still requires a timestamp.
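A minimal Python sketch of the timestamp method, assuming integer timestamps and a fixed bucket interval expressed in the same unit. The function names `bucket_start` and `count_per_bucket` are hypothetical and not taken from any of the three products.

```python
from collections import Counter
from typing import Iterable


def bucket_start(timestamp: int, interval: int) -> int:
    """Start of the bucket containing `timestamp`: timestamp - timestamp % interval."""
    return timestamp - timestamp % interval


def count_per_bucket(timestamps: Iterable[int], interval: int) -> Counter:
    """Count messages per time bucket, keyed by the bucket's start."""
    return Counter(bucket_start(ts, interval) for ts in timestamps)


# Example: three messages fall in bucket 0, two in bucket 15, one in bucket 30.
counts = count_per_bucket([0, 7, 14, 15, 29, 30], 15)
```

Python's `%` takes the sign of the divisor, so `timestamp - timestamp % interval` equals `(timestamp // interval) * interval` even for negative timestamps; in languages where `%` truncates toward zero, the two forms differ for negative values.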
Chaperone
Chaperone uses an embedded timestamp to bucket messages, counts the messages in each bucket, and stores the audit results as auditMessage records with fields such as topicName, time_bucket_start, metrics_count, various latency metrics, tier, hostname, datacenter, and uuid. Latency is computed as currentTimeMillis - (timestamp * 1000), because the embedded timestamp is in seconds while currentTimeMillis is in milliseconds.
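The sketch below models one audit record per topic, bucket, tier, and host, and applies the latency formula described above. It assumes the embedded timestamp is in epoch seconds. The class layout, the `record` method, and the `total_latency_ms`/`max_latency_ms` fields are hypothetical stand-ins for Chaperone's "various latency metrics", not its actual schema.

```python
import time
from dataclasses import dataclass, field
from typing import Optional
from uuid import uuid4


def latency_ms(timestamp_sec: float, now_ms: Optional[int] = None) -> int:
    """currentTimeMillis - (timestamp * 1000), with the embedded timestamp in seconds."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - int(timestamp_sec * 1000)


@dataclass
class AuditMessage:
    # Field names mirror the article's auditMessage; the latency fields are hypothetical.
    topicName: str
    time_bucket_start: int
    tier: str  # assumed meaning: the pipeline stage that emitted this audit
    hostname: str
    datacenter: str
    metrics_count: int = 0
    total_latency_ms: int = 0
    max_latency_ms: int = 0
    # Unique id so a downstream service can deduplicate re-sent audit messages.
    uuid: str = field(default_factory=lambda: str(uuid4()))

    def record(self, timestamp_sec: float, now_ms: Optional[int] = None) -> None:
        """Count one message in this bucket and fold its latency into the stats."""
        lat = latency_ms(timestamp_sec, now_ms)
        self.metrics_count += 1
        self.total_latency_ms += lat
        self.max_latency_ms = max(self.max_latency_ms, lat)

    @property
    def mean_latency_ms(self) -> float:
        return self.total_latency_ms / self.metrics_count if self.metrics_count else 0.0
```

Keeping a running sum and count, rather than every individual latency, keeps each audit record small regardless of how many messages fall into its bucket.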
The architecture consists of AuditLibrary (ChaperoneClient), ChaperoneService, ChaperoneCollector, and a WebService. The client library can be embedded in producer or consumer code, sending audit messages to a dedicated Kafka topic. ChaperoneService uses a write‑ahead log (WAL) with UUIDs to guarantee exactly‑once audit processing, and ChaperoneCollector persists audit data to MySQL (or optionally Redis).
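To illustrate how a write-ahead log combined with UUIDs can make audit aggregation idempotent, here is a simplified Python sketch. It is not ChaperoneService's implementation: the `WalAuditAggregator` class, the JSON-lines log format, and the dictionary-shaped audit messages are assumptions made for illustration.

```python
import json
import os
from typing import Dict, Set, Tuple


class WalAuditAggregator:
    """Hypothetical aggregator that applies each audit message (by uuid) exactly once."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.seen: Set[str] = set()
        self.counts: Dict[Tuple[str, int, str], int] = {}
        # Recovery: replay the log to rebuild in-memory state after a restart.
        if os.path.exists(wal_path):
            with open(wal_path, encoding="utf-8") as wal:
                for line in wal:
                    self._apply(json.loads(line))

    def _apply(self, msg: dict) -> None:
        if msg["uuid"] in self.seen:
            return
        self.seen.add(msg["uuid"])
        key = (msg["topicName"], msg["time_bucket_start"], msg["tier"])
        self.counts[key] = self.counts.get(key, 0) + msg["metrics_count"]

    def process(self, msg: dict) -> bool:
        """Apply msg unless its uuid was already seen; return True if it was applied."""
        if msg["uuid"] in self.seen:
            return False
        # Write ahead: persist the message before touching in-memory state,
        # so a crash after this point is recovered by replaying the log.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps(msg) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self._apply(msg)
        return True
```

Because the log is written before the counts change and is replayed on startup, an audit message redelivered with a known uuid is skipped instead of being counted twice, even across restarts.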
Confluent Control Center
This commercial product also relies on timestamps embedded in the payload and uses the same bucketing algorithm (floor(timestamp / 15) * 15). It embeds an audit library in producers and consumers, stores the audit messages in Kafka, and its web UI consumes those messages directly for display.
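Per-bucket counts become useful for auditing when counts from different stages are compared. The sketch below compares an upstream (for example producer-side) and a downstream (for example consumer-side) count map for the same topic; `reconcile` is a hypothetical helper, not part of Confluent Control Center's API.

```python
from typing import Dict, Mapping


def reconcile(upstream: Mapping[int, int], downstream: Mapping[int, int]) -> Dict[int, int]:
    """Per-bucket difference upstream - downstream, keeping only non-zero buckets.

    Positive: fewer messages seen downstream (possible loss, or still in flight).
    Negative: more messages seen downstream (possible duplication).
    """
    buckets = set(upstream) | set(downstream)
    diff = {b: upstream.get(b, 0) - downstream.get(b, 0) for b in sorted(buckets)}
    return {b: d for b, d in diff.items() if d != 0}
```

For instance, `reconcile({0: 3, 15: 2}, {0: 3, 15: 1, 30: 1})` flags bucket 15 as short by one and bucket 30 as over by one. Because only counts are compared, a lost message and a duplicated message in the same bucket cancel out, which is why the index method is needed for precise loss and duplication detection.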
Kafka Monitor
Kafka Monitor embeds both an index and a timestamp in the payload, which lets it detect lost and duplicated messages as well as measure end-to-end latency. It provides a web UI and exposes the following metrics, covering produce/consume availability, total records produced and consumed, records lost and duplicated, error rates, and latency percentiles:
Metric                      Description
produce-availability-avg    The average produce availability
consume-availability-avg    The average consume availability
records-produced-total      The total number of records produced
records-consumed-total      The total number of records consumed
records-lost-total          The total number of records lost
records-duplicated-total    The total number of records duplicated
records-delay-ms-avg        The average producer-to-consumer latency of records
records-produced-rate       The average number of records produced per second
produce-error-rate          The average number of produce errors per second
consume-error-rate          The average number of consume errors per second
records-delay-ms-99th       The 99th percentile producer-to-consumer latency of records
records-delay-ms-999th      The 99.9th percentile producer-to-consumer latency of records
records-delay-ms-max        The maximum producer-to-consumer latency of records
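The loss, duplication, and delay metrics listed for Kafka Monitor can be derived from a per-partition sequence index plus timestamps. The following Python sketch shows one way to do it, assuming each consumed record carries its partition, a per-partition index, and producer and consumer timestamps in milliseconds, with the first record seen on a partition setting the baseline. `ConsumeTracker` and its methods are hypothetical and not Kafka Monitor's actual code.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConsumeTracker:
    """Hypothetical consumer-side tracker for loss, duplication, and delay."""
    next_index: Dict[int, int] = field(default_factory=dict)  # partition -> expected index
    records_consumed: int = 0
    records_lost: int = 0
    records_duplicated: int = 0
    delays_ms: List[int] = field(default_factory=list)

    def on_record(self, partition: int, index: int, produce_ms: int, consume_ms: int) -> None:
        self.records_consumed += 1
        self.delays_ms.append(consume_ms - produce_ms)
        expected = self.next_index.get(partition)
        if expected is None or index == expected:
            # First record on this partition sets the baseline, or the index is in order.
            self.next_index[partition] = index + 1
        elif index > expected:
            # Indexes expected .. index-1 never arrived.
            self.records_lost += index - expected
            self.next_index[partition] = index + 1
        else:
            # Index already passed: treat as a duplicate.
            self.records_duplicated += 1

    def delay_percentile(self, pct: float) -> int:
        """Nearest-rank percentile of observed delays, e.g. pct=99 or pct=99.9."""
        if not self.delays_ms:
            raise ValueError("no records consumed yet")
        ordered = sorted(self.delays_ms)
        # round() absorbs float noise such as 999.0000000000001 before ceil().
        rank = max(1, math.ceil(round(pct / 100 * len(ordered), 9)))
        return ordered[rank - 1]
```

Treating any index below the expected one as a duplicate is a simplification: a record that arrives out of order within a partition would be counted the same way. Passing 99 or 99.9 to `delay_percentile` gives values analogous to records-delay-ms-99th and records-delay-ms-999th.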
All three tools aim to provide reliable audit processing with exactly-once semantics, using techniques such as a WAL with UUID deduplication (as in Chaperone) and consistent timestamps across tiers to produce accurate metrics and detect anomalies in Kafka pipelines.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architecture Digest
