High-Performance SQL Auditing in Cobar: Architecture, Buffers, and UDP
This article describes how to add a low‑overhead SQL audit feature to the Cobar database middleware: modifying Cobar's code, forwarding audit data through an agent to Kafka, using UDP for inter‑process communication, and designing a custom ring buffer that keeps throughput within about 2% of the baseline.
Background
Cobar Overview
Cobar is an open‑source database middleware from Alibaba.
When business traffic grows rapidly, the database often becomes a bottleneck; a middleware layer can alleviate this.
A proxy‑type middleware (this article does not discuss client‑SDK middleware) typically provides transparent proxying, horizontal/vertical sharding, read/write separation, connection reuse, fault detection with fast failover, and high stability and performance.
Cobar already supports most of these features well, and adding read/write separation is straightforward.
SQL Auditing
The author previously implemented a custom SQL‑audit feature on Cobar.
From an operational perspective, executed SQL statements need to be collected; from a security perspective, abnormal SQL and data leaks need to be auditable.
The audit needs to capture the SQL text, execution time, source host, and row count.
Even a simple audit requirement needs careful design and testing in a high‑concurrency, low‑latency environment: a single Cobar instance can handle tens of thousands of QPS.
For example, a call to System.currentTimeMillis() is normally cheap in Java, but calling it on Cobar's hot path at this request rate would cause severe performance degradation.
Technical Solution
Overall Direction
Two possible approaches were considered:
Modify Cobar source code and add instrumentation points where information is needed.
Use Alibaba Cloud Database's traffic‑capture solution.
The first, simpler approach was chosen.
The solution had to meet two requirements:
Performance should be as close as possible to the version without auditing.
The audit must never cause Cobar to become unavailable.
Performance was measured with sysbench. On a 4‑core, 8 GB machine, the baseline was 55,000 queries per second.
Because the implementation modifies Cobar code, an agent pattern was adopted: the instrumented code only records data and forwards it to an external agent, which then sends the audit information to Kafka. This keeps Cobar free of additional third‑party dependencies.
The architecture diagram highlights two key technical challenges: thread communication (inside Cobar) and process communication (between Cobar and the agent).
Process Communication
Cobar is written in Java, so the options considered were TCP, UDP, Unix‑Domain Socket, and file I/O.
Unix‑Domain Socket was rejected because it is platform‑dependent and lacked official Java support.
File I/O was rejected because of high I/O cost and risk of disk exhaustion under high concurrency.
Between TCP and UDP, UDP was chosen: each datagram carries one complete message, so it avoids the message‑framing ("sticky packet") problem of TCP's byte stream, and it generally offers better performance for log‑like data.
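The following is a minimal sketch of how instrumented code might hand one audit record to a local agent over UDP. The class name AuditUdpSender, the tab‑separated layout, and the payload cap are assumptions made for illustration; the article does not describe the actual wire format. Each record travels in its own datagram, so the agent needs no framing logic.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sender: one audit record per UDP datagram, sent to an agent on localhost.
final class AuditUdpSender implements AutoCloseable {
    // Assumed cap, well below the 65,507-byte UDP payload limit; longer records are
    // truncated at a byte boundary (which may cut a multi-byte character).
    private static final int MAX_PAYLOAD_BYTES = 8 * 1024;

    private final DatagramSocket socket;
    private final InetSocketAddress agent;

    AuditUdpSender(int agentPort) throws IOException {
        this.socket = new DatagramSocket();
        this.agent = new InetSocketAddress(InetAddress.getLoopbackAddress(), agentPort);
    }

    // Fire-and-forget: a failed send drops the record instead of disturbing the caller.
    boolean send(String sql, long elapsedMillis, String sourceHost, long rows) {
        String line = sourceHost + "\t" + elapsedMillis + "\t" + rows + "\t" + sql;
        byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
        int length = Math.min(bytes.length, MAX_PAYLOAD_BYTES);
        try {
            socket.send(new DatagramPacket(bytes, 0, length, agent));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    @Override
    public void close() {
        socket.close();
    }
}
```

Because UDP gives no delivery guarantee, a design like this only fits data that can tolerate occasional loss, which matches the audit requirements stated in this article.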
Thread Communication
Thread‑level communication directly impacts Cobar's main execution thread, so a non‑blocking, bounded, thread‑safe, high‑performance buffer is required.
Bounded to avoid memory overflow.
Non‑blocking to prevent the main thread from stalling.
Ordering is not required, and occasional data loss is acceptable in exchange for availability.
Thread‑safe under high concurrency.
High throughput.
Java Built‑in Queues
Java provides bounded queues such as ArrayBlockingQueue and LinkedBlockingQueue, but they use locks and may not meet performance goals.
Inspired by ConcurrentHashMap and LongAdder, the author built a composite buffer using multiple ArrayBlockingQueue instances.
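A composite buffer of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the author's code: the class name StripedQueueBuffer, the choice of stripe by thread id, and the drop‑when‑full policy are all hypothetical. The idea is that each ArrayBlockingQueue still uses a lock, but writers are spread over several queues so they contend less.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical composite buffer: several bounded queues to spread lock contention.
final class StripedQueueBuffer<T> {
    private final ArrayBlockingQueue<T>[] stripes;

    @SuppressWarnings("unchecked")
    StripedQueueBuffer(int stripeCount, int capacityPerStripe) {
        stripes = new ArrayBlockingQueue[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ArrayBlockingQueue<>(capacityPerStripe);
        }
    }

    // Non-blocking write: choose a stripe from the thread id and
    // drop the item (return false) if that stripe is full.
    boolean offer(T item) {
        int index = (int) (Thread.currentThread().getId() % stripes.length);
        return stripes[index].offer(item);
    }

    // Consumer side: collect everything currently buffered without blocking.
    List<T> drain() {
        List<T> out = new ArrayList<>();
        for (ArrayBlockingQueue<T> stripe : stripes) {
            stripe.drainTo(out);
        }
        return out;
    }
}
```

Using offer() rather than put() keeps the writer from ever waiting on a full queue; the cost is that records are silently dropped under pressure.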
Testing showed a throughput of 47,000 QPS, a loss of roughly 15% against the 55,000 QPS baseline.
Disruptor
Disruptor is a lock‑free, bounded ring buffer that uses CAS for synchronization and is used by projects like Log4j2.
However, when the Disruptor buffer becomes full it blocks, which violates the non‑blocking requirement, so it was discarded.
SkyWalking RingBuffer
SkyWalking, an open‑source APM system, implements a simple ring buffer using an array and CAS to obtain write indices.
The buffer can be configured to block, overwrite, or drop data when full; the author chose the overwrite strategy.
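To make the idea concrete, here is a simplified ring buffer that claims a write index with a CAS retry loop and overwrites old entries when it wraps around. It is a sketch written for this article, not SkyWalking's actual class; the name OverwritingRingBuffer and the drain() method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Simplified illustration of an array-backed ring buffer with an overwrite policy.
final class OverwritingRingBuffer<T> {
    private final AtomicReferenceArray<T> slots;
    private final AtomicInteger writeIndex = new AtomicInteger(0);

    OverwritingRingBuffer(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    // Claim the next slot with a CAS retry loop, then overwrite whatever is stored there.
    void put(T item) {
        int current;
        int next;
        do {
            current = writeIndex.get();
            next = (current + 1) % slots.length();
        } while (!writeIndex.compareAndSet(current, next));
        slots.set(current, item);
    }

    // Consumer side: take and clear every occupied slot.
    List<T> drain() {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < slots.length(); i++) {
            T item = slots.getAndSet(i, null);
            if (item != null) {
                out.add(item);
            }
        }
        return out;
    }
}
```

With a capacity of 4, writing the values 1 through 6 leaves 5, 6, 3, 4 in the buffer: the two oldest entries are overwritten and the writer never waits.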
Initial tests with multiple SkyWalking RingBuffers achieved only 30,000 QPS (45% performance loss).
The first optimization replaced the CAS retry loop with incrementAndGet, which from JDK 8 onward uses the CPU's fetch‑and‑add instruction instead of a compare‑and‑swap loop, yielding a significant speedup.
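The difference between the two ways of claiming a slot can be sketched as follows. The class SlotCounter and its method names are hypothetical and exist only to put the two variants side by side; they are not taken from SkyWalking.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter that hands out ring-buffer slot indices.
final class SlotCounter {
    private final AtomicInteger sequence = new AtomicInteger(-1);
    private final int capacity;

    SlotCounter(int capacity) {
        this.capacity = capacity;
    }

    // CAS variant: under contention a thread may loop many times before it wins.
    int nextSlotWithCas() {
        int current;
        int next;
        do {
            current = sequence.get();
            next = current + 1;
        } while (!sequence.compareAndSet(current, next));
        return Math.floorMod(next, capacity);
    }

    // Fetch-and-add variant: every caller gets a distinct value without a retry loop.
    // floorMod keeps the index non-negative after the int counter overflows.
    int nextSlot() {
        return Math.floorMod(sequence.incrementAndGet(), capacity);
    }
}
```

Both methods return the same sequence of indices; the second simply avoids wasted retries when many threads write at once. If the capacity is not a power of two, the sequence skips once when the int counter overflows, which is harmless for a buffer that already tolerates loss.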
Additional cache‑line padding optimizations (inspired by Disruptor) raised throughput to 54,000 QPS, only a 1.8% loss.
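Cache‑line padding in the Disruptor style can be sketched as below. This is an illustration under assumptions: the class names are hypothetical, a 64‑byte cache line is assumed, and the actual field layout is decided by the JVM. Placing the padding in superclasses and subclasses keeps the JVM from reordering it away from the hot field.

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

// Seven longs (56 bytes) placed before the hot field.
class LeftPad {
    long p1, p2, p3, p4, p5, p6, p7;
}

// The frequently updated field sits between the two padding blocks.
class CounterValue extends LeftPad {
    volatile long value;
}

// Seven longs (56 bytes) placed after the hot field.
class RightPad extends CounterValue {
    long q1, q2, q3, q4, q5, q6, q7;
}

// Hypothetical padded counter: neighbouring objects are unlikely to share
// a cache line with 'value', which reduces false sharing between cores.
final class PaddedCounter extends RightPad {
    private static final AtomicLongFieldUpdater<CounterValue> UPDATER =
            AtomicLongFieldUpdater.newUpdater(CounterValue.class, "value");

    long incrementAndGet() {
        return UPDATER.incrementAndGet(this);
    }

    long get() {
        return value;
    }
}
```

The padding fields are never read; they only take up space. Whether this helps in a given workload has to be confirmed by measurement, as the author did with sysbench.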
The optimized ring buffer was contributed back to the SkyWalking community and praised as an "interesting contribution".
Conclusion
The SQL audit feature has been running stably in production, supporting the highest‑QPS Cobar clusters.
While the extreme performance tuning may appear overly obsessive, it reflects a deep commitment to technical excellence.
Follow the author’s public account “捉虫大师” for more insights.
