How Meituan Optimized a Non‑Intrusive MySQL Packet Capture for Full‑SQL Auditing
This article explains Meituan's non‑intrusive, Go‑based packet‑capture solution for full‑SQL database auditing, details the performance challenges encountered at high QPS, and describes a series of systematic analyses and optimizations—including probe redesign, CPU profiling, data‑masking, scheduling, GC pressure reduction, and packet filtering—that dramatically lowered loss rates and CPU usage.
Background
Database security is a top priority for Meituan's information security and database teams, but historically only sampled audits were possible, making it hard to detect and mitigate attacks quickly. To identify characteristic malicious SQL statements, they needed full‑SQL capture of MySQL traffic.
Current Situation & Challenges
The existing MySQL audit system uses a sampling approach with pcap‑based data collection agents (rds‑agent) on each MySQL instance and log‑agents to forward data to Kafka, where Storm processes it in real time and Hive stores it for later analysis. High‑traffic core MySQL clusters generate up to ~50k QPS, and the rds‑agent, as a parasitic process, must remain lightweight. Stress tests with Sysbench revealed unacceptable data‑loss rates and CPU consumption at high QPS.
Analysis & Optimization
3.1 Data‑Collection Agent Overview
The rds‑agent, written in Go and based on the open‑source MysqlProbe, listens on the MySQL port, extracts timestamps, client IP, username, SQL, target DB, and target IP, and forwards audit data via Thrift to a log‑agent. Its architecture consists of five modules:
probe : uses gopacket (a Go wrapper for libpcap) to capture raw Ethernet frames, reassembles TCP packets, and hashes source/destination IP‑port to distribute connections across workers.
watcher : periodically runs SHOW PROCESSLIST to recover missing login usernames that are only present during the MySQL handshake.
worker : manages the lifecycle of multiple connections, detects expired connections, and releases resources to avoid memory leaks.
connStream : parses MySQL protocol packets, identifies target SQL statements, and performs on‑the‑fly data masking.
sender : packages the parsed audit records and sends them to the log‑agent via Thrift.
3.2 Baseline Performance Test
To verify whether the bottleneck lay in gopacket, a simple TCP client/server was built and three critical steps of the packet‑processing pipeline were benchmarked. Results showed that gopacket itself was not the limiting factor.
3.3 CPU Profiling
Using Go's built‑in pprof tool, a flame graph revealed that the majority of CPU time was spent in SQL masking, packet unpacking, garbage collection (GC), and Goroutine scheduling.
3.4 Masking Optimization
SQL masking originally parsed every statement with PingCAP's SQL parser to replace literals with “?”. Because only sampled or targeted SQL needed masking, the masking step was moved downstream to the sender, masking only the final samples. This reduced parsing overhead.
3.5 Scheduling Optimization
The original pipeline involved many Goroutines, causing frequent context switches. Optimizations included merging the splitter, worker, and SQL parser into a single Goroutine and throttling the packet‑queue poll interval to 5 ms, balancing CPU usage and loss rate.
3.6 Garbage‑Collection Pressure Reduction
Profiling showed >40 million objects allocated in 30 seconds, creating heavy GC pressure. Two mitigation strategies were evaluated:
Object pooling via sync.Pool to reuse frequently allocated structures.
Manual memory management using mmap to bypass Go's GC (rejected due to leak risk).
The pooling approach was adopted, significantly lowering GC overhead.
3.7 Unpacking Optimization
MySQL’s TCP‑based protocol generates many empty ACK packets, which the parser unnecessarily processes, increasing Goroutine scheduling and GC load. By applying pcap filter rules that drop empty packets based on packet length and sequence ID, the packet‑processing load was reduced.
# Feature 1: packet length >= 4 bytes
ip[2:2] - ((ip[0] & 0x0f) << 2) - ((tcp[12:1] & 0xf0) >> 2) >= 4
# Feature 2: filter empty ACKs on port 3306
(dst host {localIP} and dst port 3306 and (tcp[(((tcp[12:1] & 0xf0) >> 2) + 3)] <= 0x01))Final Results
After the series of optimizations, loss rates dropped from a peak of 60 % to 0 %, and CPU consumption fell from six cores to a single core. End‑to‑end Sysbench tests showed that the overall performance impact on MySQL TPS, QPS, and 99th‑percentile latency stayed under 6 %.
Future Plans
Despite the improvements, the packet‑capture approach still incurs noticeable latency for latency‑sensitive workloads and struggles with edge cases such as TCP loss/reorder, compressed MySQL traffic, or very large SQL statements. The industry trend is moving toward kernel‑level modifications that emit full‑SQL directly and provide richer metrics. Meituan’s database kernel team has already implemented such a solution and is gradually rolling it out to replace the packet‑capture method, while also working to fill the missing end‑to‑end loss‑rate metrics.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
