Databases 12 min read

How Meituan Optimized a Non‑Intrusive MySQL Packet Capture for Full‑SQL Auditing

This article explains Meituan's non‑intrusive, Go‑based packet‑capture solution for full‑SQL database auditing, details the performance challenges encountered at high QPS, and describes a series of systematic analyses and optimizations—including probe redesign, CPU profiling, data‑masking, scheduling, GC pressure reduction, and packet filtering—that dramatically lowered loss rates and CPU usage.

ITPUB
ITPUB
ITPUB
How Meituan Optimized a Non‑Intrusive MySQL Packet Capture for Full‑SQL Auditing

Background

Database security is a top priority for Meituan's information security and database teams, but historically only sampled audits were possible, making it hard to detect and mitigate attacks quickly. To identify characteristic malicious SQL statements, they needed full‑SQL capture of MySQL traffic.

Current Situation & Challenges

The existing MySQL audit system uses a sampling approach with pcap‑based data collection agents (rds‑agent) on each MySQL instance and log‑agents to forward data to Kafka, where Storm processes it in real time and Hive stores it for later analysis. High‑traffic core MySQL clusters generate up to ~50k QPS, and the rds‑agent, as a parasitic process, must remain lightweight. Stress tests with Sysbench revealed unacceptable data‑loss rates and CPU consumption at high QPS.

Analysis & Optimization

3.1 Data‑Collection Agent Overview

The rds‑agent, written in Go and based on the open‑source MysqlProbe, listens on the MySQL port, extracts timestamps, client IP, username, SQL, target DB, and target IP, and forwards audit data via Thrift to a log‑agent. Its architecture consists of five modules:

probe : uses gopacket (a Go wrapper for libpcap) to capture raw Ethernet frames, reassembles TCP packets, and hashes source/destination IP‑port to distribute connections across workers.

watcher : periodically runs SHOW PROCESSLIST to recover missing login usernames that are only present during the MySQL handshake.

worker : manages the lifecycle of multiple connections, detects expired connections, and releases resources to avoid memory leaks.

connStream : parses MySQL protocol packets, identifies target SQL statements, and performs on‑the‑fly data masking.

sender : packages the parsed audit records and sends them to the log‑agent via Thrift.

3.2 Baseline Performance Test

To verify whether the bottleneck lay in gopacket, a simple TCP client/server was built and three critical steps of the packet‑processing pipeline were benchmarked. Results showed that gopacket itself was not the limiting factor.

3.3 CPU Profiling

Using Go's built‑in pprof tool, a flame graph revealed that the majority of CPU time was spent in SQL masking, packet unpacking, garbage collection (GC), and Goroutine scheduling.

3.4 Masking Optimization

SQL masking originally parsed every statement with PingCAP's SQL parser to replace literals with “?”. Because only sampled or targeted SQL needed masking, the masking step was moved downstream to the sender, masking only the final samples. This reduced parsing overhead.

3.5 Scheduling Optimization

The original pipeline involved many Goroutines, causing frequent context switches. Optimizations included merging the splitter, worker, and SQL parser into a single Goroutine and throttling the packet‑queue poll interval to 5 ms, balancing CPU usage and loss rate.

3.6 Garbage‑Collection Pressure Reduction

Profiling showed >40 million objects allocated in 30 seconds, creating heavy GC pressure. Two mitigation strategies were evaluated:

Object pooling via sync.Pool to reuse frequently allocated structures.

Manual memory management using mmap to bypass Go's GC (rejected due to leak risk).

The pooling approach was adopted, significantly lowering GC overhead.

3.7 Unpacking Optimization

MySQL’s TCP‑based protocol generates many empty ACK packets, which the parser unnecessarily processes, increasing Goroutine scheduling and GC load. By applying pcap filter rules that drop empty packets based on packet length and sequence ID, the packet‑processing load was reduced.

# Feature 1: packet length >= 4 bytes
ip[2:2] - ((ip[0] & 0x0f) << 2) - ((tcp[12:1] & 0xf0) >> 2) >= 4
# Feature 2: filter empty ACKs on port 3306
(dst host {localIP} and dst port 3306 and (tcp[(((tcp[12:1] & 0xf0) >> 2) + 3)] <= 0x01))

Final Results

After the series of optimizations, loss rates dropped from a peak of 60 % to 0 %, and CPU consumption fell from six cores to a single core. End‑to‑end Sysbench tests showed that the overall performance impact on MySQL TPS, QPS, and 99th‑percentile latency stayed under 6 %.

Future Plans

Despite the improvements, the packet‑capture approach still incurs noticeable latency for latency‑sensitive workloads and struggles with edge cases such as TCP loss/reorder, compressed MySQL traffic, or very large SQL statements. The industry trend is moving toward kernel‑level modifications that emit full‑SQL directly and provide richer metrics. Meituan’s database kernel team has already implemented such a solution and is gradually rolling it out to replace the packet‑capture method, while also working to fill the missing end‑to‑end loss‑rate metrics.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Performance OptimizationGomysqlPacket CaptureDatabase AuditingRDS-Agent
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.