How to Build a Real‑Time MySQL Statistics Platform with ClickHouse
This article explains how a growing company designed, optimized, and deployed a comprehensive MySQL monitoring and analysis pipeline. It moved from Flume‑HDFS‑Hive to ClickTail‑ClickHouse, enriched SQL parsing, and applied practical methods for state statistics, trend analysis, permission management, and data‑skew detection.
Observing Business Issues
DBAs often wonder which SQL statements drive QPS spikes, which tables or fields are heavily accessed, and how user permissions are distributed; answering these questions requires more than a simple monitoring chart.
Data Collection, Parsing and Storage
The initial solution collected raw MySQL audit logs, parsed them with a Flume interceptor, stored the results in HDFS, and queried them through Hive. A ClickHouse sink was later added for faster queries. The pipeline was eventually replaced by clicktail (https://github.com/Altinity/clicktail), which writes directly to ClickHouse and adds lightweight SQL parsing: it normalizes each statement, computes a checksum, and stores the first 300 characters of the SQL text.
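The normalization and checksum step described above can be sketched as follows. This is a minimal illustration, not clicktail's actual code. The regular expressions, the `in (?+)` placeholder, the SHA-256 fingerprint, and the record keys (`sql_prefix`, `normalized`, `checksum`) are all assumptions.

```python
import hashlib
import re

# The article stores only the first 300 characters of each statement.
MAX_STORED_CHARS = 300


def normalize_sql(sql: str) -> str:
    """Replace literal values with placeholders so statements that differ
    only in their values share one normalized form."""
    s = sql.strip().rstrip(";").lower()
    s = re.sub(r"'(?:[^'\\]|\\.)*'", "?", s)   # quoted string literals
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)   # numeric literals
    # Collapse IN lists of any length, e.g. "in (?, ?, ?)" -> "in (?+)".
    s = re.sub(r"\bin\s*\(\s*\?(?:\s*,\s*\?)*\s*\)", "in (?+)", s)
    return re.sub(r"\s+", " ", s).strip()      # collapse whitespace


def checksum(normalized: str) -> str:
    """Short, stable fingerprint of a normalized statement.
    Truncated SHA-256 is an assumption, not clicktail's algorithm."""
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def to_record(sql: str) -> dict:
    """Build the row that would be written to ClickHouse (hypothetical keys)."""
    normalized = normalize_sql(sql)
    return {
        "sql_prefix": sql[:MAX_STORED_CHARS],
        "normalized": normalized,
        "checksum": checksum(normalized),
    }


print(to_record("select id, name, qq from users where id = 1 and status = 1;")["normalized"])
# select id, name, qq from users where id = ? and status = ?
```

Collapsing IN lists gives a statement with a long `id in (...)` list the same checksum as its shorter variants. Its raw text is still cut at 300 characters, though.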
Example SQL statements used to illustrate parsing limitations:
select id, name, qq from users where id = 1 and status = 1;
select id, name, qq from users where id in (1000 ids) and status = 1;
select id, name, qq from users where id = 1 or status = 1;
Basic Methods for Observing Business
1. State Statistics
Instance‑level statistics show every query, its SQL text, accessed tables, fields, user, and IP. Table‑level and field‑level breakdowns reveal hot tables and columns. User‑level views expose which accounts access which objects.
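Instance-, table-, field-, and user-level statistics are essentially group-by counts over parsed audit records. The sketch below does this in memory in Python. The `AuditRecord` layout is hypothetical, and in production the same aggregation would normally run as a GROUP BY inside ClickHouse.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AuditRecord:
    """One parsed audit-log entry (hypothetical layout, not the article's schema)."""
    user: str
    ip: str
    checksum: str                                # normalized-SQL fingerprint
    tables: list = field(default_factory=list)   # e.g. ["users"]
    fields: list = field(default_factory=list)   # e.g. ["users.id", "users.name"]


def state_statistics(records):
    """Count executions per SQL, table, field, and user."""
    return {
        "sql": Counter(r.checksum for r in records),
        "table": Counter(t for r in records for t in r.tables),
        "field": Counter(f for r in records for f in r.fields),
        "user": Counter(r.user for r in records),
    }


records = [
    AuditRecord("app", "10.0.0.1", "a1", ["users"], ["users.id", "users.name"]),
    AuditRecord("app", "10.0.0.1", "a1", ["users"], ["users.id", "users.name"]),
    AuditRecord("report", "10.0.0.2", "b2", ["orders"], ["orders.amount"]),
]
stats = state_statistics(records)
print(stats["table"].most_common(1))  # [('users', 2)]
```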
2. Business Change Analysis
Comparing statistics from two time windows (day over day or week over week) highlights QPS spikes, newly released SQL, and periodic patterns.
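Here is a minimal sketch of the two-window comparison. It assumes execution counts per SQL checksum have already been aggregated for each window. The function name, the 2x spike ratio, and the `min_count` noise floor are illustrative assumptions, not values from the article.

```python
from collections import Counter


def compare_windows(before, after, spike_ratio=2.0, min_count=100):
    """Flag SQL checksums whose execution count grew by at least `spike_ratio`
    between two windows. `min_count` (an assumed noise floor) ignores rare SQL."""
    spikes = {}
    for key, new_count in after.items():
        if new_count < min_count:
            continue
        old_count = before.get(key, 0)
        # inf marks SQL that did not run at all in the earlier window.
        ratio = new_count / old_count if old_count else float("inf")
        if ratio >= spike_ratio:
            spikes[key] = ratio
    # Largest relative change first.
    return dict(sorted(spikes.items(), key=lambda kv: kv[1], reverse=True))


last_week = Counter({"a": 100, "b": 500})
this_week = Counter({"a": 400, "b": 520, "c": 150})
print(compare_windows(last_week, this_week))  # {'c': inf, 'a': 4.0}
```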
3. Trend
Aggregating state over consecutive intervals produces trend curves for overall QPS, individual SQL, specific tables, or users, often with second‑level granularity.
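Trend curves come from counting events per fixed interval. The sketch below buckets Unix timestamps in Python and fills empty buckets with zero, so the series stay aligned for later comparison. Inside ClickHouse the equivalent would typically be a GROUP BY over a time-rounding function such as `toStartOfMinute`. The `trend` helper is hypothetical.

```python
from collections import Counter


def trend(timestamps, interval=1):
    """Turn event timestamps (Unix seconds) into (bucket_start, count) pairs,
    one per `interval` seconds, with empty buckets filled with 0."""
    buckets = Counter(int(ts // interval) for ts in timestamps)
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    return [(b * interval, buckets.get(b, 0)) for b in range(first, last + 1)]


print(trend([0.1, 0.5, 1.2, 3.9]))  # [(0, 2), (1, 1), (2, 0), (3, 1)]
```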
4. Variable Distribution
For SQL statements shorter than 300 characters (so the stored text is complete), key variables are extracted and their frequency distribution and quantiles are analyzed, optionally in combination with trend data.
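Extracting a variable and summarizing its distribution might look like the following. The sketch assumes the variable is an integer compared with `=` in the WHERE clause. The regex, the `extract_values` and `distribution` helpers, and the quartile choice are illustrative assumptions. `statistics.quantiles` requires at least two values.

```python
import re
import statistics
from collections import Counter

# Only statements shorter than the 300-character storage limit are complete.
MAX_STORED_CHARS = 300


def extract_values(sqls, column):
    """Collect integer literals compared with `column = N` in complete SQL texts."""
    pattern = re.compile(rf"\b{re.escape(column)}\s*=\s*(\d+)\b", re.IGNORECASE)
    values = []
    for sql in sqls:
        if len(sql) < MAX_STORED_CHARS:  # a truncated prefix may cut a value in half
            values.extend(int(v) for v in pattern.findall(sql))
    return values


def distribution(values, top=3):
    """Most frequent values plus quartile cut points (needs at least two values)."""
    return {
        "top": Counter(values).most_common(top),
        "quartiles": statistics.quantiles(values, n=4),
    }


sqls = [
    "select id, name, qq from users where id = 1 and status = 1",
    "select id, name, qq from users where id = 1 and status = 1",
    "select id, name, qq from users where id=7 and status = 1",
    "select id, name, qq from users where id = 3 and status = 1",
]
print(distribution(extract_values(sqls, "id")))
```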
5. User Behavior and Permission Observation
Per‑user reports list accessed tables, IP addresses, and request volumes; per‑table reports list active users.
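The per-user and per-table reports are two views over the same access data. Here is a hypothetical sketch, assuming each audit entry has been reduced to a `(user, ip, table)` tuple:

```python
from collections import defaultdict


def permission_views(accesses):
    """accesses: iterable of (user, ip, table) tuples, one per request
    (hypothetical layout). Returns per-user and per-table reports."""
    per_user = defaultdict(lambda: {"tables": set(), "ips": set(), "requests": 0})
    per_table = defaultdict(set)
    for user, ip, table in accesses:
        entry = per_user[user]
        entry["tables"].add(table)
        entry["ips"].add(ip)
        entry["requests"] += 1
        per_table[table].add(user)
    return dict(per_user), dict(per_table)


users, tables = permission_views([
    ("app", "10.0.0.1", "users"),
    ("app", "10.0.0.2", "orders"),
    ("report", "10.0.0.9", "orders"),
])
print(users["app"]["requests"], sorted(tables["orders"]))  # 2 ['app', 'report']
```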
Practice
1. Observing SQL Going Online and Offline
With the comparison window set to one day or one week, newly deployed SQL, as well as SQL that has stopped running, can be detected and investigated promptly.
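Spotting SQL that appears or disappears between two comparison windows reduces to a set difference over SQL checksums. Here is a sketch, assuming per-window counts keyed by checksum. `sql_changes` and its `min_count` filter are hypothetical.

```python
def sql_changes(previous, current, min_count=1):
    """previous/current: dicts mapping SQL checksum -> execution count in each
    comparison window. Returns (newly seen, no longer seen) checksums, ignoring
    SQL that ran fewer than `min_count` times in the window where it appears."""
    new = sorted(k for k, c in current.items() if c >= min_count and k not in previous)
    retired = sorted(k for k, c in previous.items() if c >= min_count and k not in current)
    return new, retired


print(sql_changes({"a": 10, "b": 5}, {"a": 12, "c": 3, "d": 1}, min_count=2))
# (['c'], ['b'])
```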
2. Hot Data / Data Skew Analysis
High‑QPS SQL statements are examined for cache suitability, and slow‑log analysis reveals tables whose data skew degrades performance.
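Both checks can be approximated with simple heuristics. In the sketch below, hot SQL is anything above an assumed share of total executions. A table is flagged as skewed when its worst slow query examines far more rows than its median slow query. The `(table, rows_examined)` input layout, the thresholds, and the max-to-median heuristic are assumptions, not the article's method.

```python
import statistics
from collections import Counter, defaultdict


def hot_sql(counts, min_share=0.05):
    """SQL checksums whose share of total executions is at least `min_share`:
    candidates for caching."""
    total = sum(counts.values())
    return [(k, c / total) for k, c in counts.most_common() if c / total >= min_share]


def skewed_tables(slow_entries, ratio=10.0):
    """slow_entries: (table, rows_examined) pairs from the slow log (hypothetical).
    Flags tables whose worst query examines at least `ratio` times more rows
    than the median one, a rough sign that a few key values own most rows."""
    rows_by_table = defaultdict(list)
    for table, rows in slow_entries:
        rows_by_table[table].append(rows)
    flagged = {}
    for table, rows in rows_by_table.items():
        median = statistics.median(rows)
        if median > 0 and max(rows) / median >= ratio:
            flagged[table] = max(rows) / median
    return flagged


print([k for k, _ in hot_sql(Counter({"a": 80, "b": 15, "c": 5}), min_share=0.1)])  # ['a', 'b']
print(skewed_tables([("orders", 100), ("orders", 120), ("orders", 5000),
                     ("users", 50), ("users", 60)]))
```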
3. Correlation Analysis of SQL Trend Data
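Correlating the overall QPS trend with each individual SQL trend (using the correlation coefficient) shows which statements move with overall load.
Computing correlations between SQL trends groups related queries into batches.
Cross‑correlating SQL trends with Nginx access logs maps each SQL to the service responsible for it.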
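The three correlation analyses can be sketched with the Pearson coefficient and a lagged variant for cross-correlation. The series are assumed to be aligned, zero-filled per-bucket counts, such as the overall QPS series and each SQL's series. The helper names, the 0.9 threshold, and the lag convention are illustrative assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def correlated_pairs(series, threshold=0.9):
    """series: {sql_checksum: per-bucket counts}. Returns pairs that move together."""
    pairs = []
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if r >= threshold:
            pairs.append((a, b, r))
    return pairs


def best_lag(x, y, max_lag=5):
    """Cross-correlation: the shift (in buckets) at which y best matches x.
    A positive lag means x follows y, e.g. SQL load trailing Nginx traffic."""
    top_lag, top_r = 0, pearson(x, y)
    for lag in range(-max_lag, max_lag + 1):
        if lag > 0:
            a, b = x[lag:], y[:-lag]
        elif lag < 0:
            a, b = x[:lag], y[-lag:]
        else:
            continue
        if len(a) < 2:
            continue
        r = pearson(a, b)
        if r > top_r:
            top_lag, top_r = lag, r
    return top_lag, top_r
```

For hypothetical series `total_qps` and `per_sql[k]`, calling `pearson(total_qps, per_sql[k])` for each checksum `k` ranks statements by how closely they track overall load.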
4. Permission Reclamation and Package‑Based Granting
Long‑term audit logs enable safe revocation of unused permissions and the creation of permission “packages” (user, IP list, table list) that simplify future changes.
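Permission reclamation and packages can be sketched as a set difference plus a GRANT generator. The `PermissionPackage` class and its default DML privilege list are hypothetical. Matching audit-log IPs to grants by exact host is also an assumption: real MySQL accounts often use wildcard hosts such as `'10.0.%'`, which would need extra mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermissionPackage:
    """A permission "package": one account, the client hosts it connects from,
    and the tables it needs (DML privileges by default; an assumption)."""
    user: str
    hosts: tuple
    tables: tuple                      # "db.table" names
    privileges: tuple = ("SELECT", "INSERT", "UPDATE", "DELETE")

    def grant_statements(self):
        """Render one MySQL GRANT per (host, table). Names are not escaped."""
        privs = ", ".join(self.privileges)
        return [
            f"GRANT {privs} ON {table} TO '{self.user}'@'{host}';"
            for host in self.hosts
            for table in self.tables
        ]


def unused_grants(granted, accessed):
    """granted/accessed: sets of (user, host, table). Grants never exercised
    in the audit window are candidates for revocation."""
    return sorted(granted - accessed)


pkg = PermissionPackage("app", ("10.0.0.1", "10.0.0.2"), ("shop.users",), ("SELECT",))
for stmt in pkg.grant_statements():
    print(stmt)
```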
5. Table/Field Decommission Decision
Tables or fields with no read or write activity over a significant period are candidates for decommissioning. Fields are judged primarily by write activity: because SELECT * touches every column, read activity alone would make unused fields look active.
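A decommission check along these lines might look as follows. It assumes the schema inventory and the audit-derived sets of accessed tables and written columns are already available. The function and its inputs are hypothetical. Columns of tables that are already idle are reported only once, at table level.

```python
def decommission_candidates(tables, columns, accessed_tables, written_columns):
    """tables: all table names; columns: all "table.column" names.
    accessed_tables: tables read or written in the audit window.
    written_columns: columns written (INSERT/UPDATE) in the audit window.
    Columns are judged by writes only, since SELECT * reads every column."""
    idle_tables = {t for t in tables if t not in accessed_tables}
    idle_columns = {
        c for c in columns
        if c not in written_columns and c.split(".", 1)[0] not in idle_tables
    }
    return sorted(idle_tables), sorted(idle_columns)


print(decommission_candidates(
    {"users", "orders", "legacy_log"},
    {"users.id", "users.qq", "orders.amount", "legacy_log.msg"},
    {"users", "orders"},
    {"users.id", "orders.amount"},
))
# (['legacy_log'], ['users.qq'])
```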
Conclusion and Outlook
Storing parsed audit logs in ClickHouse provides durable, fast‑to‑query access to comprehensive MySQL usage data. This makes database behavior transparent to DBAs and enables systematic, multi‑dimensional business modeling and optimization.
dbaplus Community
