
How to Build a Real‑Time MySQL Statistics Platform with ClickHouse

This article explains how a growing company designed, optimized, and deployed a comprehensive MySQL monitoring and analysis pipeline—moving from Flume‑HDFS‑Hive to ClickTail‑ClickHouse, enriching SQL parsing, and applying practical methods for state statistics, trend analysis, permission management, and data‑skew detection.

dbaplus Community

Observing Business Issues

DBAs often wonder which SQL statements drive QPS spikes, which tables or fields are heavily accessed, and how user permissions are distributed; answering these questions requires more than a simple monitoring chart.

Data Collection, Parsing and Storage

The initial solution collected raw MySQL audit logs, parsed them with a Flume interceptor, stored the results in HDFS, and queried them through Hive. A ClickHouse sink was added later for faster queries. The pipeline was eventually replaced by clicktail (https://github.com/Altinity/clicktail), which writes directly to ClickHouse, performs lightweight SQL parsing, normalizes each statement, computes a checksum for it, and stores the first 300 characters of the statement.

Example SQL statements used to illustrate parsing limitations:

select id, name, qq from users where id = 1 and status = 1;
select id, name, qq from users where id in (1000 ids) and status = 1;
select id, name, qq from users where id = 1 or status = 1;
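
The normalization and checksum step can be sketched as follows. This is a minimal illustration, not clicktail's actual code: the regular expressions, the use of MD5, and the names normalize, fingerprint, and MAX_SQL_LEN are assumptions made for this example.

```python
import hashlib
import re

MAX_SQL_LEN = 300  # the article keeps only the first 300 characters


def normalize(sql: str) -> str:
    """Replace literals with placeholders so equivalent statements match."""
    s = sql.strip().rstrip(";")
    s = re.sub(r"'(?:[^'\\]|\\.)*'", "?", s)             # quoted strings
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)             # numeric literals
    s = re.sub(r"\(\s*\?(?:\s*,\s*\?)*\s*\)", "(?)", s)  # collapse IN lists
    return re.sub(r"\s+", " ", s).lower()


def fingerprint(sql: str) -> dict:
    """Return the checksum, normalized form, and truncated sample of a statement."""
    norm = normalize(sql)
    return {
        "checksum": hashlib.md5(norm.encode("utf-8")).hexdigest(),
        "normalized": norm,
        "sample": sql[:MAX_SQL_LEN],
    }


a = fingerprint("select id, name, qq from users where id = 1 and status = 1;")
b = fingerprint("select id, name, qq from users where id = 42 and status = 0;")
c = fingerprint("select id, name, qq from users where id in (1, 2, 3) and status = 1;")
d = fingerprint("select id, name, qq from users where id in (7, 8) and status = 1;")
```

In this sketch, a and b share a checksum, as do c and d, while the IN form and the equality form remain distinct. A statement with a very long IN list still gets the same checksum, but its stored sample is cut at 300 characters, so most of its values are not available for later analysis.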

Basic Methods for Observing Business

1. State Statistics

Instance‑level statistics show every query, its SQL text, accessed tables, fields, user, and IP. Table‑level and field‑level breakdowns reveal hot tables and columns. User‑level views expose which accounts access which objects.
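
As a rough illustration of these breakdowns, the sketch below counts parsed audit records by statement, by table, and by user and table. The record layout (checksum, tables, user, ip) is a hypothetical one chosen for the example. In the platform described here such aggregation would run as queries in ClickHouse, and the in-memory version only shows the grouping logic.

```python
from collections import Counter

# Hypothetical parsed audit records; the field names are assumptions.
records = [
    {"checksum": "a1", "tables": ["users"], "user": "app_rw", "ip": "10.0.0.5"},
    {"checksum": "a1", "tables": ["users"], "user": "app_rw", "ip": "10.0.0.6"},
    {"checksum": "b2", "tables": ["users", "orders"], "user": "report_ro", "ip": "10.0.0.9"},
]


def state_statistics(rows):
    """Count requests per statement, per table, and per (user, table) pair."""
    by_sql = Counter(r["checksum"] for r in rows)
    by_table = Counter(t for r in rows for t in r["tables"])
    by_user_table = Counter((r["user"], t) for r in rows for t in r["tables"])
    return by_sql, by_table, by_user_table


by_sql, by_table, by_user_table = state_statistics(records)
hot_tables = by_table.most_common(2)  # tables ordered by request count
```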

2. Business Change Analysis

Comparing statistics from two time windows (day or week) highlights QPS spikes, new SQL releases, or periodic patterns.
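
A window comparison of this kind might look like the sketch below. It assumes each window has already been reduced to a per-checksum request count. The function name compare_windows and the min_ratio threshold are illustrative choices, not part of the original platform.

```python
from collections import Counter


def compare_windows(baseline, current, min_ratio=2.0):
    """Compare per-checksum counts from two time windows.

    Returns statements that are new, statements that disappeared, and
    statements whose volume grew by at least min_ratio.
    """
    new_sql = sorted(set(current) - set(baseline))
    gone_sql = sorted(set(baseline) - set(current))
    spikes = sorted(
        c for c in current
        if c in baseline and current[c] >= min_ratio * baseline[c]
    )
    return new_sql, gone_sql, spikes


# Hypothetical counts for the same hour on two consecutive days.
yesterday = Counter({"a1": 1000, "b2": 50, "c3": 10})
today = Counter({"a1": 1100, "b2": 400, "d4": 30})

new_sql, gone_sql, spikes = compare_windows(yesterday, today)
```

Here d4 appears as a new statement, c3 as a statement that is no longer seen, and b2 as a spike, while the modest growth of a1 stays below the threshold.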

3. Trend

Aggregating state over consecutive intervals produces trend curves for overall QPS, individual SQL, specific tables, or users, often with second‑level granularity.
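
The bucketing behind such a trend curve can be sketched as follows, assuming each audit event is reduced to a (Unix timestamp, checksum) pair. The function name trend and the "__all__" key for the overall series are conventions invented for this example.

```python
from collections import Counter, defaultdict


def trend(events, interval=1):
    """Bucket (unix_ts, checksum) events into fixed-width intervals.

    Returns {series_name: Counter({bucket_start: request_count})}, where
    the "__all__" series holds the overall request count per bucket.
    """
    series = defaultdict(Counter)
    for ts, checksum in events:
        bucket = int(ts) // interval * interval
        series[checksum][bucket] += 1
        series["__all__"][bucket] += 1
    return series


# Hypothetical events with second-level timestamps.
events = [(100.2, "a1"), (100.9, "a1"), (101.1, "b2"), (109.5, "a1")]

per_second = trend(events, interval=1)
per_ten_seconds = trend(events, interval=10)
```

The same function yields per-table or per-user curves if the second element of each event is a table name or a user name instead of a checksum.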

4. Variable Distribution

For SQL statements under 300 characters, key variables are extracted and their frequency distribution and quantiles are analyzed, optionally combined with trend data.
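
As one possible illustration, the sketch below extracts the value compared against id from stored SQL samples and summarizes its distribution. The regular expression, the focus on a single column, and the function name id_distribution are assumptions made for the example. A real extractor would have to handle more predicate forms.

```python
import re
from collections import Counter
from statistics import quantiles

# Matches "id = <number>" but not columns such as user_id.
ID_PATTERN = re.compile(r"\bid\s*=\s*(\d+)", re.IGNORECASE)


def id_distribution(samples):
    """Return the frequency of each id value and the quartile cut points."""
    values = [int(m.group(1)) for s in samples for m in ID_PATTERN.finditer(s)]
    freq = Counter(values)
    cuts = quantiles(values, n=4) if len(values) >= 2 else []
    return freq, cuts


# Hypothetical stored samples, each shorter than 300 characters.
samples = [
    "select id, name, qq from users where id = 1 and status = 1",
    "select id, name, qq from users where id = 1 and status = 1",
    "select id, name, qq from users where id = 1 and status = 0",
    "select id, name, qq from users where id = 7 and status = 1",
]

freq, cuts = id_distribution(samples)
top_value, top_count = freq.most_common(1)[0]
top_share = top_count / sum(freq.values())  # share of requests hitting the hottest id
```

A high top_share suggests that a few values dominate the traffic, which is the kind of signal that feeds the hot-data analysis later in the article.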

5. User Behavior and Permission Observation

Per‑user reports list accessed tables, IP addresses, and request volumes; per‑table reports list active users.

Practice

1. SQL Release/Retirement Observation

With the comparison window set to a day or a week, newly deployed SQL can be detected and investigated promptly.

2. Hot Data / Data Skew Analysis

High-QPS SQL statements are examined for cache suitability; slow-log analysis reveals tables whose data skew degrades performance.

3. Correlation Analysis of SQL Trend Data

The correlation coefficient between the overall QPS trend and each individual SQL trend shows which statements move with the overall curve.

Correlating SQL trends group by group uncovers batches of related queries.

Cross-correlating with Nginx access logs maps SQL statements to the services responsible for them.
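
The correlation coefficient mentioned above can be computed with a plain Pearson formula, as in the sketch below. The sample series are invented for illustration, and the function assumes both series are sampled over the same intervals.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Hypothetical per-interval request counts.
overall_qps = [100, 120, 300, 110]
sql_a = [10, 12, 30, 11]   # rises and falls with the overall curve
sql_b = [5, 5, 5, 5]       # constant background traffic

r_a = pearson(overall_qps, sql_a)
r_b = pearson(overall_qps, sql_b)
```

A coefficient near 1 for sql_a marks it as a likely contributor to the spike in the third interval, while the flat sql_b gets 0. The same function applies to an Nginx request series paired with a SQL series.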

4. Permission Reclamation and Package‑Based Granting

Long‑term audit logs enable safe revocation of unused permissions and the creation of permission “packages” (user, IP list, table list) that simplify future changes.
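
One way to picture a permission package is as the set of IPs and tables each user was actually seen using, which can then be compared with what the user has been granted. The sketch below follows that idea. The record fields, the granted mapping, and both function names are hypothetical.

```python
from collections import defaultdict


def build_packages(records):
    """Summarize observed usage as {user: {"ips": set, "tables": set}}."""
    packages = defaultdict(lambda: {"ips": set(), "tables": set()})
    for r in records:
        packages[r["user"]]["ips"].add(r["ip"])
        packages[r["user"]]["tables"].update(r["tables"])
    return dict(packages)


def unused_grants(granted, packages):
    """Return tables each user may access but never touched in the audit window."""
    result = {}
    for user, tables in granted.items():
        used = packages.get(user, {}).get("tables", set())
        unused = sorted(set(tables) - used)
        if unused:
            result[user] = unused
    return result


# Hypothetical audit records and grants.
records = [
    {"user": "app_rw", "ip": "10.0.0.5", "tables": ["users"]},
    {"user": "app_rw", "ip": "10.0.0.6", "tables": ["users", "orders"]},
]
granted = {"app_rw": {"users", "orders", "payments"}, "old_job": {"users"}}

packages = build_packages(records)
candidates = unused_grants(granted, packages)
```

The result lists payments for app_rw and users for old_job as revocation candidates. The audit window has to be long enough to cover infrequent jobs before anything is actually revoked.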

5. Table/Field Decommission Decision

Tables or fields with no read/write activity over a significant period are candidates for decommission; fields are judged primarily by write activity to avoid false positives from SELECT *.
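
The decision rule can be sketched as below, assuming the schema and the activity observed in the audit window are available as simple Python collections. The parameter names and the sample schema are illustrative. The output is a list of candidates to review, not a list of objects that are safe to drop.

```python
def decommission_candidates(schema, active_tables, written_columns):
    """Find tables with no activity and columns with no writes.

    schema:          {table: [column, ...]}
    active_tables:   tables read or written during the audit window
    written_columns: (table, column) pairs written during the window
    """
    idle_tables = sorted(t for t in schema if t not in active_tables)
    # Columns are judged by writes only, because SELECT * makes every
    # column look as if it were read.
    idle_columns = sorted(
        (t, c)
        for t, cols in schema.items()
        if t in active_tables
        for c in cols
        if (t, c) not in written_columns
    )
    return idle_tables, idle_columns


# Hypothetical schema and observed activity.
schema = {"users": ["id", "name", "qq"], "legacy_log": ["id", "msg"]}
active_tables = {"users"}
written_columns = {("users", "id"), ("users", "name")}

idle_tables, idle_columns = decommission_candidates(
    schema, active_tables, written_columns
)
```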

Conclusion and Outlook

Storing parsed audit logs in ClickHouse provides durable, query‑fast access to comprehensive MySQL usage data, making database behavior transparent to DBAs and enabling systematic, multi‑dimensional business modeling and optimization.
