Databases 23 min read

Custom MySQL Data Consistency Check Tool: Design, Implementation, and Usage

This article introduces a custom MySQL data consistency verification tool inspired by pt-table-checksum, explains the challenges of handling complex primary keys, details the implementation logic with shell scripts and SQL, and provides installation, configuration, and usage instructions including parallelism, network monitoring, and scheduling.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Custom MySQL Data Consistency Check Tool: Design, Implementation, and Usage

Requirement Introduction

The business has two MySQL clusters synchronized via MQ; after an MQ failure causing many primary‑key conflicts, a DBA is asked to verify data consistency between the clusters.

Background

The author initially thought pt‑table‑checksum could be used via DSN, but discovered limitations when dealing with non‑master‑slave environments.

The article follows the pt‑table‑checksum logic: iterate tables in data blocks, compare checksum values, and discuss problems encountered while implementing a custom script.

Testing DSN

A test environment is built using MySQL 5.7.26 and Percona‑toolkit‑3.2.1.

1. Verify master‑slave data consistency

Connect to the slave via DSN and run:

$ ./bin/pt-table-checksum h='192.168.1.1',u='mydba',p='test123456',P=6666 --nocheck-replication-filters --replicate=test.checksums --no-check-binlog-format -d dbatest1  --recursion-method dsn=t=percona_schema.dsns

The test passes, confirming basic checksum functionality.

2. Verify non‑master‑slave data consistency

Stop the replica and run the same command; pt‑table‑checksum reports the replica is stopped, indicating it does not support non‑master‑slave checks.

Development Issues

1. Handling complex composite primary keys

pt‑table‑checksum relies on a non‑null unique key; handling composite keys, especially character‑type keys, requires special logic.

Three‑step algorithm:

Determine _min_rowid (first row primary key).

Query the next block using _min_rowid as the lower bound, record the block’s checksum and the new _min_rowid .

Repeat until _min_rowid is empty.

Additional considerations:

Detect whether the primary key is composite.

Detect data type (integer vs. string) to decide if quotes are needed in the WHERE clause.

Implementation extracts primary‑key column names, their data types, builds appropriate WHERE conditions, and formats _min_rowid accordingly (e.g., 1,'1' for int+char composite keys).

(2) Defining the left‑boundary of each data block for multi‑column keys

For a composite key (a,b,c) the correct SQL is:

SELECT * FROM table WHERE ((a > xxx) OR (a = xxx AND b > yyy) OR (a = xxx AND b = yyy AND c > zzz)) ORDER BY a,b,c LIMIT checksize

Tool Enhancements

1. Remove FOR UPDATE

Eliminate locking to minimize impact on production; use multiple checksum attempts (default 20) to mitigate transient inconsistencies.

2. Add table‑structure verification

Unlike pt‑table‑checksum, the custom tool also compares table schemas.

3. Support parallel table checks

User can specify parallelism; the tool caps it at (idle CPU cores ‑ 1) for safety.

4. Network monitoring

Pause checks when network usage exceeds a user‑defined threshold, useful for middleware or TiDB scenarios.

5. Scheduled execution

Allow specifying time windows (e.g., 0‑5 am) to run checks during low‑traffic periods; unfinished checks resume the next day.

6. Validate any two MySQL‑compatible nodes

Not limited to master‑slave; any nodes supporting standard MySQL SQL can be compared.

7. Timeout and self‑kill mechanisms

If a block query exceeds 5 seconds, it is aborted and retried; a kill process is issued based on a stored MD5 identifier to avoid orphaned queries.

Usage Guide

The tool can be cloned, configured, and run with various command‑line options:

git clone https://gitee.com/mo-shan/check_data_for_mysql.git
cd check_data_for_mysql
vim conf/check.conf   # edit according to environment

Start the network monitor:

bash manager.sh -a start -t eth0 -n 30

Run the consistency check (example):

bash start.sh -f true -d dbatest -t test1 -l 0-5

Log files (info.log, skip.log) record detailed results, including timestamps, status, affected tables, block ranges, execution time, and consistency outcome.

Result directories:

list : tables already checked and currently checking.

md5 : temporary checksum files.

pri : last primary‑key values per table.

res : subfolders table (SQL of mismatched blocks), row (secondary row checks), diff (final row differences).

Example of a mismatched block SQL and diff output are provided, illustrating how the tool captures and presents inconsistencies.

Conclusion

The tool extends pt‑table‑checksum with table‑structure checks, parallelism, network throttling, and scheduling, but is intended for learning and should be thoroughly tested before production use.

Automationdata consistencyMySQLshell scriptchecksumpt-table-checksumDatabase Tools
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.