Master MySQL Data Archiving with pt-archiver: Install, Run, and Optimize
This guide explains how to use Percona Toolkit's pt-archiver for MySQL data archiving, covering installation, basic and bulk commands, underlying two‑phase‑commit logic, performance trade‑offs of various parameters, and best‑practice tips for safe, efficient archiving and deletion.
What is pt-archiver?
pt-archiver is a Percona Toolkit utility that copies rows from a source MySQL instance to a destination (archive) instance and optionally deletes the source rows.
Installation
Download the latest Percona Toolkit binary (e.g., version 3.3.1) from https://www.percona.com/downloads/percona-toolkit/LATEST/ and install the required Perl modules:
cd /usr/local/
wget https://downloads.percona.com/downloads/percona-toolkit/3.3.1/binary/tarball/percona-toolkit-3.3.1_x86_64.tar.gz --no-check-certificate
tar xvf percona-toolkit-3.3.1_x86_64.tar.gz
cd percona-toolkit-3.3.1
yum install perl-ExtUtils-MakeMaker perl-DBD-MySQL perl-Digest-MD5
perl Makefile.PL
make
make install

Simple archiving example
Archive the employees.departments table from host 192.168.244.10 to host 192.168.244.128:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--dest h=192.168.244.128,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--where "1=1"

Key options:
--source : DSN of the source instance.
--dest : DSN of the destination instance.
--where : Row‑selection condition; "1=1" selects the whole table.
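In practice you rarely archive a whole table with "1=1"; the --where clause usually selects rows older than some cutoff. A minimal sketch of building such a clause in the shell, assuming a hypothetical date column named hire_date:

```shell
# Build a --where value that selects rows older than 90 days.
# The column name "hire_date" is hypothetical; substitute your own date column.
cutoff=$(date -d '90 days ago' +%F)   # GNU date; on BSD/macOS use: date -v-90d +%F
where="hire_date < '${cutoff}'"
echo "$where"                         # e.g. hire_date < '2024-01-15'
```

The resulting string is then passed as --where "$where" on the pt-archiver command line.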
Implementation principle
pt-archiver uses a two‑phase‑commit pattern per row:
Select one row from the source (using the primary‑key index).
Insert the row into the destination.
Delete the row from the source only after the insert succeeds.
This ordering means a row is deleted from the source only after it has been successfully inserted into the destination, so a failure mid-run cannot lose data. Example general log excerpt:
SELECT `dept_no`,`dept_name` FROM `employees`.`departments` WHERE (1=1) ORDER BY `dept_no` LIMIT 1;
INSERT INTO `employees`.`departments`(`dept_no`,`dept_name`) VALUES ('d001','Marketing');
DELETE FROM `employees`.`departments` WHERE (`dept_no`='d001');
COMMIT;

Bulk archiving
For large tables, bulk options reduce the number of commits:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--dest h=192.168.244.128,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--where "1=1" \
--bulk-delete --limit 1000 --commit-each --bulk-insert

Important bulk parameters:
--bulk-delete : Delete rows in batches.
--limit N : Process N rows per batch.
--commit-each : One COMMIT per batch.
--bulk-insert : Load data into the destination with LOAD DATA INFILE for higher throughput.
Performance comparison
A benchmark on 200,000 rows shows the fastest configuration is full bulk mode (--bulk-delete --limit 1000 --commit-each --bulk-insert), while the default single-row mode is the slowest. Adding --bulk-delete alone speeds up deletes but does not reduce commit overhead unless combined with --commit-each or --bulk-insert.
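A back-of-envelope calculation shows why batching helps: with --commit-each and --limit 1000, the number of commits drops by three orders of magnitude. The figures below are illustrative, assuming roughly one transaction per row in the default mode:

```shell
# Illustrative commit counts for archiving 200,000 rows.
rows=200000
default_commits=$rows                # default mode: roughly one commit per row
batched_commits=$(( rows / 1000 ))   # --limit 1000 --commit-each: one per batch
echo "default: ~${default_commits} commits, batched: ${batched_commits} commits"
```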
Common use cases
Safe bulk delete
Delete rows without archiving:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--where "1=1" --bulk-delete --limit 1000 --commit-each --purge --primary-key-only

--purge deletes without inserting anywhere (no --dest is needed); --primary-key-only restricts the SELECT to the primary-key columns, reducing the data fetched.
Archiving to files
Export rows as CSV files for later loading:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--where "1=1" --bulk-delete --limit 1000 --file '/tmp/%Y-%m-%d-%D.%t'

Placeholders: %Y year, %m month, %d day, %D database name, %t table name.
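To see what file name that template produces, the date placeholders can be previewed in the shell. A small sketch, assuming today's date and the employees.departments table from the examples above:

```shell
# Preview the file name the --file template '/tmp/%Y-%m-%d-%D.%t'
# would produce today for database "employees", table "departments".
db="employees"
tbl="departments"
fname="/tmp/$(date +%Y-%m-%d)-${db}.${tbl}"
echo "$fname"   # e.g. /tmp/2024-04-15-employees.departments
```

A file written this way can later be reloaded into MySQL, since pt-archiver writes it in a LOAD DATA INFILE-compatible format.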
Avoiding replication lag
Pause archiving whenever the lag of the replica named by --check-slave-lag exceeds the threshold set by --max-lag (default 1 s):
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--where "1=1" --bulk-delete --limit 1000 --commit-each --primary-key-only --purge \
--check-slave-lag h=192.168.244.20,P=3306,u=pt_user,p=pt_pass

Specify the option multiple times to monitor several replicas.
Frequently used parameters
--analyze : Run ANALYZE TABLE on source (s), destination (d), or both after archiving (e.g., --analyze ds).
--optimize : Run OPTIMIZE TABLE similarly.
--charset : Set client connection charset (default latin1 before MySQL 8.0, utf8mb4 in 8.0).
--[no]check-charset : Verify that connection charset matches table charset.
--[no]check-columns : Ensure source and destination tables have matching column names.
--columns : Explicitly list columns to archive.
--dry-run : Show the SQL that would be executed without running it.
--ignore : Use INSERT IGNORE to skip duplicate‑key errors.
--no-delete : Archive without deleting source rows.
--replace : Use REPLACE instead of INSERT.
--[no]safe-auto-increment : By default, pt-archiver does not delete the row with the maximum AUTO_INCREMENT value, so the counter survives a server restart; use --no-safe-auto-increment if that row must be archived too.
--progress N : Print progress every N rows.
Recommended workflow
Run pt‑archiver with --no-delete to copy data first.
Verify source and archive tables match (e.g., with pt-table-sync).
Re‑run pt‑archiver without --no-delete to delete the source rows.
This three‑step approach avoids data loss, especially on legacy MySQL 5.6 instances where strict SQL mode may be disabled.
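The steps above can be sketched as a script. Hosts and credentials are the placeholders used throughout this guide, and the run helper prints each command instead of executing it, so the plan can be reviewed before pointing it at real servers:

```shell
# Sketch of the copy -> verify -> delete workflow (placeholders, not real hosts).
SRC="h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments"
DST="h=192.168.244.128,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments"

run() { printf '%s\n' "$*"; }   # change to: run() { "$@"; } to execute for real

# 1. Copy rows to the archive without deleting them from the source.
run pt-archiver --source "$SRC" --dest "$DST" --where "1=1" \
    --limit 1000 --commit-each --no-delete

# 2. Verify that source and archive match. pt-table-sync --print only reports
#    the statements it would run; it changes nothing.
run pt-table-sync --print \
    h=192.168.244.10,u=pt_user,p=pt_pass,D=employees,t=departments \
    h=192.168.244.128,u=pt_user,p=pt_pass,D=employees,t=departments

# 3. Only after verification, delete the archived rows from the source.
run pt-archiver --source "$SRC" --where "1=1" \
    --bulk-delete --limit 1000 --commit-each --purge
```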
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.