Master MySQL Data Archiving with pt-archiver: Install, Run, and Optimize
This guide explains how to use Percona Toolkit's pt-archiver for MySQL data archiving, covering installation, basic and bulk commands, underlying two‑phase‑commit logic, performance trade‑offs of various parameters, and best‑practice tips for safe, efficient archiving and deletion.
What is pt-archiver?
pt-archiver is a Percona Toolkit utility that copies rows from a source MySQL instance to a destination (archive) instance and optionally deletes the source rows.
Installation
Download the latest Percona Toolkit binary (e.g., version 3.3.1) from https://www.percona.com/downloads/percona-toolkit/LATEST/ and install the required Perl modules:
cd /usr/local/
wget https://downloads.percona.com/downloads/percona-toolkit/3.3.1/binary/tarball/percona-toolkit-3.3.1_x86_64.tar.gz --no-check-certificate
tar xvf percona-toolkit-3.3.1_x86_64.tar.gz
cd percona-toolkit-3.3.1
yum install perl-ExtUtils-MakeMaker perl-DBD-MySQL perl-Digest-MD5
perl Makefile.PL
make
make install

Simple archiving example
Archive the employees.departments table from host 192.168.244.10 to host 192.168.244.128:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--dest h=192.168.244.128,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--where "1=1"

Key options:
--source : DSN of the source instance.
--dest : DSN of the destination instance.
--where : Row‑selection condition; "1=1" selects the whole table.
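In practice you rarely archive a whole table with "1=1"; the --where clause usually selects rows older than some cutoff. A minimal sketch of building such a clause in the shell, assuming a hypothetical date column named hire_date:

```shell
# Build a --where value that selects rows older than 90 days.
# The column name "hire_date" is hypothetical; substitute your own date column.
cutoff=$(date -d '90 days ago' +%F)   # GNU date; on BSD/macOS use: date -v-90d +%F
where="hire_date < '${cutoff}'"
echo "$where"                         # e.g. hire_date < '2024-01-15'
```

The resulting string is then passed as --where "$where" on the pt-archiver command line.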
Implementation principle
pt-archiver uses a two‑phase‑commit pattern per row:
Select one row from the source (using the primary‑key index).
Insert the row into the destination.
Delete the row from the source only after the insert succeeds.
This ordering means a row is deleted from the source only after it has been successfully inserted into the destination, so a failure mid-run cannot lose data. Example general log excerpt:
SELECT `dept_no`,`dept_name` FROM `employees`.`departments` WHERE (1=1) ORDER BY `dept_no` LIMIT 1;
INSERT INTO `employees`.`departments`(`dept_no`,`dept_name`) VALUES ('d001','Marketing');
DELETE FROM `employees`.`departments` WHERE (`dept_no`='d001');
COMMIT;

Bulk archiving
For large tables, bulk options reduce the number of commits:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--dest h=192.168.244.128,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--where "1=1" \
--bulk-delete --limit 1000 --commit-each --bulk-insert

Important bulk parameters:
--bulk-delete : Delete rows in batches.
--limit N : Process N rows per batch.
--commit-each : One COMMIT per batch.
--bulk-insert : Load data into the destination with LOAD DATA INFILE for higher throughput.
Performance comparison
A benchmark on 200,000 rows shows the fastest configuration is full bulk mode (--bulk-delete --limit 1000 --commit-each --bulk-insert), while the default single-row mode is the slowest. Adding --bulk-delete alone speeds up deletes but does not reduce commit overhead unless combined with --commit-each or --bulk-insert.
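A back-of-envelope calculation shows why batching helps: with --commit-each and --limit 1000, the number of commits drops by three orders of magnitude. The figures below are illustrative, assuming roughly one transaction per row in the default mode:

```shell
# Illustrative commit counts for archiving 200,000 rows.
rows=200000
default_commits=$rows                # default mode: roughly one commit per row
batched_commits=$(( rows / 1000 ))   # --limit 1000 --commit-each: one per batch
echo "default: ~${default_commits} commits, batched: ${batched_commits} commits"
```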
Common use cases
Safe bulk delete
Delete rows without archiving:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--where "1=1" --bulk-delete --limit 1000 --commit-each --purge --primary-key-only

--purge deletes without inserting anywhere (no --dest is needed); --primary-key-only restricts the SELECT to the primary-key columns, reducing the data fetched.
Archiving to files
Export rows as CSV files for later loading:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--where "1=1" --bulk-delete --limit 1000 --file '/tmp/%Y-%m-%d-%D.%t'

Placeholders: %Y year, %m month, %d day, %D database name, %t table name.
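To see what file name that template produces, the date placeholders can be previewed in the shell. A small sketch, assuming today's date and the employees.departments table from the examples above:

```shell
# Preview the file name the --file template '/tmp/%Y-%m-%d-%D.%t'
# would produce today for database "employees", table "departments".
db="employees"
tbl="departments"
fname="/tmp/$(date +%Y-%m-%d)-${db}.${tbl}"
echo "$fname"   # e.g. /tmp/2024-04-15-employees.departments
```

A file written this way can later be reloaded into MySQL, since pt-archiver writes it in a LOAD DATA INFILE-compatible format.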
Avoiding replication lag
Pause archiving whenever the lag of the replica named by --check-slave-lag exceeds the threshold set by --max-lag (default 1 s):
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments \
--where "1=1" --bulk-delete --limit 1000 --commit-each --primary-key-only --purge \
--check-slave-lag h=192.168.244.20,P=3306,u=pt_user,p=pt_pass

Specify the option multiple times to monitor several replicas.
Frequently used parameters
--analyze : Run ANALYZE TABLE on source (s), destination (d), or both after archiving (e.g., --analyze ds).
--optimize : Run OPTIMIZE TABLE similarly.
--charset : Set client connection charset (default latin1 before MySQL 8.0, utf8mb4 in 8.0).
--[no]check-charset : Verify that connection charset matches table charset.
--[no]check-columns : Ensure source and destination tables have matching column names.
--columns : Explicitly list columns to archive.
--dry-run : Show the SQL that would be executed without running it.
--ignore : Use INSERT IGNORE to skip duplicate‑key errors.
--no-delete : Archive without deleting source rows.
--replace : Use REPLACE instead of INSERT.
--[no]safe-auto-increment : By default, pt-archiver does not delete the row with the maximum AUTO_INCREMENT value, so the counter survives a server restart; use --no-safe-auto-increment if that row must be archived too.
--progress N : Print progress every N rows.
Recommended workflow
Run pt‑archiver with --no-delete to copy data first.
Verify source and archive tables match (e.g., with pt-table-sync).
Re‑run pt‑archiver without --no-delete to delete the source rows.
This three‑step approach avoids data loss, especially on legacy MySQL 5.6 instances where strict SQL mode may be disabled.
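The steps above can be sketched as a script. Hosts and credentials are the placeholders used throughout this guide, and the run helper prints each command instead of executing it, so the plan can be reviewed before pointing it at real servers:

```shell
# Sketch of the copy -> verify -> delete workflow (placeholders, not real hosts).
SRC="h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments"
DST="h=192.168.244.128,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments"

run() { printf '%s\n' "$*"; }   # change to: run() { "$@"; } to execute for real

# 1. Copy rows to the archive without deleting them from the source.
run pt-archiver --source "$SRC" --dest "$DST" --where "1=1" \
    --limit 1000 --commit-each --no-delete

# 2. Verify that source and archive match. pt-table-sync --print only reports
#    the statements it would run; it changes nothing.
run pt-table-sync --print \
    h=192.168.244.10,u=pt_user,p=pt_pass,D=employees,t=departments \
    h=192.168.244.128,u=pt_user,p=pt_pass,D=employees,t=departments

# 3. Only after verification, delete the archived rows from the source.
run pt-archiver --source "$SRC" --where "1=1" \
    --bulk-delete --limit 1000 --commit-each --purge
```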
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.