Unlocking PostgreSQL Logical Replication: Principles, Challenges, and Cloud Migration
This article explains PostgreSQL logical replication concepts, key components such as replication slots and output plugins, common challenges such as failover slots, DDL synchronization, and bidirectional replication loops, and demonstrates a practical full-plus-incremental data migration to the cloud.
Concept and Principles
Logical replication is a method that copies data and its changes based on a replica identity (usually the primary key), describing transactions and row-level tuple changes rather than physical page operations.
Physical replication works on file blocks, while logical replication describes tuples.
Key Concepts
Replication slot: records replication progress and prevents WAL still needed for logical decoding from being removed prematurely; creating a logical slot binds it to an output plugin and exports a consistent snapshot.
Output plugin: decodes WAL records into a readable format. Common plugins include test_decoding, pgoutput (the built-in default), and wal2json; custom plugins can be written by implementing the decoding callback functions.
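As a minimal sketch of how a slot and an output plugin fit together (the slot and table names here are illustrative), a logical slot can be created and inspected with the built-in SQL functions:

```sql
-- Create a logical slot bound to the test_decoding output plugin;
-- this also establishes a consistent decoding start point.
SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- Generate a change, then peek at the decoded stream without
-- consuming it (pg_logical_slot_get_changes would consume it).
CREATE TABLE demo_table (id int PRIMARY KEY);
INSERT INTO demo_table VALUES (1);
SELECT lsn, xid, data
  FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);

-- Drop the slot when done so it stops pinning WAL.
SELECT pg_drop_replication_slot('demo_slot');
```

With test_decoding, the `data` column shows human-readable lines describing each decoded change inside its transaction.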
Replication protocol and messages: used to fetch the WAL stream over a replication connection, e.g.
psql "dbname=postgres replication=database", then
START_REPLICATION SLOT slot_name LOGICAL XXX/XXX (a physical stream instead uses PHYSICAL XXX/XXX [TIMELINE tli]).
Workflow
With these concepts in place, the decoding workflow is: the walsender reads WAL entries into a reorder buffer, groups changes by transaction ID, and on COMMIT hands the complete transaction to the output plugin, which encodes messages and streams them to the target.
Problems and Evolution
Problem 1: Failover slot
In HA setups, a standby promoted after failover lacks the replication slot, because slot files are not synchronized to standbys. The manual workaround: create the slot on the primary, copy the slot files (the pg_replslot directory) to the standby, restart the standby, and periodically advance the copied slot with
pg_replication_slot_advance so it tracks the primary's position.
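The periodic advance step can be sketched as follows (the slot name and LSN literal are illustrative; in practice the LSN the downstream consumer has confirmed is used):

```sql
-- Move the mirrored slot's position forward to the LSN the consumer
-- has already confirmed, so it pins only the WAL still needed.
SELECT pg_replication_slot_advance('demo_slot', '0/16B3748');
```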
Problem 2: DDL synchronization
Native logical replication does not decode DDL. Use event triggers to capture schema changes, store them in a DDL_RECORD table, and replicate that table; the subscriber can reconstruct and execute the DDL statements.
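A sketch of the event-trigger approach (the table, function, and trigger names are illustrative, not a standard API):

```sql
-- Side table that records captured DDL; replicate this table like any
-- other so the subscriber can replay the statements.
CREATE TABLE ddl_record (
    id         serial PRIMARY KEY,
    ddl        text,
    created_at timestamptz DEFAULT now()
);

-- Event-trigger function: store the statement that fired the trigger.
CREATE OR REPLACE FUNCTION capture_ddl() RETURNS event_trigger AS $$
BEGIN
    INSERT INTO ddl_record (ddl) VALUES (current_query());
END;
$$ LANGUAGE plpgsql;

-- Fire after every completed DDL command.
CREATE EVENT TRIGGER record_ddl ON ddl_command_end
    EXECUTE FUNCTION capture_ddl();
```

The subscriber then reads new rows from ddl_record and executes the stored statements before applying subsequent row changes.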
Problem 3: Bidirectional sync
Bidirectional pipelines can cause changes to loop back through WAL. Instead of auxiliary marker tables, PostgreSQL provides replication origin tracking: an origin is created with
pg_replication_origin_create (replorigin_create at the C level) and attached to a session with
pg_replication_origin_session_setup, and output plugins can filter transactions by origin ID in the filter_by_origin_cb callback.
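A minimal sketch of origin tracking on the applying side (the origin name is illustrative):

```sql
-- Register a named origin once per node.
SELECT pg_replication_origin_create('node_a');

-- Attach it to the current session; all writes made in this session
-- now carry node_a's origin ID in their WAL records, so a decoding
-- plugin on the peer can recognize and skip them.
SELECT pg_replication_origin_session_setup('node_a');
```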
Other Issues
Toast handling: unchanged TOAST values are not written to WAL, so the stream carries placeholders for them and the target keeps its existing values.
Heartbeat tables: periodically write to a dummy table so the slot keeps confirming new LSNs and the system XMIN keeps advancing even when user tables see no updates.
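A heartbeat table can be sketched as follows (names are illustrative; the UPDATE runs on a schedule, e.g. from cron):

```sql
-- One-row table whose only purpose is to generate periodic WAL traffic.
CREATE TABLE heartbeat (id int PRIMARY KEY, ts timestamptz);
INSERT INTO heartbeat VALUES (1, now());

-- Run periodically: each update produces a decodable change, letting
-- the consumer confirm a newer LSN on an otherwise idle database.
UPDATE heartbeat SET ts = now() WHERE id = 1;
```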
Large-transaction latency: PG14's streaming mode decodes and streams a transaction while it is still in progress instead of waiting for COMMIT.
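With native logical replication on PG14+, streaming mode is enabled per subscription (the connection string and names here are illustrative):

```sql
-- Stream large in-progress transactions instead of buffering them
-- until COMMIT, reducing apply latency and spill-to-disk pressure.
CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=primary dbname=shop user=repl'
    PUBLICATION pub_orders
    WITH (streaming = on);
```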
Application and Practice
Full and Incremental Sync
A typical migration requires both full and incremental sync. The process: create a replication slot and export its snapshot, perform the full data migration against that snapshot, then stream incremental changes from the slot, so no change between the copy and the stream is lost or applied twice. PostgreSQL itself or a message queue can act as a data proxy in the middle, allowing parallel processing and pre-processing.
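The slot-plus-snapshot handoff can be sketched as follows (the slot name, table, and snapshot identifier are illustrative; the snapshot identifier is whatever the slot-creation command actually returns):

```sql
-- On a replication connection (psql "dbname=... replication=database"):
-- create the slot and export the snapshot it was created under.
CREATE_REPLICATION_SLOT migrate_slot LOGICAL pgoutput EXPORT_SNAPSHOT;

-- In a regular session: run the bulk copy against that exact snapshot,
-- so the copy ends precisely where the slot's change stream begins.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001A-1';
COPY my_table TO STDOUT;
COMMIT;
```

After the copy completes, the consumer starts the incremental stream from migrate_slot and applies everything the snapshot did not include.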
Self‑Hosted Instance Migration to Cloud
A case study moves a self‑hosted PostgreSQL 10 instance to JD Cloud RDS PostgreSQL 11, using incremental back‑flow and data validation to ensure safe and smooth cut‑over.
DTS application stages:
Data check: primary keys, permissions, configuration.
Data migration: schema, bulk data, incremental sync, monitoring.
Application migration: domain switch, traffic routing.
Rollback: incremental back‑flow, quick recovery if needed.
JD Cloud Developers
JD Cloud Developers, the developer platform of JD Technology Group, offers technical sharing and community exchange for developers working on AI, cloud computing, IoT, and related fields. It publishes JD product technical information, industry content, and tech event news, embracing technology together with developers to envision the future.