Skip to main content

Implementing Eventual Consistency in Partitioned PostgreSQL

This guide walks through configuring logical replication, version-gated conflict resolution, and lag monitoring to achieve eventual consistency across declarative PostgreSQL partitions β€” a core technique covered in Consistency Models in Distributed Databases, which is part of the broader Database Partitioning Fundamentals & Architecture topic area.

Prerequisites

Before starting, verify every item below:


The diagram below maps the data flow from a partitioned publisher through logical replication to a subscriber, showing where conflict resolution and lag monitoring intercept the stream.

Eventual consistency data flow across partitioned PostgreSQL Publisher node with child partitions emits WAL events through a logical replication slot to a subscriber node. The subscriber applies idempotent upserts with version guards. A monitoring layer reads pg_stat_subscription lag metrics and triggers reconciliation when thresholds are exceeded. Publisher Node orders_2024_q1 (child partition) orders_2024_q2 (child partition) WAL (logical) replication slot async WAL Network / TCP streaming Subscriber Node Idempotent upsert ON CONFLICT + version guard Replicated partitions (eventually consistent) pg_stat_subscription lag monitoring reconciliation trigger when lag > SLA threshold

Step 1 β€” Enable Logical WAL and Size Replication Slots

Set wal_level = logical on the publisher and allocate enough slots and senders for all partition sync channels. A restart is required if wal_level changes.

# postgresql.conf β€” publisher node
wal_level = logical
max_replication_slots = 20        # one slot per publication; add headroom
max_wal_senders       = 20        # match max_replication_slots
wal_keep_size         = 512       # MB β€” retain WAL while subscribers catch up

SRE tip: Each inactive logical replication slot holds WAL on the publisher indefinitely. Set max_slot_wal_keep_size (PostgreSQL 13+) to a hard ceiling (e.g. 4096) to prevent disk exhaustion if a subscriber goes dark for more than a few hours.

After restarting, confirm the WAL level is active:

SHOW wal_level;
-- expected: logical

Step 2 β€” Scope Publications to Specific Child Partitions

Create targeted publications that cover only the child partitions you need to sync. Avoid publishing the parent table directly β€” PostgreSQL logical replication does not automatically route inserts on the parent to the correct child on the subscriber.

-- Publisher: create a publication scoped to specific child partitions
CREATE PUBLICATION part_sync
  FOR TABLE orders_2024_q1, orders_2024_q2;

Operational note: Logical replication does not propagate DDL. Each time you add a new child partition to the publisher, you must first create the matching table on the subscriber, then run ALTER PUBLICATION part_sync ADD TABLE orders_2024_q3; before rows in the new partition will replicate.

DBA tip: Use one publication per logical partition group (e.g. by tenant or time range) rather than one publication for the entire schema. This keeps WAL volume per slot predictable and lets you set different staleness SLAs per group without cross-contamination.

Step 3 β€” Initialize Subscriptions Without Blocking Writes

Create the subscription with copy_data = false to avoid a full-table copy that would block production writes during cutover. The subscriber starts streaming WAL immediately from the current LSN.

-- Subscriber: create subscription without initial data copy
CREATE SUBSCRIPTION part_sync_sub
  CONNECTION 'host=publisher.internal port=5432 dbname=prod user=replicator password=***'
  PUBLICATION part_sync
  WITH (
    copy_data   = false,
    create_slot = true,
    slot_name   = 'part_sync_slot'
  );

Operational note: If the subscriber needs historical data, sync it out-of-band using pg_dump --table orders_2024_q1 | psql before creating the subscription. The subscription then only needs to replay WAL deltas from the dump’s snapshot LSN onward.

SRE tip: For eventual consistency workloads, set synchronous_commit = off on the publisher connection used by the subscription. This removes the round-trip durability window from the write hot path and maximises partition write throughput. Accept the small durability gap explicitly in your runbook.

Step 4 β€” Implement Idempotent Conflict Resolution

Asynchronous propagation guarantees out-of-order delivery during network partitions or high-throughput bursts. Without explicit conflict handling, a stale replayed WAL event can silently overwrite a newer write that arrived independently on the subscriber.

Use ON CONFLICT DO UPDATE with GREATEST() timestamp comparison or a monotonically increasing version number. Never apply a blind DO UPDATE without a guard β€” this is the most common cause of data divergence in eventually consistent PostgreSQL setups.

-- Idempotent upsert with version-gated conflict resolution
INSERT INTO orders (id, payload, updated_at, version)
VALUES (101, '{"status":"shipped"}', NOW(), 3)
ON CONFLICT (id) DO UPDATE
  SET payload    = EXCLUDED.payload,
      updated_at = GREATEST(orders.updated_at, EXCLUDED.updated_at),
      version    = EXCLUDED.version
WHERE EXCLUDED.version > orders.version;

Operational note: pg_replication_origin lets you tag writes with their source node and skip re-applying events already seen. For multi-master topologies, pair origin tracking with the version guard above to deduplicate events that have traversed multiple hops.

DBA tip: Store version as a bigint sequence counter incremented by the application on every write β€” not a timestamp. Clocks on distributed nodes drift. A version counter is strictly monotonic and eliminates the GREATEST() ambiguity window entirely for high-frequency update scenarios.

Verification

Run the following queries immediately after subscription creation to confirm the replication channel is active and lag is within your defined SLA.

-- Confirm subscription is active
SELECT subname, subenabled, subpublications
FROM pg_subscription;

-- Check replication lag per subscription
SELECT
  subname,
  last_msg_receive_time,
  received_lsn,
  pg_current_wal_lsn()                                              AS publisher_lsn,
  pg_wal_lsn_diff(pg_current_wal_lsn(), received_lsn)              AS apply_lag_bytes,
  EXTRACT(EPOCH FROM (now() - last_msg_receive_time))               AS lag_seconds
FROM pg_stat_subscription
WHERE subname LIKE 'part_sync%';

Expected output when healthy:

 subname       | subenabled | ...
---------------+------------+----
 part_sync_sub | t          | {part_sync}

 subname       | apply_lag_bytes | lag_seconds
---------------+-----------------+-------------
 part_sync_sub |               0 |        0.04

apply_lag_bytes = 0 and lag_seconds < 1 confirm the subscriber has caught up to the publisher’s current WAL position. If lag_seconds climbs above your staleness SLA across three consecutive checks, escalate to automated reconciliation or a circuit-breaker that routes reads back to the primary.

Periodically validate row-count parity between publisher and subscriber per partition to catch silent divergence:

-- Run on both nodes; compare counts
SELECT COUNT(*) FROM orders_2024_q1;
SELECT COUNT(*) FROM orders_2024_q2;

Failure Mode Table

Failure mode Root cause SRE mitigation
Subscriber lag grows unbounded Slow consumer, large transactions, or network congestion exceeding wal_keep_size Increase wal_keep_size; scale subscriber CPU; split high-volume partitions into separate publications with dedicated slots
New child partition rows silently not replicated DDL not propagated by logical replication Always create matching child table on subscriber first; then ALTER PUBLICATION ... ADD TABLE; verify with pg_publication_tables
Stale overwrite on subscriber ON CONFLICT DO UPDATE applied without a version or timestamp guard Add WHERE EXCLUDED.version > table.version guard; audit existing upsert paths; enable pg_stat_statements to find blind update queries

FAQ

How do I handle write conflicts during partition sync?

Use ON CONFLICT DO UPDATE combined with application-level version numbers or updated_at timestamps to enforce idempotent convergence. Gate every DO UPDATE clause with WHERE EXCLUDED.version > orders.version or a GREATEST() comparison to prevent replayed stale events from overwriting newer writes. Track event origin with pg_replication_origin for additional deduplication in multi-hop topologies.

Does PostgreSQL have a built-in toggle for cross-partition eventual consistency?

No. Achieving it requires combining declarative partitioning with logical replication, custom conflict resolution at the application layer, and continuous monitoring of pg_stat_subscription. Unlike consistent hash routing, which distributes writes deterministically at query time, eventual consistency relies on asynchronous WAL replay β€” meaning convergence time is a first-class operational metric you must define and monitor.

What replication lag is acceptable for eventual consistency?

It depends on your business SLA. Internal analytics services typically tolerate 2–5 s; user-facing inventory or order status should target under 500 ms. Define explicit thresholds per partition group before enabling async propagation, document them as operational SLAs, and alert when lag_seconds from pg_stat_subscription exceeds them across three consecutive samples. See Consistency Models in Distributed Databases for a framework to select appropriate guarantees per workload.