Implementing Eventual Consistency in Partitioned PostgreSQL
This guide walks through configuring logical replication, version-gated conflict resolution, and lag monitoring to achieve eventual consistency across declarative PostgreSQL partitions β a core technique covered in Consistency Models in Distributed Databases, which is part of the broader Database Partitioning Fundamentals & Architecture topic area.
Prerequisites
Before starting, verify every item below:
The diagram below maps the data flow from a partitioned publisher through logical replication to a subscriber, showing where conflict resolution and lag monitoring intercept the stream.
Step 1 β Enable Logical WAL and Size Replication Slots
Set wal_level = logical on the publisher and allocate enough slots and senders for all partition sync channels. A restart is required if wal_level changes.
# postgresql.conf β publisher node
wal_level = logical
max_replication_slots = 20 # one slot per publication; add headroom
max_wal_senders = 20 # match max_replication_slots
wal_keep_size = 512 # MB β retain WAL while subscribers catch up
SRE tip: Each inactive logical replication slot holds WAL on the publisher indefinitely. Set
max_slot_wal_keep_size(PostgreSQL 13+) to a hard ceiling (e.g.4096) to prevent disk exhaustion if a subscriber goes dark for more than a few hours.
After restarting, confirm the WAL level is active:
SHOW wal_level;
-- expected: logical
Step 2 β Scope Publications to Specific Child Partitions
Create targeted publications that cover only the child partitions you need to sync. Avoid publishing the parent table directly β PostgreSQL logical replication does not automatically route inserts on the parent to the correct child on the subscriber.
-- Publisher: create a publication scoped to specific child partitions
CREATE PUBLICATION part_sync
FOR TABLE orders_2024_q1, orders_2024_q2;
Operational note: Logical replication does not propagate DDL. Each time you add a new child partition to the publisher, you must first create the matching table on the subscriber, then run
ALTER PUBLICATION part_sync ADD TABLE orders_2024_q3;before rows in the new partition will replicate.
DBA tip: Use one publication per logical partition group (e.g. by tenant or time range) rather than one publication for the entire schema. This keeps WAL volume per slot predictable and lets you set different staleness SLAs per group without cross-contamination.
Step 3 β Initialize Subscriptions Without Blocking Writes
Create the subscription with copy_data = false to avoid a full-table copy that would block production writes during cutover. The subscriber starts streaming WAL immediately from the current LSN.
-- Subscriber: create subscription without initial data copy
CREATE SUBSCRIPTION part_sync_sub
CONNECTION 'host=publisher.internal port=5432 dbname=prod user=replicator password=***'
PUBLICATION part_sync
WITH (
copy_data = false,
create_slot = true,
slot_name = 'part_sync_slot'
);
Operational note: If the subscriber needs historical data, sync it out-of-band using
pg_dump --table orders_2024_q1 | psqlbefore creating the subscription. The subscription then only needs to replay WAL deltas from the dumpβs snapshot LSN onward.
SRE tip: For eventual consistency workloads, set
synchronous_commit = offon the publisher connection used by the subscription. This removes the round-trip durability window from the write hot path and maximises partition write throughput. Accept the small durability gap explicitly in your runbook.
Step 4 β Implement Idempotent Conflict Resolution
Asynchronous propagation guarantees out-of-order delivery during network partitions or high-throughput bursts. Without explicit conflict handling, a stale replayed WAL event can silently overwrite a newer write that arrived independently on the subscriber.
Use ON CONFLICT DO UPDATE with GREATEST() timestamp comparison or a monotonically increasing version number. Never apply a blind DO UPDATE without a guard β this is the most common cause of data divergence in eventually consistent PostgreSQL setups.
-- Idempotent upsert with version-gated conflict resolution
INSERT INTO orders (id, payload, updated_at, version)
VALUES (101, '{"status":"shipped"}', NOW(), 3)
ON CONFLICT (id) DO UPDATE
SET payload = EXCLUDED.payload,
updated_at = GREATEST(orders.updated_at, EXCLUDED.updated_at),
version = EXCLUDED.version
WHERE EXCLUDED.version > orders.version;
Operational note:
pg_replication_originlets you tag writes with their source node and skip re-applying events already seen. For multi-master topologies, pair origin tracking with the version guard above to deduplicate events that have traversed multiple hops.
DBA tip: Store
versionas abigintsequence counter incremented by the application on every write β not a timestamp. Clocks on distributed nodes drift. A version counter is strictly monotonic and eliminates theGREATEST()ambiguity window entirely for high-frequency update scenarios.
Verification
Run the following queries immediately after subscription creation to confirm the replication channel is active and lag is within your defined SLA.
-- Confirm subscription is active
SELECT subname, subenabled, subpublications
FROM pg_subscription;
-- Check replication lag per subscription
SELECT
subname,
last_msg_receive_time,
received_lsn,
pg_current_wal_lsn() AS publisher_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), received_lsn) AS apply_lag_bytes,
EXTRACT(EPOCH FROM (now() - last_msg_receive_time)) AS lag_seconds
FROM pg_stat_subscription
WHERE subname LIKE 'part_sync%';
Expected output when healthy:
subname | subenabled | ...
---------------+------------+----
part_sync_sub | t | {part_sync}
subname | apply_lag_bytes | lag_seconds
---------------+-----------------+-------------
part_sync_sub | 0 | 0.04
apply_lag_bytes = 0 and lag_seconds < 1 confirm the subscriber has caught up to the publisherβs current WAL position. If lag_seconds climbs above your staleness SLA across three consecutive checks, escalate to automated reconciliation or a circuit-breaker that routes reads back to the primary.
Periodically validate row-count parity between publisher and subscriber per partition to catch silent divergence:
-- Run on both nodes; compare counts
SELECT COUNT(*) FROM orders_2024_q1;
SELECT COUNT(*) FROM orders_2024_q2;
Failure Mode Table
| Failure mode | Root cause | SRE mitigation |
|---|---|---|
| Subscriber lag grows unbounded | Slow consumer, large transactions, or network congestion exceeding wal_keep_size |
Increase wal_keep_size; scale subscriber CPU; split high-volume partitions into separate publications with dedicated slots |
| New child partition rows silently not replicated | DDL not propagated by logical replication | Always create matching child table on subscriber first; then ALTER PUBLICATION ... ADD TABLE; verify with pg_publication_tables |
| Stale overwrite on subscriber | ON CONFLICT DO UPDATE applied without a version or timestamp guard |
Add WHERE EXCLUDED.version > table.version guard; audit existing upsert paths; enable pg_stat_statements to find blind update queries |
FAQ
How do I handle write conflicts during partition sync?
Use ON CONFLICT DO UPDATE combined with application-level version numbers or updated_at timestamps to enforce idempotent convergence. Gate every DO UPDATE clause with WHERE EXCLUDED.version > orders.version or a GREATEST() comparison to prevent replayed stale events from overwriting newer writes. Track event origin with pg_replication_origin for additional deduplication in multi-hop topologies.
Does PostgreSQL have a built-in toggle for cross-partition eventual consistency?
No. Achieving it requires combining declarative partitioning with logical replication, custom conflict resolution at the application layer, and continuous monitoring of pg_stat_subscription. Unlike consistent hash routing, which distributes writes deterministically at query time, eventual consistency relies on asynchronous WAL replay β meaning convergence time is a first-class operational metric you must define and monitor.
What replication lag is acceptable for eventual consistency?
It depends on your business SLA. Internal analytics services typically tolerate 2β5 s; user-facing inventory or order status should target under 500 ms. Define explicit thresholds per partition group before enabling async propagation, document them as operational SLAs, and alert when lag_seconds from pg_stat_subscription exceeds them across three consecutive samples. See Consistency Models in Distributed Databases for a framework to select appropriate guarantees per workload.
Related
- Consistency Models in Distributed Databases β parent page: quorum routing, anti-entropy, and drift monitoring across partitioned replicas
- Database Partitioning Fundamentals & Architecture β foundational context on partition key selection and topology design
- Step-by-Step Guide to Implementing Consistent Hash Routing β complementary routing strategy that determines which partition node receives a write before replication begins