Optimizing Cross-Partition Aggregations with Materialized Views

Q: Can a single materialized view span multiple child partitions?

Yes — an MV can SELECT from multiple child partition tables in a single query, but refresh cost grows linearly with partition count. Per-partition MVs unioned at the coordinator give better refresh granularity and lower per-refresh cost.

Q: What is the recommended refresh interval for real-time dashboards?

Use event-driven delta merges for sub-second freshness. Alternatively, schedule REFRESH MATERIALIZED VIEW CONCURRENTLY at 1–5 minute intervals, tuned to your acceptable SLA lag and write volume.

This guide shows how to deploy per-partition materialized views in PostgreSQL to eliminate the cross-shard fan-out latency that plagues distributed SUM, COUNT, and AVG queries — part of the broader Cross-Partition Querying & Aggregation Strategies approach.

Prerequisites

PostgreSQL 14 or later (for REFRESH MATERIALIZED VIEW CONCURRENTLY and ON CONFLICT support)
Declarative partitioning already configured on the target table (PARTITION BY RANGE, LIST, or HASH)
SUPERUSER or at minimum CREATE privilege on the target schema
Write access to a staging table (for the incremental delta pattern)
A proxy or connection router you can configure to redirect read traffic (PgBouncer, HAProxy, or application-level)
Baseline execution plans captured with EXPLAIN (ANALYZE, BUFFERS) before starting

Step 1 — Diagnose the fan-out bottleneck

Before building views, confirm that cross-partition fan-out is the actual bottleneck rather than a missing index or a suboptimal query shape.

-- Capture the full distributed execution plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
  DATE_TRUNC('day', created_at) AS day,
  SUM(amount)                   AS total_amount,
  COUNT(*)                      AS order_count
FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY 1
ORDER BY 1;

What to look for: Append nodes that fan out to every child partition, Hash Aggregate nodes on the coordinator with large Batches counts, and high shared hit / shared read buffer ratios on partitions that are outside the query’s date range (indicating missing partition pruning).

DBA tip: If the EXPLAIN output shows Parallel Seq Scan on every child partition, your partition key is not aligning with the WHERE clause predicate. Fix partition pruning first — materialized views will not help if the query ignores the partition boundary.

Step 2 — Create one materialized view per child partition

Standard PostgreSQL does not allow CREATE MATERIALIZED VIEW ... PARTITION BY .... Instead, create one view per child partition. This mirrors the partition boundary exactly, enabling index-only scans and eliminating any cross-node data movement at refresh time.

-- Q1 partition
CREATE MATERIALIZED VIEW mv_orders_2024_q1_daily AS
SELECT
  DATE_TRUNC('day', created_at) AS day,
  SUM(amount)                   AS total_amount,
  COUNT(*)                      AS order_count
FROM orders_2024_q1
GROUP BY 1
WITH DATA;

CREATE UNIQUE INDEX idx_mv_q1_day ON mv_orders_2024_q1_daily (day);

-- Q2 partition
CREATE MATERIALIZED VIEW mv_orders_2024_q2_daily AS
SELECT
  DATE_TRUNC('day', created_at) AS day,
  SUM(amount)                   AS total_amount,
  COUNT(*)                      AS order_count
FROM orders_2024_q2
GROUP BY 1
WITH DATA;

CREATE UNIQUE INDEX idx_mv_q2_day ON mv_orders_2024_q2_daily (day);

Operational note: The UNIQUE INDEX on day is required for two reasons: it enables REFRESH MATERIALIZED VIEW CONCURRENTLY (which needs a unique index to identify changed rows), and it supports the ON CONFLICT (day) DO UPDATE merge pattern in Step 3.

SRE tip: Automate this DDL. When your automated partition creation workflow provisions a new child partition, a post-creation hook should immediately create the corresponding MV and its unique index. Gaps here are a common source of silent query degradation after partition rollovers.

The diagram below shows how this architecture separates the write path (raw inserts into child partitions) from the read path (index-only scans on per-partition MVs, unioned by the coordinator):

Step 3 — Implement incremental delta refresh

Full REFRESH MATERIALIZED VIEW rebuilds the entire view from scratch, acquiring an exclusive lock (or a slower non-blocking scan with CONCURRENTLY). For high-write tables, use an append-only staging table and atomic delta merges instead.

-- Apply incremental delta from staging into the Q1 MV
BEGIN;

INSERT INTO mv_orders_2024_q1_daily (day, total_amount, order_count)
SELECT
  DATE_TRUNC('day', created_at) AS day,
  SUM(amount)                   AS total_amount,
  COUNT(*)                      AS order_count
FROM orders_staging
WHERE created_at >= '2024-01-01'
  AND created_at <  '2024-04-01'
GROUP BY 1
ON CONFLICT (day) DO UPDATE
  SET total_amount = mv_orders_2024_q1_daily.total_amount + EXCLUDED.total_amount,
      order_count  = mv_orders_2024_q1_daily.order_count  + EXCLUDED.order_count;

-- Purge processed staging rows
DELETE FROM orders_staging
WHERE created_at >= '2024-01-01'
  AND created_at <  '2024-04-01';

COMMIT;

Operational note: The ON CONFLICT ... DO UPDATE makes each delta application idempotent. If the merge transaction is retried after a network error (common in CDC-driven pipelines), the accumulated totals remain correct rather than double-counting.

SRE tip: Trigger this merge from a Kafka consumer, a CDC hook (Debezium, pglogical), or a scheduled pg_cron job — not a wall-clock cron that ignores write volume. A fixed 1-minute cron fires at the same rate during off-peak hours as during peak, wasting resources. An event-driven trigger fires proportionally to write throughput.

Step 4 — Route reads through a hybrid coordinator query

The coordinator unions per-partition MV results for historical ranges and bypasses the MV for rows within the freshness window. This is the same query-time composability that federated query execution relies on, applied here at the coordinator level rather than at a separate query engine.

-- Hybrid aggregation: MV for history, live table for the freshness window
SELECT day, SUM(total_amount) AS global_total, SUM(order_count) AS global_count
FROM (
  -- Historical segments from per-partition MVs
  SELECT day, total_amount, order_count FROM mv_orders_2024_q1_daily
  UNION ALL
  SELECT day, total_amount, order_count FROM mv_orders_2024_q2_daily

  UNION ALL

  -- Sub-second-fresh rows from the live partitioned table
  SELECT
    DATE_TRUNC('day', created_at) AS day,
    amount                        AS total_amount,
    1                             AS order_count
  FROM orders
  WHERE created_at > NOW() - INTERVAL '5 minutes'
) sub
GROUP BY day
ORDER BY day;

Operational note: Parameterise the INTERVAL '5 minutes' as a configuration value in your application layer. When MV refresh lag increases (detectable via the verification query in Step 5), widen the freshness window rather than letting stale MV rows contaminate results.

SRE tip: Wrap the coordinator query in a function that accepts p_freshness_interval INTERVAL and checks pg_stat_user_tables for the last MV vacuum timestamp before deciding whether to widen the fallback window. This avoids manual intervention during refresh lag events.

Step 5 — Verify correctness and performance

Run these queries after initial deployment and after each schema change to confirm the MV is serving the expected data and that the query planner is using index scans rather than sequential scans.

-- 1. Check the planner uses an index-only scan on the MV
EXPLAIN (ANALYZE, BUFFERS)
SELECT day, total_amount, order_count
FROM mv_orders_2024_q1_daily
WHERE day BETWEEN '2024-01-01' AND '2024-01-31';

-- Expected: "Index Only Scan using idx_mv_q1_day on mv_orders_2024_q1_daily"
-- Red flag: "Seq Scan" — rebuild the unique index if it was created before WITH DATA

-- 2. Cross-check MV totals against the live partition for a complete date range
SELECT
  'mv'   AS source, SUM(total_amount) AS total, SUM(order_count) AS cnt
FROM mv_orders_2024_q1_daily
UNION ALL
SELECT
  'live' AS source, SUM(amount), COUNT(*)
FROM orders_2024_q1;

-- Expected: both rows return identical totals (within delta-merge lag tolerance)

-- 3. Confirm refresh recency via pg_stat_user_tables
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname LIKE 'mv_orders_%'
ORDER BY relname;

Expected output for the cross-check query:

 source | total        | cnt
--------+--------------+---------
 mv     | 4821903.22   | 198442
 live   | 4821903.22   | 198442
(2 rows)

Any divergence between mv and live rows indicates either a missed delta merge or an incomplete WITH DATA refresh. Re-run Step 3’s delta merge or issue a full REFRESH MATERIALIZED VIEW CONCURRENTLY mv_orders_2024_q1_daily to resync.

Failure mode table

Failure mode	Root cause	SRE mitigation
Coordinator returns stale totals during write bursts	Delta merge lag exceeds the freshness window; MV rows are older than `NOW() - INTERVAL '5 minutes'`	Widen the freshness window in the hybrid query, or trigger an emergency delta merge; alert on `pg_stat_user_tables.last_analyze` age exceeding SLA
`REFRESH MATERIALIZED VIEW CONCURRENTLY` fails with “index required”	The unique index on `day` was dropped or never created, so PostgreSQL cannot identify changed rows	Re-create the unique index (`CREATE UNIQUE INDEX idx_mv_q1_day ON mv_orders_2024_q1_daily (day)`) before retrying the refresh
Cross-check totals diverge after a retry storm	`ON CONFLICT DO UPDATE` accumulates double-counted deltas when staging rows are not purged atomically	Wrap the delta merge and staging `DELETE` in a single transaction; ensure the staging table has a unique constraint on `(created_at, order_id)` to prevent duplicate inserts

FAQ

How do I handle materialized view staleness during high-throughput write bursts?

Implement delta-based refresh triggers with a maximum staleness threshold. Route queries exceeding the threshold directly to live partitions via a fallback in the coordinator query. The hybrid UNION ALL pattern in Step 4 handles this automatically when you parameterise the freshness interval. For burst scenarios, reduce the freshness window to 1–2 minutes and increase the delta merge frequency rather than issuing full refreshes.

Can a single materialized view span multiple child partitions?

Yes — an MV can SELECT from multiple child partition tables in a single query, because child partitions are regular tables from the MV’s perspective. However, refresh cost grows linearly with partition count, and a failure in any partition’s scan aborts the entire refresh. Per-partition MVs unioned at the coordinator give better refresh granularity, lower per-refresh cost, and let you stagger refresh windows across partitions to smooth I/O.

What refresh interval is appropriate for real-time dashboards?

Use event-driven delta merges triggered by CDC events for sub-second freshness. If your pipeline does not yet have CDC, schedule REFRESH MATERIALIZED VIEW CONCURRENTLY at 1–5 minute intervals and set the hybrid query’s freshness window to match. Dashboards backed by proxy routing architectures can also direct dashboard traffic specifically to read replicas where MVs are refreshed on a tighter schedule without competing with write workloads.

Cross-Shard Aggregation Patterns — parent page covering scatter-gather, push-down aggregation, and the trade-offs between live and pre-aggregated query strategies
Federated Query Execution — coordinating queries across multiple independent PostgreSQL instances, including result merging and connection fanout management
Proxy Routing Architectures — configuring HAProxy and PgBouncer to route read traffic to MV-backed replicas without application changes