Skip to main content

Range Partitioning Strategies

Range partitioning divides a table into child partitions where each partition holds rows whose partition key falls within a declared lower and upper bound. As part of Partitioning Implementation Patterns & Routing, it is the dominant strategy for time-series data, sequential audit logs, and any workload where queries are naturally bounded by time or ordered numeric ranges. Compare it with hash routing algorithms for uniform write distribution and list partitioning techniques for categorical segmentation β€” each solves a distinct data-distribution problem.

Problem Framing

Consider a sensor telemetry table collecting 50 million rows per day. Within six months the table exceeds 9 billion rows. Full-table statistics queries β€” daily aggregations, anomaly windows, retention sweeps β€” scan every block even though they only need one week’s data. Index size grows unbounded, VACUUM runs take hours, and dropping old data requires expensive DELETE operations that leave dead tuples.

Range partitioning solves this precisely: the planner eliminates non-matching partitions before touching storage, cold months detach and drop in milliseconds, and per-partition statistics stay tight. The trade-off is that write-heavy workloads landing on a single partition (the current month) create a hot spot β€” a constraint that hash routing algorithms avoid by distributing writes across buckets regardless of value ordering.

Architecture Overview

The diagram below shows the lifecycle of a row entering a range-partitioned table: the declarative router checks the partition key against boundary metadata, lands the row in the correct child table, and the query planner later prunes non-matching children at execution time.

Range partitioning data flow Diagram showing how an INSERT is routed to the correct range partition and how a SELECT with a range predicate prunes non-matching partitions at query planning time. INSERT PATH Application INSERT row sensor_readings PARTITION BY RANGE Boundary check sensor_ts value readings_2024_q1 Jan 1 – Apr 1 2024 readings_2024_q2 Apr 1 – Jul 1 2024 readings_default catch-all / future QUERY PATH (pruning) Query planner evaluate predicates readings_2024_q1 SCANNED βœ“ readings_2024_q2 PRUNED βœ— readings_default PRUNED βœ—

Implementation Walkthrough

Step 1 β€” Select the Partition Key and Declare the Parent Table

Choose a column that monotonically increases or maps to your primary query predicate. Temporal columns like created_at or event_ts are optimal for OLTP/OLAP hybrid workloads. Sequential integer IDs work when queries always filter by ID range. Composite keys (tenant ID + timestamp) are preferable for multi-tenant systems where you need both pruning dimensions.

-- PostgreSQL declarative range partition β€” parent table
CREATE TABLE sensor_readings (
  id        UUID DEFAULT gen_random_uuid(),
  sensor_ts TIMESTAMPTZ NOT NULL,
  sensor_id INT         NOT NULL,
  value     FLOAT,
  PRIMARY KEY (id, sensor_ts)      -- partition key must appear in PK
) PARTITION BY RANGE (sensor_ts);

ORM note: SQLAlchemy requires __table_args__ = (PrimaryKeyConstraint('id', 'sensor_ts'),) to satisfy PostgreSQL’s partitioning constraint. Prisma and TypeORM need explicit composite PK definitions that mirror the partition key column.

Step 2 β€” Create Initial Child Partitions

Boundaries are half-open intervals: FROM is inclusive, TO is exclusive. Never leave a gap between partitions; the planner does not fill gaps automatically.

CREATE TABLE readings_2024_q1 PARTITION OF sensor_readings
  FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE readings_2024_q2 PARTITION OF sensor_readings
  FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

CREATE TABLE readings_2024_q3 PARTITION OF sensor_readings
  FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');

CREATE TABLE readings_2024_q4 PARTITION OF sensor_readings
  FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');

-- Catch-all: prevents insert failures for out-of-range rows
CREATE TABLE readings_default PARTITION OF sensor_readings DEFAULT;

Each child inherits indexes defined on the parent in PostgreSQL 11+. Create additional local indexes (e.g. on sensor_id) on the child tables if point-lookup patterns require them.

Step 3 β€” Automate Partition Creation

Manual DDL execution introduces operational risk and creates runbook debt. Deploy a cron-driven or event-triggered PL/pgSQL procedure that looks ahead and creates partitions before they are needed β€” at minimum one full interval ahead of the current period.

-- Idempotent quarterly partition creator β€” run monthly via pg_cron
CREATE OR REPLACE PROCEDURE create_next_quarter_partition()
LANGUAGE plpgsql AS $$
DECLARE
  next_start DATE;
  next_end   DATE;
  part_name  TEXT;
BEGIN
  -- Calculate the start of the quarter two quarters ahead
  next_start := date_trunc('quarter', now() + interval '3 months')::DATE;
  next_end   := (next_start + interval '3 months')::DATE;
  part_name  := 'readings_' || to_char(next_start, 'YYYY"_q"Q');

  IF NOT EXISTS (SELECT 1 FROM pg_class WHERE relname = part_name) THEN
    EXECUTE format(
      'CREATE TABLE %I PARTITION OF sensor_readings FOR VALUES FROM (%L) TO (%L)',
      part_name, next_start, next_end
    );
    RAISE NOTICE 'Created partition %', part_name;
  ELSE
    RAISE NOTICE 'Partition % already exists, skipping', part_name;
  END IF;
END $$;

-- Schedule via pg_cron (requires pg_cron extension)
SELECT cron.schedule('create-quarterly-partition', '0 0 1 * *', 'CALL create_next_quarter_partition()');

For a full walkthrough of this automation pattern, see Automated Partition Creation Workflows.

Step 4 β€” Detach and Archive Cold Partitions

Use DETACH PARTITION CONCURRENTLY (PostgreSQL 14+) to remove cold data without a full table lock. After detachment, move the table to a cold schema or drop it entirely.

-- Non-blocking detach (PostgreSQL 14+)
ALTER TABLE sensor_readings
  DETACH PARTITION readings_2024_q1 CONCURRENTLY;

-- Move to archival schema instead of dropping
ALTER TABLE readings_2024_q1 SET SCHEMA archive;

-- Or drop immediately if retention policy allows
DROP TABLE readings_2024_q1;

Schedule detach operations during low-traffic windows. Include a dry-run step that reports the rows and size affected before committing.

Step 5 β€” Verify Partition Pruning

Pruning relies on the query planner resolving partition boundaries at planning time. Confirm pruning with EXPLAIN (ANALYZE, BUFFERS) after every boundary or statistics change.

-- This predicate enables pruning β€” explicit range on the partition key
EXPLAIN (ANALYZE, BUFFERS)
SELECT sensor_id, avg(value)
FROM sensor_readings
WHERE sensor_ts >= '2024-01-15'
  AND sensor_ts  < '2024-02-01'
GROUP BY sensor_id;

A pruned plan shows a scan directly on readings_2024_q1 without an Append node touching other partitions. If you see a full Append, check for type mismatches, timezone coercions, or function wrappers on the partition column.

-- Avoid: function wrapping prevents pruning
WHERE DATE(sensor_ts) = '2024-01-15'   -- full scan

-- Correct: explicit range predicate
WHERE sensor_ts >= '2024-01-15' AND sensor_ts < '2024-01-16'

Configuration Reference

Parameter Recommended value Rationale
enable_partition_pruning on (default since PG 11) Enables static pruning at planning time
constraint_exclusion partition Limits constraint checks to partition tables, avoids overhead on regular tables
Partition interval 1 month for high-volume; 1 quarter for lower-volume Balances child-table overhead against pruning granularity
Look-ahead creation 2 intervals minimum Prevents insert failures during deployment windows
Autovacuum scale_factor 0.01 per child partition High-write partitions need more frequent vacuuming than the table default
ANALYZE schedule After each bulk load Keeps cardinality estimates accurate for the planner

Override autovacuum thresholds per-partition when children have very different row velocities:

ALTER TABLE readings_2024_q3 SET (
  autovacuum_vacuum_scale_factor   = 0.01,
  autovacuum_analyze_scale_factor  = 0.005,
  autovacuum_vacuum_cost_delay     = 2
);

Operational Contrast with Sibling Strategies

Range partitioning excels at ordered access and time-bounded queries, but concentrates writes on the current partition. Compare:

  • Hash routing algorithms distribute writes uniformly across fixed-count buckets using a hash function. There is no hot-partition write problem, but range queries scatter across all buckets and partition dropping requires data migration rather than a simple detach.
  • List partitioning techniques map discrete categorical values (region codes, status enums, tenant IDs) to specific partitions. Pruning is exact for equality predicates but does not accelerate range scans.

Choose range when: data arrives ordered by the partition key, queries are predominantly time-windowed, and you need cheap data expiry via partition detach. Choose hash when writes must spread evenly and queries access data by point lookup rather than range.

Failure Modes

1. Missing Partition at Insert Time

Root cause: The automation job failed to create the next interval partition before the current period rolled over. Inserts land in the default partition or fail if no default exists.

Detection:

-- Find rows in the default catch-all partition
SELECT count(*) FROM readings_default;

-- Check for recent cron job failures
SELECT jobid, jobname, status, start_time, return_message
FROM cron.job_run_details
WHERE start_time > now() - interval '7 days'
ORDER BY start_time DESC;

Mitigation: Always create a default partition. Set up alerting on readings_default row count exceeding zero. After fixing the automation, move rows out of the default partition using INSERT INTO ... SELECT into the correct child, then delete from the default.

2. Partition Pruning Not Activating

Root cause: The WHERE clause wraps the partition column in a function (DATE(sensor_ts), EXTRACT(...), CAST(...)), preventing the planner from comparing predicates to partition boundaries at planning time.

Detection:

EXPLAIN (ANALYZE, FORMAT TEXT)
SELECT * FROM sensor_readings WHERE DATE(sensor_ts) = '2024-01-15';
-- Look for "Append" scanning all children β€” pruning is not happening

Mitigation: Rewrite predicates to use explicit range comparisons on the raw partition column. Update application query builders and ORMs that may silently add function wrappers.

3. Partition Skew from Irregular Data Velocity

Root cause: Ingestion rate varies significantly across time periods (e.g. a product launch month generating 10Γ— normal volume). One partition grows far larger than its siblings, causing index bloat and slow vacuums.

Detection:

SELECT
  c.relname                                          AS partition_name,
  pg_size_pretty(pg_total_relation_size(c.oid))      AS total_size,
  c.reltuples::bigint                                AS estimated_rows
FROM pg_inherits i
JOIN pg_class c ON i.inhrelid = c.oid
WHERE i.inhparent = 'sensor_readings'::regclass
ORDER BY pg_total_relation_size(c.oid) DESC;

Mitigation: Sub-partition the oversized child by hash on a secondary column, or reduce the interval granularity (e.g. switch from quarterly to monthly). Increase autovacuum frequency on the large child as described in the configuration reference above.

4. Lock Contention During Concurrent DDL

Root cause: DETACH PARTITION without CONCURRENTLY acquires ACCESS EXCLUSIVE on the parent table, blocking all reads and writes for the duration of the detach.

Detection: Monitor pg_stat_activity for queries waiting on relation lock type against the parent table during maintenance windows.

Mitigation: Use DETACH PARTITION CONCURRENTLY (PostgreSQL 14+). If on an older version, schedule detach operations in a maintenance window and hold an application-level feature flag to pause writes to the parent table during the operation.

Common Mistakes

  • Overlapping boundaries: Causes insert failures and routing ambiguity. Validate DDL in staging using SELECT * FROM pg_inherits JOIN pg_constraint ... to confirm non-overlapping ranges before applying to production.
  • Implicit type casting in predicates: Casting the partition column inside a function (e.g. to_char(sensor_ts, ...)) disables pruning silently. Audit every reporting query that touches partitioned tables.
  • No look-ahead in automation: Creating the next partition exactly as the current one fills leaves a race window. Always create at least two future intervals ahead of the current period.
  • Forgetting per-child autovacuum tuning: High-write current partitions need tighter autovacuum settings than cold historical ones. The parent table’s autovacuum settings do not propagate to children automatically.

FAQ

When should range partitioning replace hash partitioning?

Use range for time-series data, sequential logs, and range-bound analytical queries where temporal pruning and ordered archival are the dominant access patterns. Hash routing algorithms are preferable when writes must distribute uniformly and queries access data by point lookup rather than range β€” for example, a user-session store keyed on session ID with no time-bounded scan requirement.

How do I prevent partition bloat during high-write periods?

Implement interval-based auto-creation at least two periods ahead of the current boundary, monitor partition sizes via the system catalog query in the failure modes section above, and offload cold data to archival storage using DETACH PARTITION CONCURRENTLY. Tune autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor on the active partition to values smaller than the table default (e.g. 0.01 and 0.005 respectively) so dead tuples are reclaimed more aggressively.

Can I combine range partitioning with a secondary partitioning dimension?

Yes. PostgreSQL supports sub-partitioning: define a child table as itself PARTITION BY HASH (sensor_id) or PARTITION BY LIST (region). This routes writes across multiple leaf-level tables within each time range, eliminating the single-partition write hotspot while preserving time-bounded pruning. The trade-off is a larger child-table count and more complex automation DDL. For multi-tenant deployments, a composite key of (tenant_id, event_ts) partitioned first by list on tenant_id and sub-partitioned by range on event_ts is a common pattern covered in use case mapping for partition strategies.