Skip to main content

List Partitioning Techniques: Configuration, Routing & Workflows

List partitioning maps discrete categorical values to dedicated storage segments, giving the query planner exact routing information for every INSERT and SELECT. Unlike range partitioning strategies, which divide a continuous domain at numeric or temporal boundaries, or hash routing algorithms, which scatter rows uniformly across nodes, list partitioning locks each row to a named partition based on an exact value match. This page is part of the Partitioning Implementation Patterns & Routing guide; it covers production DDL, partition pruning, hot-key mitigation, composite routing, and the failure modes engineers encounter most often.

Problem Framing

A multi-tenant SaaS platform ingests 50 million event rows per day into a single tenant_events table. Reporting queries from individual tenants scan the full table despite filtering on tenant_id, because the planner has no metadata that limits eligible pages. Index scans help at moderate data volumes but degrade past a few hundred million rows when autovacuum cannot keep pace with dead-tuple accumulation. Archival is painful: purging a churned tenant means a full sequential scan and DELETE with no easy way to drop a bounded data set atomically.

List partitioning solves all three: the planner prunes to the relevant partition, autovacuum runs on smaller per-tenant segments independently, and tenant archival becomes a single DETACH PARTITION plus DROP TABLE.

The approach works when the partition key column has a bounded, manageable set of discrete values. It struggles when values are high-cardinality unbounded strings (thousands of tenants added continuously) or when write distribution across values is heavily skewed — both of which the sections below address.


List Partitioning Routing Application writes flow through the parent table, which routes rows to named partitions based on exact value matches. A DEFAULT partition captures unmapped values. Application INSERT / SELECT tenant_events PARTITION BY LIST (tenant_id) events_acme VALUES IN ('acme') events_globex VALUES IN ('globex') events_initech VALUES IN ('initech') events_default DEFAULT fallback

Step 1 — Define the Parent Table and Partitions

Declare the parent table with PARTITION BY LIST and create one child partition per value set. Always add a DEFAULT partition; without it, any INSERT for an unmapped value raises an error immediately and halts pipelines.

-- PostgreSQL 14+: categorical routing with a DEFAULT fallback
CREATE TABLE tenant_events (
  event_id    UUID         DEFAULT gen_random_uuid(),
  tenant_id   VARCHAR(32)  NOT NULL,
  event_type  VARCHAR(50)  NOT NULL,
  payload     JSONB,
  created_at  TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  PRIMARY KEY (event_id, tenant_id)          -- partition key must appear in PK
) PARTITION BY LIST (tenant_id);

CREATE TABLE events_acme    PARTITION OF tenant_events FOR VALUES IN ('acme');
CREATE TABLE events_globex  PARTITION OF tenant_events FOR VALUES IN ('globex');
CREATE TABLE events_initech PARTITION OF tenant_events FOR VALUES IN ('initech');
CREATE TABLE events_default PARTITION OF tenant_events DEFAULT;

Build indexes on each child partition, not on the parent. PostgreSQL propagates index definitions to new partitions created after the parent index exists, but existing partitions created before an index require explicit CREATE INDEX statements.

CREATE INDEX CONCURRENTLY ON events_acme   (created_at DESC);
CREATE INDEX CONCURRENTLY ON events_globex (created_at DESC);
CREATE INDEX CONCURRENTLY ON events_initech(created_at DESC);
CREATE INDEX CONCURRENTLY ON events_default(created_at DESC);

ORM note: SQLAlchemy’s postgresql_partition_by argument is informational metadata only — it does not emit the child partition DDL. Run the child CREATE TABLE statements via Alembic op.execute() or a manual migration. Reflect the child tables as separate mapped classes if you need direct ORM access to individual partitions.

Step 2 — Verify Partition Pruning

The planner prunes list partitions when the WHERE clause contains an equality predicate or an IN list on the partition column with no implicit casting. Confirm pruning with EXPLAIN:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT event_id, event_type, created_at
FROM tenant_events
WHERE tenant_id = 'acme'
  AND created_at > NOW() - INTERVAL '30 days';

Expected output fragment:

->  Bitmap Heap Scan on events_acme  (cost=...)
      Filter: (created_at > ...)

If the plan shows an Append node scanning all children, check for:

  • Implicit type mismatch: the literal 'acme' must match the column’s declared type exactly. A tenant_id CHAR(32) column will not prune against a VARCHAR literal in older planner versions.
  • Function wrapping: WHERE lower(tenant_id) = 'acme' defeats pruning because the planner cannot infer which partition owns lower(x) = 'acme' without evaluating every partition.
  • enable_partition_pruning: confirm SHOW enable_partition_pruning; returns on.

Step 3 — Automate Partition Creation for New Values

Detect unmapped values before application writes land in the DEFAULT partition. Query pg_inherits to enumerate existing child partitions and compare against your authoritative tenant registry:

-- List values currently covered by named (non-DEFAULT) partitions
SELECT pg_get_expr(relpartbound, oid) AS bound
FROM pg_class
WHERE oid IN (
  SELECT inhrelid FROM pg_inherits
  WHERE inhparent = 'tenant_events'::regclass
)
AND relpartbound IS NOT NULL;

A Python provisioning script that runs before each deployment:

import psycopg2, re

KNOWN_TENANTS_QUERY = "SELECT tenant_slug FROM tenants WHERE active = true"
PARTITIONS_QUERY = """
    SELECT pg_get_expr(relpartbound, oid)
    FROM pg_class
    WHERE oid IN (SELECT inhrelid FROM pg_inherits
                  WHERE inhparent = 'tenant_events'::regclass)
      AND relpartbound IS NOT NULL
      AND relkind = 'r'
"""

def provision_missing_partitions(conn_str: str) -> list[str]:
    created = []
    with psycopg2.connect(conn_str) as conn, conn.cursor() as cur:
        cur.execute(KNOWN_TENANTS_QUERY)
        tenants = {row[0] for row in cur.fetchall()}

        cur.execute(PARTITIONS_QUERY)
        existing_bounds = " ".join(row[0] or "" for row in cur.fetchall())
        covered = set(re.findall(r"'([^']+)'", existing_bounds))

        for tenant in tenants - covered:
            safe = re.sub(r"[^a-z0-9_]", "_", tenant.lower())
            ddl = (
                f"CREATE TABLE IF NOT EXISTS events_{safe} "
                f"PARTITION OF tenant_events FOR VALUES IN ('{tenant}')"
            )
            cur.execute(ddl)
            cur.execute(f"CREATE INDEX CONCURRENTLY ON events_{safe} (created_at DESC)")
            cur.execute(f"ANALYZE events_{safe}")
            created.append(tenant)
        conn.commit()
    return created

Migration steps for backfilling existing data from the DEFAULT partition:

  1. Create the new partition DDL (above).
  2. Move rows in batches to avoid long-running transactions: INSERT INTO events_newco SELECT * FROM events_default WHERE tenant_id = 'newco' LIMIT 10000; — repeat until empty.
  3. Delete migrated rows from DEFAULT: DELETE FROM events_default WHERE tenant_id = 'newco';
  4. Run ANALYZE events_newco to refresh planner statistics before enabling traffic.

Step 4 — Detect and Mitigate Hot Partitions

High-volume tenants or event types create write bottlenecks. Identify skewed partitions through pg_stat_user_tables:

SELECT
  relname                                         AS partition_name,
  pg_size_pretty(pg_total_relation_size(oid))     AS total_size,
  n_live_tup                                      AS live_rows,
  n_dead_tup                                      AS dead_rows,
  last_autovacuum::date                           AS last_vacuum
FROM pg_stat_user_tables
WHERE relname LIKE 'events_%'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;

When a single partition holds more than 3–5× the average size of its siblings, consider sub-partitioning by a secondary key. For a high-volume acme tenant, partition by month under the existing tenant partition:

-- Convert events_acme to a sub-partitioned table (requires recreation)
CREATE TABLE events_acme_2025_01
  PARTITION OF events_acme
  FOR VALUES IN ('acme')   -- inherit parent's list constraint
  -- then sub-partition by range:
;
-- Practical approach: create a new table partitioned by range and attach
CREATE TABLE events_acme (
  LIKE tenant_events INCLUDING ALL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_acme_2025_01 PARTITION OF events_acme
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

See Handling Hot Keys in List Partitioned Tables for detailed I/O splitting workflows including connection-pool sharding and write fan-out strategies.

Configuration Reference

Parameter Recommended value Rationale
enable_partition_pruning on (default) Activates planner-time partition elimination for = and IN predicates. Verify it is not disabled by a global override.
constraint_exclusion partition (default) Limits constraint checking overhead to child tables without scanning parent-level check constraints.
autovacuum_vacuum_scale_factor per partition 0.010.02 Smaller scale factor triggers autovacuum earlier on high-write partitions, preventing dead-tuple accumulation that inflates table bloat and slows pruning.
parallel_workers on child tables 24 Parallel sequential scans on large individual partitions improve analytical queries on non-pruned paths. Set via ALTER TABLE events_acme SET (parallel_workers = 4).
DEFAULT partition Always present Prevents insert errors when new values arrive before partition DDL is applied. Monitor its growth as an early warning of unmapped values.
Maximum partitions per parent Under 1 000 PostgreSQL stores partition metadata in pg_inherits; very large partition counts slow planning time because the planner iterates the full inheritance tree even after pruning.

Operational Contrast

List partitioning guarantees exact-match locality: a row for tenant acme will always land in events_acme, making compliance-scoped data deletion as simple as DROP TABLE. Range partitioning strategies are better suited when queries filter on continuous values like timestamps, because the planner can prune a single partition covering a date range rather than requiring an equality match. Hash routing algorithms outperform both for write-heavy workloads across many anonymous keys (such as user UUIDs) where even distribution matters more than value-based locality.

List partitioning fits multi-tenant isolation, event type segregation, geographic region boundaries, and compliance domains (GDPR per-region data). It breaks down when the value set grows beyond a few hundred distinct values dynamically or when values are not known at schema design time — at that scale, application-level sharding with a proxy routing architecture is a stronger fit.

Failure Modes

Failure Root cause Detection Mitigation
ERROR: no partition of relation "tenant_events" found for row New categorical value arrives with no matching partition and no DEFAULT partition exists. Application error logs; alert on PostgreSQL ERROR messages containing “no partition found”. Always create a DEFAULT partition. Monitor its row count daily; provision named partitions for any value that accumulates more than 10 000 rows.
Pruning disabled — full Append scan on equality query Implicit type cast between predicate literal and partition column type, or enable_partition_pruning = off. EXPLAIN shows Append with all children listed as active. Match literal types to column declarations exactly. Re-enable enable_partition_pruning. Avoid wrapping the partition column in functions.
Planning time spikes with many partitions PostgreSQL iterates pg_inherits at plan time; 2 000+ partitions on one parent cause >100 ms overhead per query. pg_stat_statements shows high mean_plan_time for queries on the partitioned table. Consolidate low-volume values into shared partitions (e.g., one partition for all tenants with fewer than 1 000 rows). Use range partitioning for a second dimension to reduce child count.
DEFAULT partition growth without alerts New tenants or event types appear in production without triggering the provisioning script. Monitor SELECT n_live_tup FROM pg_stat_user_tables WHERE relname = 'events_default' — alert when > 50 000. Add a provisioning job to your deployment pipeline. Set up a pg_cron or external cron task to check DEFAULT size hourly and page on-call if it exceeds threshold.

Common Mistakes

  • Omitting the DEFAULT partition. Any INSERT for an unmapped value will fail with a runtime error. Make the DEFAULT partition mandatory in your DDL templates; treat its growth as a monitoring signal, not a safety valve.
  • Using OR across partition values instead of IN. WHERE tenant_id = 'acme' OR tenant_id = 'globex' may not prune as reliably as WHERE tenant_id IN ('acme', 'globex'). Use IN consistently for multi-value predicates.
  • Hardcoding partition names in application routing logic. Embedding names like events_acme in application code creates a tight coupling that silently breaks when partitions are renamed, merged, or re-partitioned. Let the database planner route through the parent table.
  • Creating global indexes on the parent table. PostgreSQL does not support global (non-partitioned) indexes on declaratively partitioned tables. Every index must be created per child partition. Missing indexes on new partitions after automated provisioning are a common post-deploy performance regression.

FAQ

When should list partitioning be preferred over range or hash partitioning?

Use list partitioning when data naturally groups into discrete, non-overlapping categories requiring exact-match routing and targeted archival — multi-tenant SaaS, event type classification, and geographic region isolation are typical examples. When temporal ordering is the primary access pattern, range partitioning is a better fit. When uniform write distribution across many keys is the goal, hash routing algorithms provide better balance.

How does partition pruning work with list partitions?

The query planner evaluates equality predicates against partition value lists at planning time. It scans only matching segments while skipping irrelevant partitions entirely, provided the predicate is a simple equality or IN expression on the partition column. Implicit type casting or OR predicates across different values often defeat pruning — always verify with EXPLAIN (ANALYZE, BUFFERS) after schema or query changes.

Can list partitions be automatically created for new categorical values?

Yes. Implement metadata polling or event-driven workflows that detect unmapped values in a staging area and execute dynamic DDL to provision new partitions without downtime. Always backfill historical rows from the DEFAULT partition in batched INSERT...SELECT statements and run ANALYZE on the new partition before enabling production traffic. The Python example in Step 3 above shows a complete provisioning pattern.


Articles in This Section