Skip to main content

Use Case Mapping for Partition Strategies

Matching a partition strategy to the wrong workload is the most common cause of production performance regressions in scaled databases. This guide sits within Database Partitioning Fundamentals & Architecture and focuses on the decision layer between theory and implementation: how to read application signals, translate them into a strategy choice, and wire up routing logic that holds under operational pressure. For the conceptual boundary between node-local and distributed approaches, see Sharding vs Partitioning: Core Concepts.

Problem Framing

A 50-million-row orders table with chronological writes and date-range reporting queries behaves nothing like a 200-million-row user-events table with random entity-level reads. Selecting a strategy before quantifying these differences leads to predictable failures: a hash-partitioned time-series table loses its archival advantage entirely; a range-partitioned write-heavy table develops a right-edge hotspot that saturates a single shard while the rest idle.

The three signals that determine strategy are: access pattern shape (random vs. sequential vs. tenant-scoped), write distribution (uniform vs. skewed toward a hot key range), and query predicate structure (whether WHERE clauses align with a candidate partition key). Identify all three before writing a CREATE TABLE ... PARTITION BY statement.

Architecture Overview

The diagram below shows how the four core strategy types map to distinct routing paths and storage layouts. Understanding which path your workload follows determines the correct strategy before touching any schema.

Partition Strategy Decision Flow A flow diagram showing how workload signals (access pattern, write distribution, query predicate) route to four partition strategies: hash, range, list, and composite, each with their storage and routing characteristics. Workload Signals β€’ Access pattern shape β€’ Write distribution β€’ Query predicate type β€’ Tenant / entity scope Predicate type? Hash Random entity lookup Uniform write distribution Range Chronological / ordered queries + archival List Tenant / categorical scoped access Composite Mixed: analytical joins + temporal pruning N equal-sized shards Consistent hash ring No pruning; scatter-gather reads Ordered partition table Detach-to-archive pattern Planner prunes by key range Per-tenant partition Isolated index + vacuum Compliance-safe data locality Hash sub-partition within range boundary Double-key routing required Routing Middleware Layer Hashes entity key β†’ partition index | Reads partition boundary config | Directs query to correct connection pool endpoint

Step 1 β€” Quantify Workload Signals

Before any schema decision, extract three numbers from production query logs:

  1. Write QPS per key prefix β€” run a 30-day slow-query analysis grouped by the first segment of each candidate partition key. A distribution where any single prefix accounts for more than 40% of total writes signals a hotspot-prone key.
  2. Predicate alignment rate β€” of all SELECT queries touching the candidate table, what percentage include an equality or range condition on the candidate key? Below 70% means most reads will scan multiple partitions regardless of strategy.
  3. Cross-entity join frequency β€” how often do queries join the candidate table to a second table on a different key? High cross-partition join rates favor composite strategies or materialized aggregates over simple hash routing.
-- PostgreSQL: estimate predicate alignment for a candidate key column
SELECT
  calls,
  round(total_exec_time::numeric / calls, 2) AS avg_ms,
  query
FROM pg_stat_statements
WHERE query ILIKE '%orders%'
  AND query NOT ILIKE '%pg_%'
ORDER BY total_exec_time DESC
LIMIT 20;

Use the output to classify each query as aligned (uses the candidate key in WHERE) or unaligned (full-partition scan). An alignment rate below 70% signals the key is wrong, not the strategy.

Step 2 β€” Select the Strategy

Map the three quantified signals to a strategy using the reference below. Each row represents a real production scenario, not a textbook archetype.

Workload Signal Access Pattern Recommended Strategy Rationale
High write QPS, uniform entity IDs Random entity lookup, no ordering Hash Eliminates hotspots; balances I/O uniformly across shards
Chronological ingestion (IoT, logs, orders) Time-range queries; archival needed Range Enables efficient partition pruning and native DETACH PARTITION archival
Multi-tenant SaaS, tenant-scoped compliance Queries always include tenant_id List Guarantees data locality; enables per-tenant vacuum, index, and backup
Analytical platform with time + entity joins Aggregation across entities within time windows Composite (hash + range) Balances write distribution with temporal pruning for aggregation queries

If your workload crosses two rows β€” for example, a SaaS product with per-tenant chronological event logs β€” select the strategy that satisfies your primary SLA constraint first, then use sub-partitioning for the secondary signal.

Step 3 β€” Configure Application-Level Routing

Routing logic must be deterministic and co-located with the connection pool initialization. Non-deterministic routing causes the same entity’s writes to land on different partitions, producing duplicate rows and silent data corruption that only surfaces during cross-partition reads.

The snippet below shows a complete hash routing implementation in TypeScript. The key property is that the partition index is computed purely from the entity ID β€” no external state, no lookup table round-trips on the hot path.

import { createHash } from 'crypto';

interface ShardPool {
  id: number;
  connectionString: string;
}

/**
 * Deterministic hash routing: given an entity ID and shard pool,
 * returns the connection string for the owning partition.
 * Must be called BEFORE opening a transaction to avoid split-write bugs.
 */
export function resolvePartition(entityId: string, shards: ShardPool[]): string {
  const hashHex = createHash('sha256').update(entityId).digest('hex');
  // Use the first 8 hex chars (32-bit range) to compute the shard index
  const index = parseInt(hashHex.slice(0, 8), 16) % shards.length;
  return shards[index].connectionString;
}

// Example: 4 shards, entity "user_8821a" always routes to the same shard
const pool: ShardPool[] = [
  { id: 0, connectionString: 'postgres://db-0/app' },
  { id: 1, connectionString: 'postgres://db-1/app' },
  { id: 2, connectionString: 'postgres://db-2/app' },
  { id: 3, connectionString: 'postgres://db-3/app' },
];
const target = resolvePartition('user_8821a', pool);

For range-based routing, replace the hash with a boundary lookup. Store boundaries in a small config table or in-process cache to avoid a metadata query on every request. For list partitioning techniques in multi-tenant systems, the tenant ID is itself the routing key β€” no hashing required.

For proxy routing architectures that centralize this logic outside the application, the same determinism rule applies: the proxy must embed the boundary config rather than querying it on the critical path.

Step 4 β€” ORM and Query Planner Integration

ORM defaults are almost always wrong for partitioned schemas. Two settings cause the most incidents:

Eager relation loading across partitions. In Prisma, include: { orders: true } on a User model will issue a cross-partition scan if orders is partitioned by tenant_id and the User query does not carry that predicate. Disable automatic includes at the ORM layer and issue explicit scoped queries.

Missing partition key in WHERE clauses. The query planner in PostgreSQL, MySQL 8, and CockroachDB will only prune partitions when the WHERE clause contains a condition on the partition key that the planner can evaluate at plan time. Runtime-computed predicates (e.g., from a subquery result) often suppress pruning. Always verify with EXPLAIN (ANALYZE, BUFFERS):

-- Verify partition pruning is active: look for "Partitions excluded" in the output
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT order_id, amount, created_at
FROM orders
WHERE tenant_id = 'acme-corp'
  AND created_at BETWEEN '2025-01-01' AND '2025-03-31';

-- Expected output excerpt:
-- Partitions selected: 1 (of 24 total)
-- Partitions excluded: 23

If the planner selects all partitions, the partition key is not in the predicate, or the predicate is wrapped in a cast or function that prevents static evaluation.

For cross-partition queries that cannot be avoided β€” for example, a global reporting aggregate β€” use materialized views refreshed on a schedule rather than scatter-gather reads at query time. The cross-shard aggregation patterns page covers materialized aggregate design in detail.

Configuration Reference

Parameter Recommended Value Rationale
max_connections per shard (physical_RAM_GB Γ— 25) Prevents connection storm during rebalancing when connections momentarily double
constraint_exclusion (PostgreSQL) partition (not on) Enables planner pruning for inheritance tables without scanning non-partitioned tables
enable_partition_pruning on Must be on; some managed platforms set it off to reduce planning time on small tables
Hash modulus Power of 2 (4, 8, 16, 32) Simplifies rebalancing: doubling shard count moves exactly half of each existing shard’s data
Partition boundary granularity Monthly for <1TB/year; weekly for >1TB/year Coarser boundaries reduce planner overhead; finer boundaries improve archival resolution
idle_in_transaction_session_timeout 30s Kills long-running distributed transactions that hold cross-shard locks

Operational Contrast with Sibling Approaches

The strategies on this page govern where data lives at creation time. Two related concerns require separate designs.

Automated partition creation workflows handle the operational problem of pre-creating future partition boundaries before data arrives β€” a hash strategy with 16 fixed shards never needs this, but a monthly range partition table requires a scheduled job to create next month’s partition before the calendar rolls over. Missing that job causes all writes to fail with a partition boundary violation.

Consistency models in distributed databases address read-after-write guarantees once data is distributed. A hash strategy with synchronous replication and W+R > N quorum produces strong consistency; the same strategy with asynchronous replication produces eventual consistency. The strategy choice here does not determine consistency β€” replication configuration does.

Hash routing algorithms go deeper into consistent hashing ring design, virtual node placement, and rebalancing without full re-hashing, which is relevant once the basic routing from this page is working.

Failure Modes

Hot partition due to monotonic key selection. Using an auto-increment integer or timestamp as a hash input produces the same partition index for all concurrent writes when those values are sequential. The hash distributes across the range of values, not across time β€” so writes in a burst all go to the same bucket.

  • Detection: SELECT partition_name, pg_size_pretty(pg_total_relation_size(partition_name::regclass)) FROM pg_partitions WHERE tablename = 'events' ORDER BY 2 DESC β€” top partition 3Γ— or more larger than the mean.
  • Mitigation: prepend a random salt prefix to the entity key before hashing, or switch to a composite key that includes a low-cardinality categorical dimension.

Cross-partition transaction deadlock under 2PC. A saga that writes to two partitions in non-deterministic order can deadlock when two concurrent transactions acquire locks in opposite order.

  • Detection: pg_stat_activity showing waiting state with lock type = relation across sessions holding locks on different partitions.
  • Mitigation: enforce a global lock-acquisition order (lowest partition ID first) in all saga implementations. Add idempotency keys so failed transactions can be retried without re-acquiring the same locks.

Partition pruning disabled by implicit cast. A predicate like WHERE created_at = $1::text::timestamp wraps the comparison in a type cast that the planner cannot evaluate at plan time, causing it to scan all partitions.

  • Detection: EXPLAIN output shows Partitions selected: N where N equals the total partition count.
  • Mitigation: pass the parameter with the correct type from the driver; never cast partition key predicates.

Zero-downtime key migration failure. Dual-write patterns that write to both old and new schemas without a consistency check between them leave the new schema silently behind when background sync jobs fall behind under load.

  • Detection: compare row counts between old and new schemas using a scheduled checksum query. Lag of more than 5 minutes under expected load indicates the sync job is not keeping up.
  • Mitigation: rate-limit foreground writes to stay within the sync job’s throughput capacity, or pause foreground writes briefly during the final cutover window.

Common Mistakes

  • Choosing the partition key based on table primary key alone. The primary key guarantees row uniqueness, not query alignment. The partition key must match the most frequent WHERE predicate, which is often tenant_id, created_at, or region β€” rarely the surrogate ID.
  • Over-partitioning during initial deployment. Creating 128 partitions for a 50-million-row table adds planner overhead proportional to partition count without meaningful I/O savings. Start with 8–16 partitions and split under observed skew.
  • Neglecting cross-partition transaction handling. Distributed transactions that span two or more partitions require saga design with idempotency keys. Wrapping them in a single BEGIN ... COMMIT block silently bypasses local ACID guarantees in distributed setups, allowing partial commits on node failure.
  • Skipping partition pre-creation for range strategies. A monthly range partition table with no pre-creation job will start rejecting writes at midnight on the first of each month until the DBA manually creates the new partition. Automate this with pg_partman or a scheduled function.

FAQ

How do I choose between hash and range partitioning for high-throughput APIs?

Use hash when writes arrive with random entity IDs and no temporal grouping matters β€” uniform distribution eliminates hotspots and each shard grows at the same rate. Use range when the dominant query pattern is time-bounded (weekly reports, rolling 90-day retention) or when you need to archive data by detaching old partitions without a DELETE scan. The trade-off is that range creates a write hotspot on the right edge during high-throughput ingestion; mitigate this by adding a hash sub-partition under the current month’s range boundary.

What metrics indicate a partition strategy is failing in production?

Three signals together are diagnostic: sustained P99 latency spikes on queries that should be pruning to a single partition, IOPS variance greater than 30% above the mean across partitions (visible in pg_stat_user_tables per partition or in cloud database metrics per shard), and a rising count of cross-partition lock-wait events. Any one signal in isolation can have other causes; all three together means the partition key does not align with the access pattern.

Can I change partition keys after deployment without downtime?

Not directly β€” a new partition key requires migrating every row into a new table structure. The safe path is a dual-write migration: route all new writes to both the old schema and the new schema simultaneously, run a background job to backfill historical rows, validate with row-count and checksum comparisons, then cut read traffic to the new schema incrementally using a feature flag. Expect the migration to take one to three weeks for tables above 500 million rows, depending on backfill throughput and the write amplification your infrastructure can absorb.

Articles in This Section