Skip to main content

Proxy Routing Architectures for Partitioned Databases

Proxy routing architectures sit between application servers and partitioned database backends, centralising connection management, shard selection, and failover logic in a dedicated network tier. This page is part of Cross-Partition Querying & Aggregation Strategies and covers when and how to operate a proxy layer, contrasting it with application-level sharding logic where the client owns routing decisions.

Problem Framing

At modest scale a single write primary handles routing implicitly. Once you cross roughly 10 M rows per table or 5 k concurrent connections, the routing problem becomes operational: you need topology-aware connection pooling, zero-downtime shard rebalancing, and a single place to enforce read/write splitting — without redeploying application code every time a shard moves.

The three scenarios where a proxy tier consistently pays for itself:

  1. Dynamic topology changes. Adding or retiring shards without forcing client-side config pushes.
  2. Heterogeneous engine mix. Routing writes to PostgreSQL primaries and reads to read replicas or analytics nodes behind a single connection string.
  3. Centralised observability. Capturing per-shard latency, error rates, and connection saturation metrics without instrumenting every service.

Architecture Overview

The diagram below shows the proxy tier as a stateless layer between app servers and sharded backends. Read replicas receive analytical traffic via a separate backend pool; the admin API allows hot-loading of routing maps without restarting the proxy.

Proxy Routing Architecture Application servers connect to a stateless proxy tier. The proxy routes write traffic to sharded primary nodes and read traffic to read replicas. An admin API enables hot-reload of routing maps. App Server A App Server B App Server C Proxy Tier (ProxySQL / PgBouncer / HAProxy + Lua) connection pool · routing map Admin API (hot-reload) Shard 1 Primary Shard 2 Primary Shard 3 Primary Read Replica Pool WRITE READ write path read / admin path

Proxy Tool Comparison

Not all proxies offer the same capabilities. Choosing the wrong one for your query patterns is the most common architectural mistake.

Proxy Protocol awareness SQL-aware routing Connection pooling Hot config reload
HAProxy TCP only No (needs Lua) No Yes (via socket)
PgBouncer PostgreSQL No Yes Partial (SIGHUP)
ProxySQL MySQL Yes (query rules) Yes Yes (admin port)
Pgpool-II PostgreSQL Yes (read/write split) Yes Yes
Odyssey PostgreSQL Limited Yes Yes

Implementation Walkthrough

Step 1 — Configure the Connection Pool

Stateless proxy tiers must be tuned to backend capacity first. Under-provisioned pools cause connection storms; over-provisioned ones exhaust backend file descriptors. The following PgBouncer configuration targets a 3-shard cluster with 25 app-side connections per shard:

[databases]
shard1 = host=10.0.1.10 port=5432 dbname=app_db
shard2 = host=10.0.1.11 port=5432 dbname=app_db
shard3 = host=10.0.1.12 port=5432 dbname=app_db

[pgbouncer]
pool_mode = transaction
max_client_conn = 500
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600
server_lifetime = 3600
tcp_keepalive = 1
tcp_keepidle = 60
tcp_keepintvl = 10
log_connections = 0
log_disconnections = 0

The reserve_pool_size = 5 entries absorb burst traffic without opening net-new backend connections under load.

Step 2 — Define Read/Write Routing Rules (ProxySQL)

For MySQL-backed sharded clusters, ProxySQL’s query rules engine routes at the SQL level without requiring application changes. Insert rules via the admin port:

-- Route writes to hostgroup 0 (primaries)
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (1, 1, '^(INSERT|UPDATE|DELETE|REPLACE)', 0, 1);

-- Route reads to hostgroup 1 (replicas)
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 1, 1);

-- Reload the rules without restarting
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

ProxySQL’s match_pattern field accepts POSIX regular expressions, so tenant-scoped routing (WHERE tenant_id = 'acme') can route to a dedicated hostgroup for that tenant’s primary.

Step 3 — Inject a Shard Routing Map via the Admin API

Shard-aware middleware parses incoming SQL payloads to extract partition keys before forwarding. When a new shard goes live, push an updated routing map without restarting the proxy:

curl -X POST http://proxy-admin:8080/api/v1/routing-map \
  -H 'Content-Type: application/json' \
  -d '{
    "partition_key_range": "1000-2000",
    "target_backend": "shard_3",
    "weight": 100,
    "ttl": 3600
  }'

Hot-reload sequence to avoid connection drops:

  1. Validate the new partition key range against the existing distribution — overlap causes duplicate-write bugs.
  2. Push the updated routing map via the administrative API.
  3. Wait for health checks on the new target node to reach UP status (typically 3 consecutive passing checks).
  4. Drain legacy connections gracefully: set old route weight to 0, wait for active connection count to reach zero, then remove the entry.

Step 4 — Configure HAProxy for TCP-Level Shard Routing

When SQL parsing is not needed — for example, when the application pre-selects a port per shard — HAProxy in TCP mode provides low-overhead connection routing with robust health checking:

global
    maxconn 50000
    log /dev/log local0

defaults
    mode tcp
    timeout connect 5s
    timeout client 60s
    timeout server 60s
    option tcplog

frontend db_proxy
    bind *:5432
    mode tcp
    use_backend db_shard_1 if { dst_port 5432 }
    use_backend db_shard_2 if { dst_port 5433 }
    default_backend db_shard_1

backend db_shard_1
    mode tcp
    balance leastconn
    option tcp-check
    default-server inter 5s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1a 10.0.1.10:5432 check
    server node1b 10.0.1.13:5432 check backup

backend db_shard_2
    mode tcp
    balance leastconn
    option tcp-check
    default-server inter 5s fall 3 rise 2 on-marked-down shutdown-sessions
    server node2a 10.0.1.11:5432 check
    server node2b 10.0.1.14:5432 check backup

The on-marked-down shutdown-sessions directive closes existing connections to a failed backend immediately, preventing the proxy from silently routing to a dead node.

Step 5 — Instrument the ORM Connection String

Application servers point at the proxy tier, not individual shards. Configure SQLAlchemy (or any ORM) to treat the proxy as the single database endpoint:

# SQLAlchemy connection pool targeting PgBouncer in transaction mode
DATABASE_URL: "postgresql://app_user:[email protected]:6432/app_db"
POOL_SIZE: 5
MAX_OVERFLOW: 2
POOL_TIMEOUT: 30
POOL_RECYCLE: 1800
POOL_PRE_PING: true

POOL_SIZE must be well below default_pool_size in PgBouncer. If every application process opens 25 connections and you have 20 processes, PgBouncer’s max_client_conn needs headroom above 500 or connections queue.

Configuration Reference

Parameter Recommended value Rationale
pool_mode (PgBouncer) transaction Multiplexes connections per statement; safe for stateless apps
default_pool_size 25 per shard Matches typical PostgreSQL max_connections / num_shards target
reserve_pool_size 5 Absorbs burst without exhausting backend file descriptors
server_lifetime 3600 s Prevents idle connections from accumulating on the backend
tcp_keepidle 60 s Detects dead connections before they cause silent query hangs
tcp_keepintvl 10 s Rapid re-probe once idle threshold is hit
HAProxy inter 5 s Health check interval — balance sensitivity vs. check overhead
HAProxy fall 3 Requires 3 consecutive failures before marking a backend down
HAProxy rise 2 Requires 2 consecutive passes before restoring a recovered node

Operational Contrast

Compared to application-level sharding logic, where the client owns the routing decision and holds direct connections to each shard, a proxy tier trades a small latency cost (0.5–2 ms) for:

  • Centralised pool governance — one place to tune max_connections rather than every service instance.
  • Zero-downtime topology changes — routing map updates propagate from the proxy outward, not from a config push to dozens of application pods.
  • Cross-engine compatibility — PgBouncer can sit in front of Postgres; ProxySQL in front of MySQL; HAProxy in front of either, using the same operational runbook.

Federated query execution handles the harder case where a single query must aggregate results from multiple shards. The proxy does not merge results — it routes to the shard that owns the data, or to a dedicated aggregation node for cross-partition queries. Treat cross-shard aggregation patterns as the upstream concern; proxy routing as the connection management layer beneath it.

Failure Modes

1. Connection Storm on Proxy Restart

Root cause: All application connections attempt reconnection simultaneously after a proxy restart or failover, exhausting the backend’s max_connections within seconds.

Detection:

rate(pgbouncer_client_wait_seconds_sum[1m]) / rate(pgbouncer_client_wait_seconds_count[1m]) > 2

Mitigation: Enable server_reset_query_always = 1 in PgBouncer and stagger restart timing using haproxy -sf $(pidof haproxy) for in-place reload. Add exponential backoff with jitter in the ORM reconnect logic.

2. Stale Routing Map After Shard Rebalancing

Root cause: The proxy’s in-memory routing table still references the old shard for a key range that has been migrated. Writes land on the previous owner and are silently accepted, causing a dual-write state.

Detection:

increase(proxy_routing_map_staleness_total[5m]) > 0

Mitigation: Enforce a two-phase routing update: first mark the old route as read-only (writes rejected), migrate the data, then promote the new route. Never delete the old route entry before migration completes.

3. Read Replica Lag Causing Dirty Reads

Root cause: The proxy routes a read immediately after a write to the replica pool. The replica has not yet applied the write, returning stale data. This is especially pronounced with PgBouncer in session mode when synchronous_commit = off.

Detection:

pg_replication_lag_seconds{role="replica"} > 5

Mitigation: For session-critical reads, use SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE to force routing to the primary, or implement sticky routing that pins a session to the primary for a configurable window after any write.

4. Health Check False Positives Cycling Backends

Root cause: A busy backend briefly exceeds the TCP connection timeout during a traffic spike, triggering the fall threshold. The proxy marks it down, shifts load to remaining backends, which then also spike — cascading failure.

Detection:

increase(haproxy_backend_status{state="DOWN"}[5m]) > 2

Mitigation: Increase inter from 2 s to 5 s and fall from 2 to 3. Add slowstart 30s to HAProxy backends so recovering nodes receive a linearly increasing share of traffic rather than an immediate full load.

Common Mistakes

  • Routing by full table scan instead of partition key. Proxies that broadcast a query to all shards when the partition key is absent in the WHERE clause create O(n) load. Enforce partition key extraction before dispatch; reject or alert on table-scan-class queries.
  • No POOL_PRE_PING on the ORM side. Without a pre-ping, the ORM reuses stale connections after a proxy restart, leading to cryptic SSL SYSCALL error: EOF detected errors. Enable POOL_PRE_PING = true in SQLAlchemy or equivalent in your ORM.
  • Running PgBouncer in session mode for OLTP workloads. Session mode holds a backend connection for the duration of a client session, negating connection multiplexing. Use transaction mode for stateless microservices.
  • Missing backup server entries in HAProxy. Without check backup on standby nodes, a failed primary takes down the entire backend pool rather than failing over to the replica.

FAQ

When should I use a proxy instead of application-level routing?

Use a proxy when you need centralised connection pooling, dynamic topology updates without client redeployment, or unified observability across heterogeneous database engines. Application-level routing is preferable when you need sub-millisecond routing decisions or when the application already owns the partition key extraction logic and direct shard access simplifies the operational surface.

How do proxies handle queries that span multiple shards?

Most proxies cannot merge results from multiple backends. HAProxy and PgBouncer route a connection to exactly one backend per session. ProxySQL has limited fan-out via its multiplex option, but the merge step still requires an upstream coordinator. For true multi-shard aggregation, route the query to a federated query engine and treat the proxy as the connection pool in front of it.

What is the realistic latency overhead of the proxy tier?

Expect 0.5–2 ms per round trip, covering TLS termination, connection pool lookup, and routing table evaluation. This is typically offset by reduced backend connection overhead — a single proxy with 25 backend connections can front hundreds of client connections, lowering the total PostgreSQL max_connections pressure significantly.

Articles in This Section