Skip to main content

Building a Custom Query Router with HAProxy

This guide walks through configuring HAProxy as a production-ready TCP-level query router in front of a partitioned database cluster, part of the proxy routing architectures cluster within Cross-Partition Querying & Aggregation Strategies.

Prerequisites

Step 1 β€” Design the Routing Topology

Before writing a single line of HAProxy config, establish a deterministic mapping between application query patterns and HAProxy frontend/backend structures. Routing decisions must align with explicit partition boundaries β€” typically tenant_id, geographic region, or date ranges β€” to avoid cross-shard latency and data consistency violations.

The diagram below shows the traffic flow this configuration implements: a single HAProxy frontend receives all database connections and fans them out to per-shard backend pools. A dedicated aggregator backend absorbs any query that cannot be resolved to a single shard.

HAProxy Query Routing Topology Application servers connect on port 5432 to an HAProxy frontend. ACLs on the frontend route tenant A traffic to Shard A backend, tenant B traffic to Shard B backend, and unmatched connections to a cross-shard aggregator node. Each backend pool contains a primary and a standby. App Servers :5432 HAProxy frontend db_router ACL: src 10.1.0.0/24 ACL: src 10.2.0.0/24 default_backend tenant_a tenant_b default backend shard_a db_a1 10.0.1.10 (primary) db_a2 10.0.1.11 (backup) mysql-check Β· fall 3 rise 2 backend shard_b db_b1 10.0.2.10 (primary) db_b2 10.0.2.11 (backup) mysql-check Β· fall 3 rise 2 backend aggregator agg1 10.0.3.10 cross-shard queries tenant-routed path default path

Align HAProxy ACL priorities with your application-level sharding logic so that multi-shard operations are handled predictably. Each backend pool should be dedicated to one role: write primary, read replica, or cross-shard aggregation.

SRE tip: Document the partition boundary contract in a shared runbook before deploying the proxy. Routing ACLs that silently fall through to the default backend are the most common source of unexpected cross-shard load in production.

Step 2 β€” Configure the Global and Defaults Sections

Start with a minimal haproxy.cfg that establishes safe global limits and TCP-mode defaults:

global
    maxconn 50000
    log /dev/log local0 info
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s

defaults
    mode tcp
    timeout connect 5s
    timeout client  60s
    timeout server  60s
    option tcplog
    log global

Operational note: Set timeout client and timeout server to match your longest expected query duration plus round-trip time. Long-running cross-shard aggregations may require 30–60 s; interactive OLTP queries should use a tighter 10–15 s to surface hung queries early.

DBA tip: maxconn 50000 at the global level is a ceiling, not a target. Set backend-level maxconn values that leave at least 10% of the database’s max_connections reserved for superuser sessions and replication.

Step 3 β€” Define Frontend ACLs for Tenant Routing

HAProxy in TCP mode cannot parse SQL statement content natively. Routing decisions must be based on network-level signals: source IP ranges, destination ports, or TLS SNI values. The following frontend routes tenant A and tenant B traffic to dedicated shard backends using source IP ACLs:

frontend db_router
    bind *:5432
    mode tcp
    option tcplog

    # Tenant-to-shard mapping via application subnet ranges
    acl tenant_a_network src 10.1.0.0/24
    acl tenant_b_network src 10.2.0.0/24

    use_backend shard_a if tenant_a_network
    use_backend shard_b if tenant_b_network
    default_backend aggregator_node

Operational note: Order use_backend directives from most specific to least specific. HAProxy evaluates ACLs sequentially; a broad rule placed above a narrow one will shadow it.

For SQL-content-based routing on fresh TCP connections β€” distinguishing SELECT from INSERT on the wire β€” use tcp-request inspect-delay with req.payload inspection. This approach works only on the first query of each new connection and has measurable CPU overhead from packet buffering:

frontend db_router_sql
    bind *:3306
    mode tcp
    option tcplog
    tcp-request inspect-delay 5s
    tcp-request content accept if { req.payload(0,10) -m found }

    # Anchored regex on first 50 bytes to identify write vs. read intent
    acl is_write req.payload(0,50) -m reg -i ^(INSERT|UPDATE|DELETE|BEGIN)
    acl is_read  req.payload(0,50) -m reg -i ^SELECT

    use_backend shard_primary if is_write
    use_backend shard_replica  if is_read
    default_backend shard_primary

SRE tip: Use -m beg (prefix match) or -m str (exact string) instead of -m reg wherever possible. Regex matching buffers packets and scales poorly under high connection rates. Reserve payload inspection for environments where TCP port-based routing is not feasible.

Critical limitation: Payload inspection only captures the first query on a new TCP connection. Connection poolers reuse connections across many queries, so HAProxy will never see subsequent query bytes. For per-query SQL routing, use ProxySQL (MySQL) or a dedicated SQL-aware middleware layer.

Step 4 β€” Configure Backend Pools with Health Checks

Zero-downtime operation depends on accurate node readiness validation and automatic partition failover. HAProxy supports database-native health probes for both MySQL and PostgreSQL:

backend shard_a
    mode tcp
    balance leastconn
    option mysql-check user haproxy_check
    default-server inter 5s fall 3 rise 2 on-marked-down shutdown-sessions
    server db_a1 10.0.1.10:3306 check
    server db_a2 10.0.1.11:3306 check backup

backend shard_b
    mode tcp
    balance leastconn
    option mysql-check user haproxy_check
    default-server inter 5s fall 3 rise 2 on-marked-down shutdown-sessions
    server db_b1 10.0.2.10:3306 check
    server db_b2 10.0.2.11:3306 check backup

backend aggregator_node
    mode tcp
    balance leastconn
    option mysql-check user haproxy_check
    server agg1 10.0.3.10:3306 check inter 10s fall 2 rise 3

Operational note: Replace option mysql-check with option pgsql-check for PostgreSQL backends. The fall 3 rise 2 thresholds require three consecutive failed health checks to mark a server down, and two consecutive passes to restore it β€” tuned to avoid flapping under momentary load spikes.

DBA tip: The backup directive marks db_a2 and db_b2 as standbys. HAProxy routes to them only when all non-backup servers in the same pool are down. Without this directive, a failed primary takes the entire backend offline rather than failing over to the replica. The on-marked-down shutdown-sessions parameter closes existing connections to a failed node immediately, preventing the proxy from silently routing to a dead backend.

Step 5 β€” Add Session Affinity with stick-table

For read-your-writes consistency β€” where an application must read its own immediately preceding write β€” bind client sessions to a specific backend for their duration. HAProxy’s stick-table implements this without external state:

backend shard_primary
    mode tcp
    balance leastconn
    stick-table type ip size 100k expire 30s
    stick on src
    option mysql-check user haproxy_check
    default-server inter 5s fall 3 rise 2 on-marked-down shutdown-sessions
    server db_primary 10.0.1.10:3306 check
    server db_primary_standby 10.0.1.13:3306 check backup

Operational note: expire 30s removes the session-affinity entry after 30 seconds of inactivity. Size the table (size 100k) to hold one entry per concurrent client IP. For multi-tenant deployments where many tenants share an IP range through NAT, substitute type string keyed on a tenant-specific connection attribute extracted via Lua.

SRE tip: stick-table entries survive an in-place HAProxy reload (haproxy -sf $(pidof haproxy)), but are lost on a full process restart. In distributed deployments, configure peers sections to replicate stick-table state across HAProxy instances.

Verification

Confirm the configuration parses cleanly and the backends are healthy before sending production traffic:

# Validate config syntax β€” exits non-zero on any error
haproxy -c -f /etc/haproxy/haproxy.cfg

# Reload in-place, preserving existing connections and stick-table state
haproxy -sf $(pidof haproxy) -f /etc/haproxy/haproxy.cfg

Query the stats socket to verify backend health. Expected output shows each server as UP with check status L4OK or L7OK:

echo "show servers state" | socat stdio /run/haproxy/admin.sock

Expected output (abridged):

1 shard_a db_a1 2 10.0.1.10 3306 0 0 1 1 6 6 - 1 0 L4OK - 0
1 shard_a db_a2 4 10.0.1.11 3306 0 0 1 1 6 6 - 0 1 L4OK - 0

Column 4 (2) is the server state: 2 = UP, 1 = DOWN. Column 17 is the check result: L4OK means the TCP health check passed at layer 4.

To confirm tenant routing is working, connect from each tenant subnet and verify the session lands on the expected backend:

# From tenant A's 10.1.0.x host β€” should route to shard_a
mysql -h haproxy-host -P 5432 -u app -e "SELECT @@hostname;"

# Confirm in HAProxy stats: shard_a's session count increases, shard_b's does not
echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d',' -f1,2,48

Failure Mode Table

Failure mode Root cause SRE mitigation
High CPU on payload inspection Complex or greedy regex on req.payload forces HAProxy to buffer and scan packets at line rate Switch to -m beg or -m str anchored prefix matching on the first 50 bytes; move per-query routing to ProxySQL
Split routing mid-transaction HAProxy routes per-connection by default; connection poolers reuse sockets across many queries, so ACLs only fire on the first connection handshake Use stick-table to bind the client IP to a backend for the transaction window, or enforce transaction affinity at the application layer
Overlapping ACL priorities shadow narrow rules A broad is_read rule placed above a specific tenant ACL routes all reads to the wrong backend Order use_backend directives from most specific to least specific; validate with haproxy -c and test with echo "show acl" | socat stdio /run/haproxy/admin.sock

FAQ

Can HAProxy route based on SQL query content?

Only the first query on a new TCP connection, using tcp-request inspect-delay and req.payload ACLs. Connection-pooled environments will not benefit from per-query routing at the HAProxy level; use ProxySQL or a SQL-aware middleware layer for that use case.

How does HAProxy handle cross-shard joins?

HAProxy cannot execute joins. It routes connections to a designated aggregator node, or the application must decompose the query and merge results client-side. For native cross-shard aggregation, pair HAProxy with a federated query engine sitting behind the aggregator backend; see cross-shard aggregation patterns for coordinator placement options.

Does HAProxy support database connection pooling?

No β€” HAProxy routes raw TCP streams and does not pool database connections. Pair it with PgBouncer (PostgreSQL) or ProxySQL (MySQL) for true connection multiplexing, query caching, and transaction-level pool management. The proxy routing architectures page covers the full comparison of proxy tools and when to layer them.