Building a Custom Query Router with HAProxy
This guide walks through configuring HAProxy as a production-ready TCP-level query router in front of a partitioned database cluster, part of the proxy routing architectures cluster within Cross-Partition Querying & Aggregation Strategies.
Prerequisites
Step 1 β Design the Routing Topology
Before writing a single line of HAProxy config, establish a deterministic mapping between application query patterns and HAProxy frontend/backend structures. Routing decisions must align with explicit partition boundaries β typically tenant_id, geographic region, or date ranges β to avoid cross-shard latency and data consistency violations.
The diagram below shows the traffic flow this configuration implements: a single HAProxy frontend receives all database connections and fans them out to per-shard backend pools. A dedicated aggregator backend absorbs any query that cannot be resolved to a single shard.
Align HAProxy ACL priorities with your application-level sharding logic so that multi-shard operations are handled predictably. Each backend pool should be dedicated to one role: write primary, read replica, or cross-shard aggregation.
SRE tip: Document the partition boundary contract in a shared runbook before deploying the proxy. Routing ACLs that silently fall through to the default backend are the most common source of unexpected cross-shard load in production.
Step 2 β Configure the Global and Defaults Sections
Start with a minimal haproxy.cfg that establishes safe global limits and TCP-mode defaults:
global
maxconn 50000
log /dev/log local0 info
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
stats timeout 30s
defaults
mode tcp
timeout connect 5s
timeout client 60s
timeout server 60s
option tcplog
log global
Operational note: Set timeout client and timeout server to match your longest expected query duration plus round-trip time. Long-running cross-shard aggregations may require 30β60 s; interactive OLTP queries should use a tighter 10β15 s to surface hung queries early.
DBA tip:
maxconn 50000at the global level is a ceiling, not a target. Set backend-levelmaxconnvalues that leave at least 10% of the databaseβsmax_connectionsreserved for superuser sessions and replication.
Step 3 β Define Frontend ACLs for Tenant Routing
HAProxy in TCP mode cannot parse SQL statement content natively. Routing decisions must be based on network-level signals: source IP ranges, destination ports, or TLS SNI values. The following frontend routes tenant A and tenant B traffic to dedicated shard backends using source IP ACLs:
frontend db_router
bind *:5432
mode tcp
option tcplog
# Tenant-to-shard mapping via application subnet ranges
acl tenant_a_network src 10.1.0.0/24
acl tenant_b_network src 10.2.0.0/24
use_backend shard_a if tenant_a_network
use_backend shard_b if tenant_b_network
default_backend aggregator_node
Operational note: Order use_backend directives from most specific to least specific. HAProxy evaluates ACLs sequentially; a broad rule placed above a narrow one will shadow it.
For SQL-content-based routing on fresh TCP connections β distinguishing SELECT from INSERT on the wire β use tcp-request inspect-delay with req.payload inspection. This approach works only on the first query of each new connection and has measurable CPU overhead from packet buffering:
frontend db_router_sql
bind *:3306
mode tcp
option tcplog
tcp-request inspect-delay 5s
tcp-request content accept if { req.payload(0,10) -m found }
# Anchored regex on first 50 bytes to identify write vs. read intent
acl is_write req.payload(0,50) -m reg -i ^(INSERT|UPDATE|DELETE|BEGIN)
acl is_read req.payload(0,50) -m reg -i ^SELECT
use_backend shard_primary if is_write
use_backend shard_replica if is_read
default_backend shard_primary
SRE tip: Use
-m beg(prefix match) or-m str(exact string) instead of-m regwherever possible. Regex matching buffers packets and scales poorly under high connection rates. Reserve payload inspection for environments where TCP port-based routing is not feasible.
Critical limitation: Payload inspection only captures the first query on a new TCP connection. Connection poolers reuse connections across many queries, so HAProxy will never see subsequent query bytes. For per-query SQL routing, use ProxySQL (MySQL) or a dedicated SQL-aware middleware layer.
Step 4 β Configure Backend Pools with Health Checks
Zero-downtime operation depends on accurate node readiness validation and automatic partition failover. HAProxy supports database-native health probes for both MySQL and PostgreSQL:
backend shard_a
mode tcp
balance leastconn
option mysql-check user haproxy_check
default-server inter 5s fall 3 rise 2 on-marked-down shutdown-sessions
server db_a1 10.0.1.10:3306 check
server db_a2 10.0.1.11:3306 check backup
backend shard_b
mode tcp
balance leastconn
option mysql-check user haproxy_check
default-server inter 5s fall 3 rise 2 on-marked-down shutdown-sessions
server db_b1 10.0.2.10:3306 check
server db_b2 10.0.2.11:3306 check backup
backend aggregator_node
mode tcp
balance leastconn
option mysql-check user haproxy_check
server agg1 10.0.3.10:3306 check inter 10s fall 2 rise 3
Operational note: Replace option mysql-check with option pgsql-check for PostgreSQL backends. The fall 3 rise 2 thresholds require three consecutive failed health checks to mark a server down, and two consecutive passes to restore it β tuned to avoid flapping under momentary load spikes.
DBA tip: The
backupdirective marksdb_a2anddb_b2as standbys. HAProxy routes to them only when all non-backup servers in the same pool are down. Without this directive, a failed primary takes the entire backend offline rather than failing over to the replica. Theon-marked-down shutdown-sessionsparameter closes existing connections to a failed node immediately, preventing the proxy from silently routing to a dead backend.
Step 5 β Add Session Affinity with stick-table
For read-your-writes consistency β where an application must read its own immediately preceding write β bind client sessions to a specific backend for their duration. HAProxyβs stick-table implements this without external state:
backend shard_primary
mode tcp
balance leastconn
stick-table type ip size 100k expire 30s
stick on src
option mysql-check user haproxy_check
default-server inter 5s fall 3 rise 2 on-marked-down shutdown-sessions
server db_primary 10.0.1.10:3306 check
server db_primary_standby 10.0.1.13:3306 check backup
Operational note: expire 30s removes the session-affinity entry after 30 seconds of inactivity. Size the table (size 100k) to hold one entry per concurrent client IP. For multi-tenant deployments where many tenants share an IP range through NAT, substitute type string keyed on a tenant-specific connection attribute extracted via Lua.
SRE tip:
stick-tableentries survive an in-place HAProxy reload (haproxy -sf $(pidof haproxy)), but are lost on a full process restart. In distributed deployments, configurepeerssections to replicate stick-table state across HAProxy instances.
Verification
Confirm the configuration parses cleanly and the backends are healthy before sending production traffic:
# Validate config syntax β exits non-zero on any error
haproxy -c -f /etc/haproxy/haproxy.cfg
# Reload in-place, preserving existing connections and stick-table state
haproxy -sf $(pidof haproxy) -f /etc/haproxy/haproxy.cfg
Query the stats socket to verify backend health. Expected output shows each server as UP with check status L4OK or L7OK:
echo "show servers state" | socat stdio /run/haproxy/admin.sock
Expected output (abridged):
1 shard_a db_a1 2 10.0.1.10 3306 0 0 1 1 6 6 - 1 0 L4OK - 0
1 shard_a db_a2 4 10.0.1.11 3306 0 0 1 1 6 6 - 0 1 L4OK - 0
Column 4 (2) is the server state: 2 = UP, 1 = DOWN. Column 17 is the check result: L4OK means the TCP health check passed at layer 4.
To confirm tenant routing is working, connect from each tenant subnet and verify the session lands on the expected backend:
# From tenant A's 10.1.0.x host β should route to shard_a
mysql -h haproxy-host -P 5432 -u app -e "SELECT @@hostname;"
# Confirm in HAProxy stats: shard_a's session count increases, shard_b's does not
echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d',' -f1,2,48
Failure Mode Table
| Failure mode | Root cause | SRE mitigation |
|---|---|---|
| High CPU on payload inspection | Complex or greedy regex on req.payload forces HAProxy to buffer and scan packets at line rate |
Switch to -m beg or -m str anchored prefix matching on the first 50 bytes; move per-query routing to ProxySQL |
| Split routing mid-transaction | HAProxy routes per-connection by default; connection poolers reuse sockets across many queries, so ACLs only fire on the first connection handshake | Use stick-table to bind the client IP to a backend for the transaction window, or enforce transaction affinity at the application layer |
| Overlapping ACL priorities shadow narrow rules | A broad is_read rule placed above a specific tenant ACL routes all reads to the wrong backend |
Order use_backend directives from most specific to least specific; validate with haproxy -c and test with echo "show acl" | socat stdio /run/haproxy/admin.sock |
FAQ
Can HAProxy route based on SQL query content?
Only the first query on a new TCP connection, using tcp-request inspect-delay and req.payload ACLs. Connection-pooled environments will not benefit from per-query routing at the HAProxy level; use ProxySQL or a SQL-aware middleware layer for that use case.
How does HAProxy handle cross-shard joins?
HAProxy cannot execute joins. It routes connections to a designated aggregator node, or the application must decompose the query and merge results client-side. For native cross-shard aggregation, pair HAProxy with a federated query engine sitting behind the aggregator backend; see cross-shard aggregation patterns for coordinator placement options.
Does HAProxy support database connection pooling?
No β HAProxy routes raw TCP streams and does not pool database connections. Pair it with PgBouncer (PostgreSQL) or ProxySQL (MySQL) for true connection multiplexing, query caching, and transaction-level pool management. The proxy routing architectures page covers the full comparison of proxy tools and when to layer them.
Related
- Proxy Routing Architectures β parent page covering the full proxy tool comparison and operational contrast with application-level routing
- Application-Level Sharding Logic β client-owned routing as an alternative when sub-millisecond routing decisions are required
- Cross-Shard Aggregation Patterns β how to coordinate the aggregator backend that HAProxyβs default route targets