Symptoms
The error appears verbatim in client logs and pg_log:
FATAL: sorry, too many clients already
Alternatively, when max_connections is set per database or user:
FATAL: too many connections for role "app_user"
FATAL: too many connections for database "production_db"
Observable symptoms
- Application returns 500 errors or generic "database unavailable" messages
- New connections fail instantly — no timeout, no queue
- Existing connected clients continue working normally
- Load balancers may mark backends unhealthy if they test connectivity
- Connection pool logs show exhausted pool or rapid connection churn
- APM dashboards show database error rate spike with zero latency increase (connections never open)
Environment
GUC: max_connections
Set in postgresql.conf or at server start. Requires restart to change. Default is 100. Superusers retain 3 reserved slots via superuser_reserved_connections (default: 3), so the effective limit for regular roles is max_connections - superuser_reserved_connections = 97 by default.
# postgresql.conf
max_connections = 100
superuser_reserved_connections = 3
Per-object limits (optional)
You can restrict connections further per database or per role:
-- Per database
ALTER DATABASE production_db CONNECTION LIMIT 80;
-- Per role
ALTER ROLE app_user CONNECTION LIMIT 50;
These limits compound with the server-level max_connections.
Memory cost
Each connection consumes ~5–10 MB of shared memory overhead in addition to its working memory. Raising max_connections above ~500 without a connection pooler in front is almost always counterproductive — you trade connection rejection for OOM pressure or context-switching overhead.
Shared memory sizing
max_connections directly affects how much shared memory PostgreSQL allocates at startup. On Linux, the required kernel.shmmax and kernel.shmall values must be sufficient.
Root Cause
PostgreSQL maintains a fixed-size connection slot array allocated at startup. When all slots are filled, there is no backpressure mechanism — new connection attempts are immediately refused with FATAL: sorry, too many clients already.
Common causes
- No connection pooler: The application opens one connection per thread/worker/request directly to PostgreSQL. Under load, this exhausts slots fast. The most common cause in web applications.
- Connection leak: Code paths that fail to call
conn.close()(exceptions bypassing cleanup, early returns, abandoned goroutines/threads). Connections accumulate until the limit is hit. - Pool misconfiguration: PgBouncer or application pool
max_pool_size×num_pool_workersexceeds PostgreSQL'smax_connections. - Deployment/rolling restart spike: Old pods hold connections while new pods open theirs. Both sets run simultaneously during the transition window.
- Long-running transactions holding idle connections: A transaction opened and left idle ("idle in transaction") occupies a slot indefinitely.
- Replica or standby misrouting: Read traffic intended for replicas accidentally routed to primary, doubling connection load.
- DDoS or retry storm: After an outage, every client retries simultaneously, overwhelming the connection limit the moment the server recovers.
Investigation
Step 1 — Verify the limit and current usage
Run this as superuser. If you cannot connect, use the reserved superuser slot:
-- Current connection count vs limit
SELECT
count(*) AS total_connections,
(SELECT setting::int FROM pg_settings WHERE name = 'max_connections') AS max_connections,
(SELECT setting::int FROM pg_settings WHERE name = 'superuser_reserved_connections') AS reserved,
(SELECT setting::int FROM pg_settings WHERE name = 'max_connections') -
(SELECT setting::int FROM pg_settings WHERE name = 'superuser_reserved_connections') AS effective_limit
FROM pg_stat_activity;
Step 2 — Break down connections by state
SELECT
state,
wait_event_type,
wait_event,
count(*) AS connections
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
GROUP BY state, wait_event_type, wait_event
ORDER BY connections DESC;
Key states to watch for:
idle— Connected, doing nothing. Usually indicates a pool that is too large or a connection not being returned.idle in transaction— A transaction is open but the client is not executing any query. This is dangerous: locks are held, and the slot is occupied indefinitely.active— Currently executing a query.
Step 3 — Group by application and user
SELECT
usename,
application_name,
client_addr,
state,
count(*) AS connections
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
GROUP BY usename, application_name, client_addr, state
ORDER BY connections DESC
LIMIT 30;
Step 4 — Find longest-running idle-in-transaction sessions
SELECT
pid,
usename,
application_name,
state,
now() - xact_start AS txn_duration,
query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start
LIMIT 20;
Step 5 — Emergency: terminate idle connections to restore service
Terminate idle connections older than 5 minutes to free slots. Do NOT terminate active connections running writes without understanding the workload.
-- Terminate idle connections idle for more than 5 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
AND now() - state_change > interval '5 minutes'
AND pid <> pg_backend_pid(); Fix Now
Immediate (< 5 min)
- Connect as superuser using the reserved slot:
psql -U postgres - Terminate idle connections older than a few minutes (see Diagnostics Step 5 above)
- Identify and terminate idle-in-transaction sessions if present
- Verify connection count drops below the limit:
SELECT count(*) FROM pg_stat_activity;
Short-term stabilisation (same incident window)
- Raise max_connections temporarily if you have sufficient RAM and can tolerate a restart. Calculate memory budget: each connection needs ~5–10 MB. On a 16 GB instance with 8 GB for
shared_buffers, you have ~800 MB headroom = roughly 80–160 additional connections.ALTER SYSTEM SET max_connections = 200; -- Requires full restart, not just reload pg_ctl restart -D /var/lib/postgresql/data - Enable connection pooling immediately if not already present. PgBouncer in transaction mode can multiplex hundreds of application connections onto tens of PostgreSQL connections.
App-side mitigation (no DB restart required)
- Reduce connection pool
max_pool_sizeper app instance - Reduce number of app replicas temporarily to cut total pool size
- Enable connection timeout + exponential backoff in app config to stop retry storms
Resolution & Prevention
Always use a connection pooler in production
PgBouncer is the standard. Transaction mode is appropriate for most web applications. Configure it so the total PostgreSQL connections from all pooler instances stay below max_connections - superuser_reserved_connections - 5 (leave headroom for DBA access).
[databases]
production_db = host=127.0.0.1 port=5432 dbname=production_db
[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600
Set idle_in_transaction_session_timeout
ALTER SYSTEM SET idle_in_transaction_session_timeout = '30s';
SELECT pg_reload_conf();
Monitor pg_stat_activity continuously
Alert when count(*) / max_connections > 0.80. That gives you time to respond before the hard limit is hit.
-- Prometheus / Grafana query for connection usage ratio
SELECT count(*) / (SELECT setting::float FROM pg_settings WHERE name='max_connections')
FROM pg_stat_activity;
Calculate the right max_connections for your hardware
A common formula: max_connections = (RAM_GB * 100) / 10. On a 32 GB instance, that gives 320. In practice, with PgBouncer, 100–200 is often sufficient and healthier for performance.
References
Cloud Variants
RDS PostgreSQL: max_connections is calculated automatically based on instance class: LEAST({DBInstanceClassMemory/9531392}, 5000). On a db.t3.micro (1 GB RAM) this gives only ~22 connections — far too low for most apps. Use RDS Proxy (AWS's managed PgBouncer alternative) or upgrade instance class.
Aurora PostgreSQL: Same formula applies. Aurora Serverless v2 may exhaust connections during scale-up before new capacity is fully available.
Check via RDS console: Enhanced Monitoring → Database connections metric, or CloudWatch metric DatabaseConnections.
Azure Database for PostgreSQL Flexible Server: max_connections is set by SKU tier. Burstable B1ms gives 50 connections. General Purpose D4s_v3 gives 1250. Check with SHOW max_connections;. Use PgBouncer bundled feature (available in Flexible Server) to enable built-in connection pooling without a separate VM.
Metric to watch: Azure Monitor → Active Connections vs Max Connections.
Cloud SQL for PostgreSQL: max_connections is based on machine type memory. Use Cloud SQL Auth Proxy + pg_bouncer sidecar pattern for connection pooling. Direct connections from Cloud Run/Cloud Functions can exhaust connections instantly since each function invocation may open its own connection — always use Cloud SQL connector with a connection pool.
Metric to watch: Cloud Monitoring → database/postgresql/num_backends.
Related & next steps
Concepts on this page
Thanks — noted. This helps keep the database accurate.