Scenario
An application using pgBouncer (or any connection pool) reports random SSL connection has been closed unexpectedly errors. Failures happen after idle periods, not under load. A connection is borrowed from the pool, goes idle, and when the application attempts the next query it receives SQLSTATE 08006. Restarting the pool temporarily resolves the issue but it recurs.
How to Identify
Conditions:
- Connection errors appear after idle periods (30s–10min), not during high traffic
- pgBouncer logs:
closing because: server idle timeout
pg_stat_activity shows sessions in idle state that disappear without a client-initiated close
- Application pool logs show connections being destroyed and recreated at a regular interval
- On AWS RDS / Aurora: errors cluster near the 350-second boundary
Analysis Steps
-- 1. Check long-idle connections and their age
SELECT pid, usename, application_name, state,
now() - state_change AS idle_for, client_addr
FROM pg_stat_activity
WHERE state IN ('idle', 'idle in transaction')
ORDER BY idle_for DESC;
-- 2. Check server-side timeout settings
SHOW idle_in_transaction_session_timeout;
SHOW tcp_keepalives_idle;
SHOW tcp_keepalives_interval;
SHOW tcp_keepalives_count;
-- 3. Check connection pool health
SELECT count(*) AS total,
sum(CASE WHEN state = 'idle' THEN 1 ELSE 0 END) AS idle_count,
(SELECT setting::int FROM pg_settings WHERE name = 'max_connections') AS max_conn
FROM pg_stat_activity
WHERE backend_type = 'client backend';
-- 4. If using pgBouncer: check pool and config from pgBouncer admin console
-- psql -p 6432 pgbouncer -c "SHOW config"
-- psql -p 6432 pgbouncer -c "SHOW pools"
-- 5. On Linux: verify OS TCP keepalive settings
-- cat /proc/sys/net/ipv4/tcp_keepalive_time (default: 7200 — too high)
-- cat /proc/sys/net/ipv4/tcp_keepalive_intvl
-- cat /proc/sys/net/ipv4/tcp_keepalive_probes
Pitfalls
- Do not simply increase
server_idle_timeout in pgBouncer without also enabling TCP keepalives on the PostgreSQL server — the NAT/firewall will still RST the connection silently.
- Transaction pooling mode in pgBouncer is the most common cause: a client holds a logical connection across multiple real server connections. When the server side is closed by timeout, the client gets 08006 on the next statement.
- Application connection pools (HikariCP, c3p0, SQLAlchemy) have their own idle timeout settings independent of pgBouncer and PostgreSQL. All three layers must be aligned.
idle_in_transaction_session_timeout only triggers for sessions in the idle in transaction state — it does not help for plain idle sessions. Use pgBouncer’s client_idle_timeout for those.
Resolution Approach
Layer 1 — PostgreSQL server: enable TCP keepalives