Scenario
PostgreSQL restarts unexpectedly every few days. The PostgreSQL log shows the server started fresh — no crash-related errors. The DBA checks /var/log/kern.log and finds Out of memory: Kill process 12345 (postgres) score 900 or sacrifice child. The Linux OOM (Out of Memory) killer is terminating PostgreSQL backends when the server runs out of memory. The last OOM’d process was the one executing a large analytical query with high work_mem.
How to Identify
Conditions:
- Unexpected PostgreSQL restarts with no error in
postgresql.log
dmesg | grep oom or journalctl -k | grep "Out of memory" shows OOM events
- Memory pressure:
free -m shows very low available memory
work_mem × max_connections × sort operations exceeds available RAM
- Large
shared_buffers + many connections + high work_mem = OOM risk
Analysis Steps
-- Check memory configuration
SELECT name, setting, unit FROM pg_settings
WHERE name IN (
'shared_buffers', 'work_mem', 'maintenance_work_mem',
'max_connections', 'max_parallel_workers',
'wal_buffers', 'effective_cache_size'
);
-- Estimate worst-case memory usage:
-- shared_buffers + (max_connections × work_mem × sort_operations_per_query)
-- + OS overhead
SELECT
pg_size_pretty(pg_size_bytes(current_setting('shared_buffers'))) AS shared_buffers,
current_setting('max_connections')::int AS max_connections,
pg_size_pretty(pg_size_bytes(current_setting('work_mem'))) AS work_mem_per_sort,
pg_size_pretty(
current_setting('max_connections')::bigint
* pg_size_bytes(current_setting('work_mem'))
) AS worst_case_work_mem;
-- Find current memory-intensive queries:
SELECT pid, usename, state,
extract(epoch from now()-query_start) AS query_seconds,
left(query, 200) AS query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY query_seconds DESC;
-- Check OS memory (run from shell):
-- free -m
-- cat /proc/meminfo | grep -E "MemTotal|MemFree|MemAvailable|Cached"
Pitfalls
work_mem is per sort/hash operation, not per connection. A single complex query can use 10× work_mem. This makes worst-case memory hard to predict.
- Reducing
work_mem globally hurts query performance. The correct approach is low global work_mem with higher per-role settings for analytics users.
- Setting
vm.overcommit_memory = 2 on Linux prevents OOM but means PostgreSQL starts failing allocations earlier. A better balance: vm.overcommit_memory = 1 (default heuristic).
huge_pages = try in PostgreSQL can reduce memory overhead significantly.
- The Linux OOM killer has no knowledge of PostgreSQL’s internal consistency — killing a random backend causes an abrupt crash and full recovery on next start.
Resolution Approach
Calculate the actual memory budget, reduce work_mem globally (set higher per-role for analytics), set vm.overcommit_ratio to control how much memory Linux allows PostgreSQL to commit, and add monitoring to alert before OOM conditions occur.