File Descriptors: The OS Limit That Takes Down PostgreSQL

Most PostgreSQL outages that trace back to file descriptor exhaustion get misread as a database problem. The failure is one layer down: the kernel runs out of file descriptors and PostgreSQL takes the hit. This post covers how that happens under high connection counts, how to read the log sequence when it does, and how to fix it.

What are file descriptors and why PostgreSQL burns through them

In Linux, the kernel represents almost everything as a file descriptor: TCP sockets, open table files, index files, WAL segments, temp files for sorts and joins, log files. Every open() or accept() call increments a counter. The kernel enforces a system-wide ceiling called fs.file-max. When total FD usage across all processes on the machine hits that ceiling, every new open() fails; regardless of which process is asking.

There’s also a second, separate limit called the per-process ceiling (RLIMIT_NOFILE, controlled by ulimit -n), which caps how many FDs a single process can hold. Either limit can produce the “out of file descriptors” log message or a single backend hitting its per-process ulimit. Both need to be checked during the diagnosis.

PostgreSQL is process-based. Each client connection spawns its own OS process. Each backend holds FDs for its client socket, the table and index files it’s accessing (managed through PostgreSQL’s internal VFD system, capped by max_files_per_process, default 1,000), WAL segments, and any temp files. An idle backend holds 10–15 FDs. An active write backend touching multiple tables with indexes can hold 50–200 or more.

The theoretical worst case is max_connections x max_files_per_process. In practice you won’t hit that ceiling, but even a fraction of it is dangerous when thousands of connections are open at once.

How exhaustion happens

The failure typically follows the same pattern. max_connections is set high, sometimes 10,000 or more, and the application opens connections freely without a pooler. Because each connection spawns a dedicated OS process, applications that don’t close connections promptly, or that hold long-running transactions in idle in transaction state, accumulate open backends that eat OS resources without doing any real work.

When a batch job kicks off on top of this baseline, thousands of additional connections open. Most sit idle in ClientRead, but idle connections still consume memory, process table entries, and file descriptors. When those connections start executing write operations simultaneously, backends contend on shared resources: LWLock:BufferContent, LWLock:WALInsert, LWLock:WALWrite. Backends waiting on these locks hold their file descriptors open longer than normal. That extended hold time pushes peak FD usage higher than the connection count alone would suggest, so the system can hit fs.file-max at a lower concurrency level than the static math predicts.

By the time FD exhaustion errors show up in the logs, you’re already past the safe threshold. It looks sudden because the lock contention phase is brief and the ceiling gets crossed fast once it starts.

The error sequence in the logs

PostgreSQL logs the failure in stages. First, this message appears mixed in with normal query logs:

LOG: out of file descriptors: Too many open files in system; release and retry

If the situation continues, backends start aborting:

LOG: server process (PID XXXXX) was terminated by signal 6: Aborted

Once enough backends fail, PostgreSQL enters crash recovery. During recovery, all application nodes attempt to reconnect simultaneously, producing waves of SSL failures:

LOG: failed to send SSL negotiation response: Broken pipe

On the application side:

ResourceException: Unable to create new connection

The database may come back online before the environment fully stabilizes. Reconnection storms from application nodes can continue stressing the system even after PostgreSQL recovers, particularly if batch retry logic triggers another wave of connection attempts.

Diagnosing the problem

At the OS level:

# System-wide maximum allowed
cat /proc/sys/fs/file-max
# Current usage: <allocated> <free(always 0)> <max>
cat /proc/sys/fs/file-nr
# Per-process limit for the PostgreSQL postmaster
cat /proc/$(pgrep -o postgres)/limits | grep "open files"

The first column of file-nr is the number of FDs currently allocated across the entire machine. The third column is the limit. If those numbers are close, the system is at risk.

At the PostgreSQL level:

-- Maximum connections allowed
SHOW max_connections;
-- Maximum file descriptors per backend process
SHOW max_files_per_process;
-- Current active connections by state

SELECT count(*), state, wait_event_type, wait_event
FROM pg_stat_activity
GROUP BY state, wait_event_type, wait_event
ORDER BY count DESC;

If max_connections x max_files_per_process approaches fs.file-max, you’re already in the danger zone before a batch job even starts.

The fix: PgBouncer

Raising fs.file-max buys time. It does not fix the problem. The issue is too many backend processes, not a limit that’s set too low. Raising it delays the next crash and doesn’t prevent it. The right fix is deploying a connection pooler like PgBouncer between the application and PostgreSQL.

PgBouncer accepts thousands of connections from the application side and multiplexes them down to a small pool of actual PostgreSQL backends. In transaction mode, it assigns a backend to a client only for the duration of a transaction. When the transaction ends, that backend goes back to the pool. The application connects and queries normally. PostgreSQL runs 200-500 backend processes instead of thousands.

Once PgBouncer is in place, drop max_connections on PostgreSQL to 500-1,000. That makes it structurally impossible to accumulate thousands of backends regardless of application behavior.

Increasing OS file descriptor limits (temp measure)

Apply this while PgBouncer is being deployed. Keep in mind this is a safety net and a temporary fix, it doesn’t fix the root cause.

Plain Linux VM

# Immediate (lost on reboot)
sysctl -w fs.file-max=20000000

# Permanent
echo “fs.file-max = 20000000” >> /etc/sysctl.conf
sysctl -p

# Verify
cat /proc/sys/fs/file-max
cat /proc/sys/fs/file-nr

Kubernetes (AKS)

In case you are on Azure k8s, direct SSH access to edit sysctl.conf is not available. However, two approaches are available, described below in order of disruption.

Option 1: kubectl debug node (immediate, does not survive node restart)

Use this to apply the change immediately when downtime risk outweighs the temporary nature of the fix. Find which node the database pod is running on:

kubectl get pod <pod-name> -n <namespace> -o wide

Open a privileged debug shell on that node:

kubectl debug node/<node-name> -it --image=ubuntu -- bash

Inside the debug shell, chroot to the host filesystem and apply the change:

chroot /host
sysctl -w fs.file-max=20000000

Verify from inside the database pod:

kubectl exec <pod-name> -n <namespace> -- sh -c \
  'echo "file-max: $(cat /proc/sys/fs/file-max)" && \
   echo "file-nr:  $(cat /proc/sys/fs/file-nr)"'

The setting is lost when the node restarts or is reimaged.

Option 2: Privileged DaemonSet (survives node restarts, no node pool changes required)

A DaemonSet with a privileged init container re-applies the sysctl setting on every node start. Scope it to the database node pool with a nodeSelector:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fd-limit-setter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fd-limit-setter
  template:
    metadata:
      labels:
        app: fd-limit-setter
    spec:
      nodeSelector:
        agentpool: dbnodes      # match your database node pool label
      tolerations:
        - operator: Exists
      initContainers:
        - name: set-fd-limit
          image: busybox
          securityContext:
            privileged: true
          command: ["sysctl", "-w", "fs.file-max=20000000"]
      containers:
        - name: pause
          image: gcr.io/google-containers/pause:3.1

After applying, verify from inside the database pod:

kubectl exec <pod-name> -n <namespace> -- sh -c \
  'echo "file-max: $(cat /proc/sys/fs/file-max)" && \
   echo "file-nr:  $(cat /proc/sys/fs/file-nr)"'

The init container runs on every node restart, so the setting is reapplied automatically after reboots and reimages.

Monitoring

The file-nr alert is the one that matters most as by the time FD errors show up in the logs, it’s already too late. Set alerts on these two things before this becomes an incident again:

file-nr first column exceeding 50–60% of fs.file-max
Active PostgreSQL connection count via pg_stat_activity crossing a defined threshold (1,000 is a reasonable baseline for batch-heavy workloads).

During batch execution windows, FD usage can be watched in real time:

watch -n 5 'cat /proc/sys/fs/file-nr'

In summary, file descriptor exhaustion is one of those failures that feels unpredictable until you understand the mechanics, and then it feels completely inevitable. Too many connections, no pooler, a batch job that tips the balance, and PostgreSQL goes down hard. The good news is the fix is straightforward: PgBouncer in front, max_connections dialed down, alerts on file-nr before it gets critical. Raise fs.file-max if you need to buy time tonight. But if PgBouncer isn’t in the plan for this week, it should be.

In many environments, file descriptor exhaustion is a symptom of a broader operational issue: connection management that no longer matches the workload, application behavior that has changed over time, or infrastructure limits that were never revisited as the system grew. These are the kinds of production challenges our PostgreSQL DBA as a Service team works through every day. From connection pooling and performance tuning to capacity planning and operational reviews, we help teams identify risks early, understand the root cause of recurring issues, and keep PostgreSQL environments running reliably as workloads evolve.