Inside a PostgreSQL Checkpointer Bug: A Production Postmortem
One of our client’s PostgreSQL 16.8 production databases started logging what looked like a memory error:
ERROR: invalid memory alloc request size
The error immediately pointed toward two likely suspects:
- Memory exhaustion
- Memory corruption
As it turned out, neither was the culprit. Instead, the database had hit a known PostgreSQL bug that trapped the checkpointer in an infinite retry loop. The only way to recover was a forced restart, followed by an extended period of WAL replay during crash recovery.
This article explains what happened, why manual checkpoints couldn't fix it, and how a PostgreSQL minor version upgrade permanently resolved the issue.
Understanding the purpose of a checkpoint
When a transaction modifies data, PostgreSQL does not immediately write the changed page to disk. Instead, it follows a two-step process:
- Write the change to the Write-Ahead Log (WAL), a sequential, append-only record of every modification.
- Keep the modified page in shared memory as a dirty buffer until it is written later.
This design is intentional. WAL writes are sequential and therefore inexpensive, whereas writing data pages directly to their final location requires random disk I/O, which is much more costly. Decoupling these two operations is a fundamental part of PostgreSQL's I/O architecture.
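The two-step write path can be sketched as a toy in-memory model. This is an illustration only: `ToyDatabase` and its fields are hypothetical names, not PostgreSQL internals, and the "disk" is just a dictionary.

```python
# Toy model only: ToyDatabase and its attributes are hypothetical names,
# not PostgreSQL data structures.
class ToyDatabase:
    def __init__(self):
        self.wal = []          # append-only log of (page_id, value) records
        self.buffers = {}      # in-memory page cache: page_id -> value
        self.dirty = set()     # pages changed in memory but not yet on "disk"
        self.data_files = {}   # stand-in for the on-disk data pages

    def modify(self, page_id, value):
        self.wal.append((page_id, value))  # step 1: sequential WAL append
        self.buffers[page_id] = value      # step 2: change the page in memory
        self.dirty.add(page_id)            # remember it must be written later


db = ToyDatabase()
db.modify(7, "a")
db.modify(3, "b")
db.modify(7, "c")
```

After these three calls the toy WAL holds three records in order, pages 3 and 7 are dirty, and `data_files` is still empty: nothing has been written to its final location yet.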
Eventually, however, the dirty buffers in memory must be synchronized with the actual data files on disk. That is the job of a checkpoint.
During a checkpoint, the checkpointer:
- Flushes every dirty buffer from shared memory to its corresponding data file.
- Calls fsync() on those files to ensure the data has reached durable storage rather than remaining in the operating system's cache.
- Records the checkpoint location in the WAL once all writes have been safely persisted.
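The ordering of these steps is the important part, and a small sketch makes it concrete. The function below is a simplified illustration, not how the checkpointer is implemented: `toy_checkpoint`, the file names, and the list used as a WAL are all assumptions made for the example. Only `os.fsync()` is the real system call wrapper.

```python
import os
import tempfile


def toy_checkpoint(dirty_pages, data_dir, wal):
    """dirty_pages maps a (hypothetical) file name to the bytes to write."""
    touched = []
    # 1. Write every dirty page to its data file.
    for name, payload in dirty_pages.items():
        path = os.path.join(data_dir, name)
        with open(path, "wb") as f:
            f.write(payload)
        touched.append(path)
    # 2. fsync each file so the data leaves the OS cache.
    for path in touched:
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    # 3. Only after every fsync succeeded, record the checkpoint in the toy WAL.
    wal.append(("CHECKPOINT", sorted(dirty_pages)))
    dirty_pages.clear()


wal = [("UPDATE", "rel_a")]
dirty = {"rel_a": b"page-1", "rel_b": b"page-2"}
with tempfile.TemporaryDirectory() as data_dir:
    toy_checkpoint(dirty, data_dir, wal)
    written = sorted(os.listdir(data_dir))
```

If any `os.fsync()` call raises, the function exits before step 3, so no checkpoint record is written. That mirrors the rule in the list above: the checkpoint is recorded only once all writes are durable.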
This checkpoint record is critical for crash recovery. If PostgreSQL crashes, recovery only needs to replay WAL generated after the most recent completed checkpoint, because everything before that point has already been written safely to disk. Without checkpoints, PostgreSQL would have to replay the entire WAL history from the beginning, making recovery increasingly slow as WAL accumulates.
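To illustrate why a completed checkpoint shortens recovery, here is a minimal sketch that treats the WAL as a list of strings. It is a deliberate simplification: `records_to_replay` is a hypothetical helper, and real PostgreSQL starts replay from the redo location stored in the checkpoint record rather than from the record itself.

```python
def records_to_replay(wal):
    """Return the toy WAL records that follow the most recent checkpoint."""
    last_checkpoint = -1
    for i, record in enumerate(wal):
        if record == "CHECKPOINT":
            last_checkpoint = i
    # With no checkpoint at all, the slice starts at 0: replay everything.
    return wal[last_checkpoint + 1:]


wal = ["insert 1", "update 2", "CHECKPOINT", "insert 3", "delete 4"]
to_replay = records_to_replay(wal)
```

Here only the two records after the checkpoint need replaying. If the checkpointer stops completing checkpoints, the tail of the list keeps growing, and so does the time a later crash recovery will take.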
To keep track of which files still require an fsync() before a checkpoint can finish, the checkpointer maintains an internal structure called the fsync request queue. Every data file modified during checkpoint processing is added to this queue. As each file is successfully fsynced, its entry is removed.
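The queue behavior described above can be modeled in a few lines. This is a toy version based only on the description in this article: `drain_fsync_queue` and the `fsync_file` callback are hypothetical, and PostgreSQL's actual request-tracking structures are more involved.

```python
from collections import deque


def drain_fsync_queue(pending, fsync_file):
    """Call fsync_file on each queued entry, removing entries that succeed."""
    synced = []
    while pending:
        name = pending[0]
        fsync_file(name)   # raises on failure, leaving the entry in the queue
        pending.popleft()  # remove the entry only after a successful sync
        synced.append(name)
    return synced


pending = deque(["rel_a", "rel_b", "rel_c"])
synced = drain_fsync_queue(pending, lambda name: None)  # no-op stand-in for fsync
```

An entry leaves the queue only after its sync succeeds, so a failure partway through leaves the remaining work queued for the next attempt.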
Under normal conditions, the queue drains steadily until the checkpoint completes. The problem begins when it doesn't.

