
Understanding PostgreSQL WAL and optimizing it with a dedicated disk

If you manage a PostgreSQL database with heavy write activity, one of the most important components to understand is the Write-Ahead Log (WAL). WAL is the foundation of PostgreSQL’s durability and crash recovery as it records every change before it’s applied to the main data files. But because WAL writes are synchronous and frequent, they can also become a serious performance bottleneck when they share the same disk with regular data I/O.

In this guide, we’ll first break down what WAL is and how it works internally, then walk through the exact steps to move pg_wal to a dedicated disk. You’ll also learn how to validate the setup and roll back safely if needed.

What Exactly is the WAL?

PostgreSQL operates on a simple principle:

No modification to a data file can happen until the log record describing that change has been permanently recorded in the WAL.

WAL is a sequential record of all changes made to the database. It ensures that a record of every change exists on stable storage before the actual data pages on disk are updated.

How WAL works internally

When we run an insert query, it’s first inserted into the appropriate table page within PostgreSQL’s shared memory area (the buffer pool). This memory page is now “dirty” because it differs from the version on disk.

Before this dirty page can be flushed back to its permanent location on disk, the database engine creates a WAL record. This isn’t the SQL command itself, but a low-level, binary representation of the exact change made. Each WAL record contains:

  • Transaction ID – Which transaction made the change
  • Page information – Which database page was modified
  • Redo information – How to reconstruct the change during recovery
  • CRC checksum – For detecting corruption

(Unlike some databases, PostgreSQL does not store undo information in the WAL; rollbacks are handled by MVCC, which simply marks the aborted transaction’s row versions as invisible.)

This WAL record is first written to a small, fast area in memory called the WAL Buffer. When a transaction is committed, PostgreSQL ensures the relevant WAL records are physically written to a file in the pg_wal directory. This is the point of durability; the transaction is now safe. Even a sudden power loss cannot erase a committed transaction. Now that the change is permanently logged, the actual dirty data page in memory can be written back to the main data files on disk at a later, more convenient time. 
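You can inspect the settings that govern this commit-time durability, and the current WAL insert position, directly from psql; the queries below are shown purely as an illustration:

SHOW synchronous_commit;
SHOW wal_buffers;
SELECT pg_current_wal_insert_lsn();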

PostgreSQL manages WAL through a process called checkpointing. A checkpoint is a moment where all dirty data pages are flushed to disk, ensuring data files are synchronized with the WAL up to a certain point.

Once all the changes in a WAL file have been applied to the data files, that WAL file is no longer needed for crash recovery. Instead of being deleted, its space is recycled for future WAL records. This efficient management prevents the unlimited growth of pg_wal and is a key aspect of PostgreSQL’s maintenance operations.
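The settings that control how often checkpoints happen and how much WAL is retained for recycling can be inspected from psql:

SHOW checkpoint_timeout;
SHOW max_wal_size;
SHOW min_wal_size;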

It’s worth noting that if you enable WAL Archiving for point-in-time recovery (PITR), the files are copied to a designated archive location before being recycled in pg_wal.
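If you do enable archiving, a minimal postgresql.conf sketch might look like the following; the archive path is only an example, and you should use a tested archive destination of your own:

archive_mode = on
archive_command = 'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f'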

How The WAL Creates Disk I/O Pressure

The strain on the disk subsystem comes from two key characteristics of WAL operations:

1. The Requirement for Synchronous Writes

To guarantee durability (the “D” in ACID), a transaction must wait for its WAL records to be physically written and synchronized to disk before it can return a “commit successful” message to the client. 

This synchronization is performed with an fsync() call (or equivalent), which forces the operating system to flush its own buffers and commit the data to the physical platters (or memory cells, in the case of SSDs). This is a relatively slow process because it has to wait for the disk’s confirmation.

While WAL writes are efficient sequential writes, each synchronous commit introduces a blocking operation. High-concurrency workloads can generate a queue of transactions all waiting for their turn to have their WAL records synced to disk. This directly limits transactions per second (TPS).
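To get a feel for how expensive these syncs are on a particular device, PostgreSQL ships with the pg_test_fsync utility. Pointing it at a file on the candidate WAL disk (the path below is just an example) reports how many sync operations per second each wal_sync_method can sustain:

pg_test_fsync -f /mnt/pgwal/test.out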

2. Contention from Mixed I/O Patterns

A database server disk typically handles two very different types of traffic:

  1. WAL Writes: Sequential writes (appending to a log).
  2. Data File I/O: A mix of random reads (reading specific table/index pages) and random writes (flushing dirty pages from the buffer pool).

When both the WAL and the data files reside on the same disk, these workloads compete for the same limited I/O resources. The disk head must constantly jump between the sequential WAL write area and the random access data file area. This thrashing dramatically reduces overall throughput for both operations.

The critical, synchronous WAL writes get delayed by other I/O operations, increasing commit latency and hurting application response times. In essence, the disk becomes a busy intersection where high-priority synchronous WAL writes sit in a queue behind everyday data file I/O.
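You can usually observe this contention directly with iostat (from the sysstat package); watch the await and %util columns for the device that hosts both the data files and pg_wal while the workload is running:

iostat -x 5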

The Solution: A Dedicated WAL Disk

The strategy to resolve this contention is load separation: move the WAL activity onto its own, physically separate storage device. For an existing cluster, this is done by relocating the pg_wal directory to the new disk and pointing a symbolic link at it (for a brand-new cluster, initdb --waldir achieves the same result).

How This Resolves the Problem

Eliminates I/O Contention: The dedicated WAL disk now handles only the sequential write and sync operations. The main data disk is freed to handle only the random read/write operations of the data files. There is no more head-thrashing or competition for I/O bandwidth between these two distinct patterns.

Isolates Latency: The performance of synchronous commits is now solely determined by the speed and latency of the WAL disk. Slow random reads on the data disk no longer impact the crucial commit process. This makes performance more predictable and stable.

Allows for Targeted Hardware Optimization: This separation lets you choose hardware suited to each workload, for example, a small but low-latency, write-optimized device (such as an NVMe SSD) for the WAL, and larger, capacity-oriented storage for the data files.

Step-by-Step Implementation

Note: This activity involves modifying critical database components. Always perform a full backup and test the entire process in a non-production environment first.

Ensure you have:

  • A separate disk or partition attached to the server and mounted (e.g., /mnt/pgwal); a preparation sketch follows this list if it isn’t mounted yet.
  • Superuser (root) access on the server.
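If the new disk is attached but not yet formatted or mounted, the preparation typically looks something like the following. The device name /dev/nvme1n1 and the ext4 filesystem are assumptions, so confirm your device with lsblk before formatting (using the UUID reported by blkid in /etc/fstab is also more robust than the device path).

Format the new device (destructive, double-check the device name):

sudo mkfs.ext4 /dev/nvme1n1

Create the mount point and mount it:

sudo mkdir -p /mnt/pgwal
sudo mount /dev/nvme1n1 /mnt/pgwal

Make the mount persistent across reboots:

echo '/dev/nvme1n1 /mnt/pgwal ext4 defaults,noatime 0 2' | sudo tee -a /etc/fstab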

1. Identify Paths and Confirm Disk Setup

First, connect to your database using psql and run:

SHOW data_directory;

This will return the PostgreSQL data directory path. Inside that directory, you should see the pg_wal subdirectory (where WAL files are stored by default). Example output:

/var/lib/postgresql/data

In this case, the WAL files live under:

/var/lib/postgresql/data/pg_wal

Note down this path. Now check if the new WAL disk is available and mounted on your system. Run: 

lsblk -f

or:

df -h | grep wal

Look for a mount path like /mnt/pgwal, /mnt/wal_disk, or similar. Example:

/dev/nvme1n1   ext4   /mnt/pgwal

This confirms the new disk is mounted at /mnt/pgwal. To sanity check the mount path, run:

ls -lh /mnt/pgwal

Ensure the directory is accessible and writable. At the end of this step, you should have:

  • Current data directory path (with existing pg_wal).
  • Confirmed mount point for the new WAL disk.

2. Stop the PostgreSQL Service

A clean shutdown is mandatory to ensure data integrity.
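The commands below use $PGDATA as shorthand for the data directory identified in step 1; if it isn’t already set in your shell, export it first (adjust the path to your system):

export PGDATA="/var/lib/postgresql/data"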

Option 1: systemd-managed service

sudo systemctl stop postgresql

Option 2: manual stop

pg_ctl -D "$PGDATA" stop -m fast

Verify the service has fully stopped.

sudo systemctl status postgresql

3. Copy WAL Files to the New Location


To copy WAL files to another mount while preserving permissions, ownership, and timestamps, you can use cp -a (archive mode). Here are the complete steps:

Set WAL mount path (adjust as needed):

export WAL_MOUNT="/mnt/pgwal"

Create destination directory if it doesn’t exist:

mkdir -p "$WAL_MOUNT/pg_wal"

Copy existing WAL files while preserving permissions, ownership, and timestamps:

cp -a "$PGDATA/pg_wal/." "$WAL_MOUNT/pg_wal/"
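Optionally, compare the two directories before switching over; no output from diff means the copy is identical:

diff -r "$PGDATA/pg_wal" "$WAL_MOUNT/pg_wal"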

4. Create a Symbolic Link

To replace the original WAL directory with a symlink pointing to the new location, first back up the original directory so you can roll back if needed:

mv "$PGDATA/pg_wal" "$PGDATA/pg_wal_backup"

Next, create a symlink:

ln -s "$WAL_MOUNT/pg_wal" "$PGDATA/pg_wal"

5. Fix Permissions and Ownership

Ensure PostgreSQL has proper access to the new WAL directory.

chown -R postgres:postgres "$WAL_MOUNT/pg_wal"
chmod 700 "$WAL_MOUNT/pg_wal"

6. Validation Steps

To confirm your configuration is working properly, check that pg_wal is now a symlink:

ls -ld "$PGDATA/pg_wal"

Expected output (example): /var/lib/postgresql/17/main/pg_wal -> /mnt/pgwal/pg_wal. The arrow should point to your new mount. Now confirm that pg_wal actually resolves to the new filesystem:

Check the filesystem of the WAL mount

df -h "$WAL_MOUNT"

Check the filesystem that PGDATA sees for pg_wal

df -h "$PGDATA/pg_wal"

If the migration worked, both df commands will show the same filesystem (the new disk), and this filesystem will be different from the one reported by df -h "$PGDATA". That confirms your WAL directory is truly offloaded to the new filesystem.

7. Restart PostgreSQL

Bring your database back online:

pg_ctl -D "$PGDATA" start

OR

sudo systemctl start postgresql

To check that WAL is functioning, get the current write position:

psql -U postgres -c "SELECT pg_current_wal_lsn();"
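As a further sanity check, you can force PostgreSQL to switch to a new WAL segment and confirm that fresh segment files are appearing on the new disk (assuming the server has had some write activity):

psql -U postgres -c "SELECT pg_switch_wal();"
ls -lt "$WAL_MOUNT/pg_wal" | head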

8. Cleanup 

At this point, PostgreSQL is writing its WAL to the new disk. After verifying everything works correctly, wait at least one full backup cycle before removing the backup to ensure the new setup is stable. Once you are fully confident, remove the old WAL directory:

rm -rf "$PGDATA/pg_wal_backup"

Rollback Procedure

If you encounter issues after moving pg_wal to the new disk (e.g., PostgreSQL won’t start, symlink misconfiguration, or unexpected errors), you can revert to the original setup:

  1. Stop PostgreSQL
  2. Remove the symlink: rm "$PGDATA/pg_wal"
  3. Restore original: mv "$PGDATA/pg_wal_backup" "$PGDATA/pg_wal"
  4. Start PostgreSQL

Conclusion

Moving pg_wal to a separate disk can significantly improve database performance for write-intensive workloads. To ensure a smooth migration and sustained benefits:

  • Benchmark before and after to quantify actual improvements (a pgbench sketch follows this list).
  • Monitor I/O stats on both the data and WAL disks to catch imbalances early.
  • Update backup scripts so they are aware of the new WAL location.
  • Track WAL disk usage carefully to avoid out-of-space issues.
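For the benchmarking point above, the pgbench tool bundled with PostgreSQL is a convenient way to compare throughput before and after the move. The database name, scale factor, client count, and duration below are illustrative only; adjust them to your environment and run the same test in both configurations:

createdb benchdb
pgbench -i -s 50 benchdb
pgbench -c 16 -j 4 -T 120 benchdb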

Always test this procedure in a non-production environment first, and confirm you have verified backups before making any structural changes to the database setup.
