On January 22, 2025, Stormatics hosted an insightful webinar titled “High Availability in Postgres: Your Guide to 5 Nines and Zero Worries”, featuring our Postgres Consultant, Semab Tariq.
The session offered a deep dive into strategies for ensuring Postgres systems remain reliable and resilient, even in the face of unexpected challenges. Packed with actionable insights and real-world examples, the webinar provided attendees with practical tools to optimize their database environments.
The Q&A session featured several pressing questions from attendees, with detailed explanations provided by Semab included in this blog post:
1. How can I handle pg_wal files in a highly OLTP system while having a DR site? Can I configure 1 GB or 4 GB wal segment file size in PG like we do in Oracle?
To manage pg_wal files in a highly OLTP system with a DR site, you can use streaming replication to ensure WAL files are continuously transferred to the DR site. Regularly archive WAL files using archive_mode and archive_command to prevent storage buildup.
For WAL segment size, PostgreSQL supports customizing it at build time by setting
–with-wal-segsize during compilation
PostgreSQL does not allow changing the WAL segment size dynamically or post-installation.
2. In case of a missing wal file due to any reason (removed/corrupted) on the standby database is there any way in Postgres to skip this wal file and resume the standby?
If a WAL file is missing or corrupted on the standby in PostgreSQL, there is no built-in way to skip the WAL file and continue replication because PostgreSQL ensures strict consistency between the primary and standby. However, you can resolve the issue using one of the following approaches:
- Restore the missing WAL file: If you have WAL archiving enabled, retrieve the missing file from the archive and place it in the pg_wal directory of the standby.
- Rebuild the standby: If the WAL file is irrecoverable, you can perform a new base backup from the primary and reinitialize the standby. This ensures consistency without data loss at the standby.
- Skipping a WAL file is not supported as it can lead to data corruption and an inconsistent standby.
3. How does pg_cirrus help prevent split-brain scenarios in the event of network partitioning or failures? Does it include a fencing mechanism?
pg_cirrus is just a wrapper around pgpool that simplifies setting up the cluster with fewer commands. The actual cluster management, including handling split-brain scenarios, is managed by pgpool, which uses its watchdog feature to address such situations.
You can learn more about pgpool and its capabilities here: https://tatsuo-ishii.github.io/pgpool-II/current/intro-whatis.html
4. How can we achieve automatic failover in native replication? Any way to do it except for the trigger file?
To achieve automatic failover in native PostgreSQL replication, you can use tools like Patroni, repmgr, or pg_auto_failover, which are external cluster management solutions that handle failover automatically by monitoring the primary and standby servers.
Without external tools, PostgreSQL itself does not provide built-in automatic failover. However, if you want to avoid using a trigger file for promoting the standby manually, you can script a failover mechanism using a monitoring tool (like pg_isready) combined with a custom script that automatically runs pg_ctl promote on the standby when the primary is deemed unreachable.
These methods ensure seamless failover and are safer than manual intervention or trigger-file-based promotion.
Missed the session? Watch the recording here.