PostgreSQL High Availability on OCI: Why Your Failover Passes Every Test But Breaks in Production

If you have built PostgreSQL high availability clusters on AWS or Azure, you have probably gotten comfortable with how virtual IPs work. You assign a VIP, your failover tool moves it, and your application reconnects to the new primary. Clean. Simple. Done.

Then you try the same thing on Oracle Cloud Infrastructure and something quietly goes wrong.

The cluster promotes. Patroni (or repmgr, or whatever you are using) does its job. The standby becomes the new primary. But the VIP does not follow. Your application keeps sending traffic to the old node, the one that just failed. From the outside, it looks like the database is down. From the inside, everything looks green.

pgNow: Instant PostgreSQL Performance Diagnostics in Minutes

pgNow is a lightweight PostgreSQL diagnostic tool developed by Redgate that provides quick visibility into database performance without requiring agents or complex setup. It connects directly to a PostgreSQL instance and delivers real-time insights into query workloads, active sessions, index usage, configuration health, and vacuum activity, helping DBAs quickly identify performance bottlenecks. Because it runs as a simple desktop application.

Thinking of PostgreSQL High Availability as Layers

High availability for PostgreSQL is often treated as a single, big, dramatic decision: “Are we doing HA or not?”

That framing pushes teams into two extremes:

- a “hero architecture” that costs a lot and still feels tense to operate, or
- a minimalistic architecture that everyone hopes will just keep running.

A calmer way to design this is to treat HA and DR as layers. You start with a baseline, then add specific capabilities only when your RPO/RTO and budget justify them.

Let us walk through the layers from “single primary” to “multi-site DR posture”.

Start with outcomes

Before topology, align on three things:

1. Failure scope
   a. A database host fails
   b. A zone or data center goes away
   c. A full region outage happens
   d. Human error
2. RPO (Recovery Point Objective)
   a. We can tolerate up to 15 minutes of data loss
   b. We want close to zero
3. RTO (Recovery Time Objective)
   a. We can be back in 30 minutes
   b. We want service back in under 2 minutes

Here is my stance (and it saves money!): You get strong availability outcomes by layering in the right order.
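One way to make the RPO and RTO targets above concrete is to write them down as data and compare them against what a failover drill actually measured. The sketch below is illustrative only: the `Targets` class and `meets_targets` function are hypothetical names, and the two example profiles simply reuse the figures from the list (15 minutes / 30 minutes, and close to zero / under 2 minutes).

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Targets:
    """Hypothetical container for the agreed recovery objectives."""
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable time to restore service


def meets_targets(targets: Targets,
                  observed_data_loss: timedelta,
                  observed_downtime: timedelta) -> bool:
    """Return True when a drill result stays within both objectives."""
    return (observed_data_loss <= targets.rpo
            and observed_downtime <= targets.rto)


# Two example profiles built from the figures in the list above.
relaxed = Targets(rpo=timedelta(minutes=15), rto=timedelta(minutes=30))
strict = Targets(rpo=timedelta(0), rto=timedelta(minutes=2))

# A drill that lost 5 minutes of data and took 10 minutes to recover.
drill_loss = timedelta(minutes=5)
drill_downtime = timedelta(minutes=10)

print(meets_targets(relaxed, drill_loss, drill_downtime))
print(meets_targets(strict, drill_loss, drill_downtime))
```

The same drill result satisfies the relaxed profile and fails the strict one, which is the kind of gap that would justify adding another layer.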

How PostgreSQL Scans Your Data

To understand how PostgreSQL scans data, we first need to understand how PostgreSQL stores it. A table is stored on disk as a collection of pages, 8 KB each by default. Each page has a header, an array of item pointers (also called line pointers) that grows forward from the header, and the actual tuple data, which grows backward from the end of the page toward the pointers. Each tuple has its own header containing visibility info: xmin, xmax, cmin/cmax, and infomask bits.
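The page layout described above can be sketched as a toy model. This is a simplified illustration, not PostgreSQL's actual code: it uses the 24-byte page header and 4-byte line pointer sizes from the on-disk format, but it skips tuple alignment, and the payload bytes stand in for a whole tuple (the per-tuple header with xmin, xmax, and the rest is not modeled). The function names `build_page` and `read_page` are hypothetical.

```python
import struct

PAGE_SIZE = 8192  # default block size
# Header fields: pd_lsn (two uint32), pd_checksum, pd_flags, pd_lower,
# pd_upper, pd_special, pd_pagesize_version, pd_prune_xid.
HEADER_FMT = "<IIHHHHHHI"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 24 bytes
LINE_POINTER_SIZE = 4
LP_NORMAL = 1  # line pointer flag value for an in-use item


def build_page(tuples: list[bytes]) -> bytes:
    """Lay out payloads the way a heap page does: pointers grow forward
    from the header, tuple data grows backward from the end of the page."""
    page = bytearray(PAGE_SIZE)
    lower = HEADER_SIZE  # end of the line pointer array
    upper = PAGE_SIZE    # start of the tuple data area
    for data in tuples:
        upper -= len(data)
        page[upper:upper + len(data)] = data
        # Pack offset (15 bits), flags (2 bits), length (15 bits).
        pointer = upper | (LP_NORMAL << 15) | (len(data) << 17)
        struct.pack_into("<I", page, lower, pointer)
        lower += LINE_POINTER_SIZE
    struct.pack_into(HEADER_FMT, page, 0,
                     0, 0, 0, 0, lower, upper, PAGE_SIZE, PAGE_SIZE | 4, 0)
    return bytes(page)


def read_page(page: bytes) -> tuple[list[bytes], int]:
    """Walk the line pointers and return (payloads, free space in bytes)."""
    fields = struct.unpack_from(HEADER_FMT, page, 0)
    lower, upper = fields[4], fields[5]
    items = []
    for pos in range(HEADER_SIZE, lower, LINE_POINTER_SIZE):
        (pointer,) = struct.unpack_from("<I", page, pos)
        offset = pointer & 0x7FFF
        length = pointer >> 17
        items.append(page[offset:offset + length])
    return items, upper - lower


page = build_page([b"alpha", b"bravo!!"])
items, free_space = read_page(page)
print(items, free_space)
```

The free space on a page is simply the gap between the end of the pointer array and the start of the tuple data, which is why both regions can grow until they meet.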