I recently completed a training session for one of our customers on best practices for building HA clusters with Patroni and Pgpool in PostgreSQL. During the session, many points were discussed, but I’d like to highlight a few that should be carefully considered when designing HA clusters.
1. Use HTTPS for Patroni and ETCD Communication
It’s important to secure communication between data nodes and ETCD nodes to prevent man-in-the-middle attacks. Using HTTPS is recommended for this purpose. You can generate certificates and configure them on all nodes to enable secure HTTPS communication.
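As a minimal sketch, the TLS-related settings live in patroni.yml; the certificate paths and addresses below are placeholders, assuming you have already generated a CA plus server and client certificates and distributed them to all nodes:

```yaml
# patroni.yml (fragment) -- certificate paths and addresses are placeholders
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.11:8008
  cafile: /etc/patroni/ssl/ca.crt
  certfile: /etc/patroni/ssl/patroni.crt
  keyfile: /etc/patroni/ssl/patroni.key

etcd3:
  # hosts are configured as shown in the next point
  protocol: https
  cacert: /etc/patroni/ssl/ca.crt
  cert: /etc/patroni/ssl/client.crt
  key: /etc/patroni/ssl/client.key
```

If you secure the REST API this way, remember that patronictl needs matching certificate settings in the ctl section of the same file so it can keep talking to the now-HTTPS endpoints.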
2. Use ETCD3 in Patroni.yml File
ETCD is a distributed and reliable key-value store that holds critical data for a distributed system, including the current state of the cluster and details about the primary node. In the patroni.yml file, we must specify the location of the ETCD nodes. Since the ETCD v2 API is deprecated and no longer supported, we recommend using the ETCD v3 API (the etcd3 section), which is more stable, production-ready, and improves overall reliability.
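For illustration, a minimal etcd3 section might look like the following; the node addresses are placeholders for your own ETCD cluster members:

```yaml
# patroni.yml (fragment) -- use the etcd3 section (v3 API), not the legacy etcd (v2) one
etcd3:
  hosts:
    - 10.0.0.21:2379
    - 10.0.0.22:2379
    - 10.0.0.23:2379
```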
3. Use the scram-sha-256 Authentication Method in the pg_hba.conf File
Using the SCRAM-SHA-256 authentication method in the pg_hba.conf file is highly recommended, as it provides stronger security compared to MD5. SCRAM-SHA-256 protects against brute-force and replay attacks by using salted and iterated password hashing, making stored credentials much harder to crack.
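A minimal sketch of the relevant settings is shown below, assuming a hypothetical 10.0.0.0/24 cluster subnet and a replicator role; note that password_encryption must also be set so that new passwords are stored as SCRAM hashes:

```ini
# postgresql.conf
password_encryption = scram-sha-256

# pg_hba.conf -- subnet and role names are placeholders
host    all            all          10.0.0.0/24    scram-sha-256
host    replication    replicator   10.0.0.0/24    scram-sha-256
```

Keep in mind that passwords set while password_encryption was md5 remain MD5 hashes until they are reset, so existing roles need their passwords changed once to benefit from SCRAM.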
4. Run the Latest – 1 Version of PostgreSQL
Running the latest-minus-one major version of PostgreSQL is generally recommended for production environments. This ensures that you benefit from recent performance improvements, security patches, and stability enhancements while avoiding the risks that can come with a brand-new major release. Whichever major version you choose, keep it on its latest minor release, since minor releases contain only bug and security fixes.
5. Should We Install All Software on Data Nodes or Separate Nodes?
There are pros and cons to both approaches, and the choice depends on your priorities.
Installing all tools (Patroni, HAProxy, Keepalived, ETCD, etc.) directly on the data nodes can save the cost of additional resources or instances. However, the downside is that if a node goes down, disaster recovery can take longer because you would need to reinstall and reconfigure everything from scratch.
Running these tools on separate nodes does increase the overall cost, since additional instances are required. However, it greatly simplifies recovery: if a data node fails, you only need to rebuild and reattach that node, while the supporting services (ETCD, HAProxy, Keepalived) continue running independently.
In short, installing everything on data nodes saves cost, while using separate nodes saves time and simplifies recovery.
6. Install Keepalived and HAProxy on All Nodes
Installing Keepalived and HAProxy on all nodes is highly recommended to ensure high availability and load balancing across the cluster. HAProxy acts as a reliable proxy layer, directing client connections to the current primary or to standby nodes based on the cluster state reported by Patroni. Keepalived manages a virtual IP (VIP) that floats to another node if the node currently holding it, or its HAProxy process, becomes unavailable, so clients always reach a working proxy. By deploying both on all nodes, you eliminate single points of failure and achieve seamless failover with minimal downtime.
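As a rough sketch of how the pieces fit together, HAProxy can use Patroni's REST API (port 8008 by default) as a health check so that write traffic always lands on the primary, while Keepalived holds the VIP on whichever node has a healthy HAProxy. All IPs, ports, and interface names below are placeholders:

```
# /etc/haproxy/haproxy.cfg (fragment)
listen primary
    bind *:5000
    mode tcp
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 10.0.0.11:5432 check port 8008
    server node2 10.0.0.12:5432 check port 8008
    server node3 10.0.0.13:5432 check port 8008
```

```
# /etc/keepalived/keepalived.conf (fragment)
vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"   # succeeds only if haproxy is running
    interval 2
}
vrrp_instance VI_1 {
    interface eth0
    state BACKUP
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        10.0.0.100/24
    }
    track_script {
        chk_haproxy
    }
}
```

Patroni's /primary endpoint returns 200 only on the leader, so HAProxy marks every other backend as down; a second listener checking GET /replica can be added in the same way to balance read-only traffic across standbys.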
7. Hardware Sizing for ETCD, Keepalived & HAProxy
A good starting point is typically 2 or more CPU cores and at least 4 GB of RAM. ETCD is also sensitive to disk write latency, so back it with SSD storage where possible. Sufficient resources help prevent bottlenecks at the load balancer and failover-manager layers.
8. Build HAProxy from Source (Not via apt/yum)
Building HAProxy from source makes sure you’re running the latest stable version, which often includes important bug fixes and new features that might not yet be available in the default package repositories.
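A hedged sketch of a typical source build is shown below; the version number and download path are examples only, so check haproxy.org for the current stable release and adjust the USE_* flags to your needs:

```sh
# Example only -- substitute the current stable version from haproxy.org
wget https://www.haproxy.org/download/3.0/src/haproxy-3.0.5.tar.gz
tar xzf haproxy-3.0.5.tar.gz
cd haproxy-3.0.5

# Build with TLS, PCRE2, and systemd support, then install under /usr/local
make TARGET=linux-glibc USE_OPENSSL=1 USE_PCRE2=1 USE_SYSTEMD=1
sudo make install PREFIX=/usr/local

haproxy -v   # confirm the expected version is on the PATH
```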
9. Hardware Sizing for Data Nodes
PostgreSQL is lightweight enough to run even on very small systems, for example a 1 vCPU / 1 GB RAM instance on AWS or a Raspberry Pi with just 512 MB of RAM.
However, when deploying a PostgreSQL cluster in production, the system requirements depend heavily on the business workload.
To recommend the right specifications, we typically need to understand factors such as:
- Nature of the business and application
- Number of concurrent connections
- Workload type (OLTP, OLAP, mixed, time-series, etc.)
- Query complexity and concurrency
- Memory and caching requirements (e.g., shared_buffers, work_mem)
Once we know these details, we can suggest a setup tailored to your needs.
That said, as a general starting point, many production deployments begin with at least:
- 8 vCPUs
- 32 GB RAM
Resources can then be scaled up as the business and data grow.
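For an 8 vCPU / 32 GB node, a hedged first pass at the memory-related settings might look like the sketch below; these are common rules of thumb, not workload-tested values, so benchmark before adopting them:

```ini
# postgresql.conf -- illustrative starting values for a 32 GB node
shared_buffers = 8GB              # ~25% of RAM is a common starting point
effective_cache_size = 24GB       # ~75% of RAM, used only for planning
work_mem = 32MB                   # per sort/hash operation, per connection -- raise with care
maintenance_work_mem = 1GB        # vacuums and index builds
max_connections = 200             # size against your pooler, not raw client count
```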
10. Recommended Tools to Perform Cluster Benchmarks
Once the HA cluster is built, it is always recommended to simulate load on the cluster before going live in production. Running benchmarks that closely reflect your real-world workload helps validate performance and stability. To perform these benchmarks, you can use the following tools (a minimal pgbench example follows the list):
- HammerDB
- Sysbench
- pgbench
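As a minimal illustration with pgbench, the commands below initialize a test database at scale factor 100 (roughly 1.5 GB) and then run a ten-minute read/write test through the cluster's client endpoint; the host, port, user, and database name are placeholders:

```sh
# Initialize the benchmark schema, then run 16 clients for 10 minutes
pgbench -h 10.0.0.100 -p 5000 -U postgres -i -s 100 benchdb
pgbench -h 10.0.0.100 -p 5000 -U postgres -c 16 -j 4 -T 600 -P 30 benchdb
```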