In today’s data-driven world, ensuring high availability and scalability for your PostgreSQL deployments is crucial. Traditional single-master setups can become bottlenecks, especially for geographically dispersed applications.
This is where Multi-Master architecture comes in. This approach offers a powerful solution for achieving fault tolerance, improved performance, and geographically distributed capabilities for your PostgreSQL databases. This blog will guide you through the process of deploying a Multi-Master, Multi-Zone architecture on AWS, leveraging the capabilities of EDB Postgres Distributed (PGD).
If you want to know more about Multi-Master architecture, go ahead and read this article: Understanding Multi-Master Architecture in PostgreSQL: Benefits and Trade-offs.
Architecture detail
For our cluster, we’ll use a total of 11 nodes (machines), distributed as follows:
- 3 data nodes located in Region A
- 3 data nodes situated in Region B
- 2 backup nodes, one in each region, equipped with Barman
- 1 node designated for the PEM server
- 1 witness node in Region C, used to establish global consensus in case of a location failure
- 1 node used for setting up the cluster with TPA (Trusted Postgres Architect); we call it the TPA node
In addition to PGD, pgd-proxy will also be installed on every data node. The primary function of pgd-proxy is to direct write traffic to the local write leader, enhancing performance.
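To make this concrete, here is a minimal sketch of how an application would talk to the cluster through pgd-proxy rather than to Postgres directly. The proxy port (6432) and database name (bdrdb) are the ones TPA configures later in this walkthrough; the host placeholder and table name are illustrative only.
# Connect via pgd-proxy (port 6432) instead of Postgres directly (port 5432);
# the proxy forwards writes to the current write leader of the local group.
psql -h <ANY_LOCAL_DATA_NODE> -p 6432 -d bdrdb -c "insert into <your_table> values (1);"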
Prerequisites
- To access the complimentary 30-day trial period for utilizing TPA, PGD, and other tools offered by EnterpriseDB, you’ll need a subscription token. Get your token by signing up HERE.
- TPA requires AWS credentials such as SECRET_ACCESS_KEY and ACCESS_KEY_ID to automate the provisioning of nodes effectively. Follow this link to get your AWS credentials HERE.
Setup TPA node
Trusted Postgres Architect (TPA) is an open-source tool by EDB that simplifies deploying robust PostgreSQL clusters on cloud platforms like AWS. It leverages Ansible to automate configuration and provisioning based on best practices, ensuring reliable deployments for both development and production environments. TPA supports both open-source and EDB’s commercial Postgres solutions, offering flexibility for various needs.
Head over to the AWS EC2 console and create an instance to your preference. We opted for a t2.xlarge instance with RHEL 9 as the operating system and 80 GB of storage.
To install TPA, follow the steps below.
Connect to the TPA node from your terminal via SSH
Switch to the root user
sudo su root
Export security credentials into your system environment
export EDB_SUBSCRIPTION_TOKEN=<TOKEN>
export PATH=/opt/EDB/TPA/bin:$PATH
export AWS_DEFAULT_REGION=<YOUR_REGION>   # We chose us-east-1
export AWS_SECRET_ACCESS_KEY=<ACCESS_KEY>
export AWS_ACCESS_KEY_ID=<ACCESS_KEY_ID>
Setup the EDB packages repository
curl -1sLf 'https://downloads.enterprisedb.com/<TOKEN>/enterprise/setup.rpm.sh' | sudo -E bash
Install TPA
sudo yum install tpaexec -y
Run the setup command for TPA so that it prepares your environment and downloads the required Python modules
sudo /opt/EDB/TPA/bin/tpaexec setup
After the TPA setup, run the self-test command to make sure that TPA is installed successfully. This command will run a few default tests.
tpaexec selftest
Run the configure command to create the main directory for your cluster
tpaexec configure test_cluster --cluster-bucket edb-pgd --architecture PGD-Always-ON --keyring-backend legacy --platform aws --region us-east-1 --edb-postgres-advanced 16 --redwood --pgd-proxy-routing local --no-git --hostnames-unsorted --owner semab --distribution RedHat --os-version 8 --enable-pem --enable-pgd-probes --instance-type t3.medium --location-names datacenter1 datacenter2 datacenter3 --witness-only-location datacenter3
NOTE: By default, TPA will create 3 data nodes per location/data center. If you want to use a different number of nodes, use the --data-nodes-per-location switch, as shown in the sketch below.
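For example, a hypothetical configure command for 2 data nodes per location would just add that switch to the full command above (the trailing "..." stands for the remaining options, unchanged):
tpaexec configure test_cluster --architecture PGD-Always-ON --platform aws --data-nodes-per-location 2 ...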
Understanding the configure command
test_cluster – Upon executing the configure command mentioned above, a new directory named test_cluster will be created in the current working directory. This directory serves as the central hub for all configuration settings related to the cluster.
test_cluster/config.yml – This file acts as a repository for vital information regarding your nodes. It contains details such as which node installs which tool, the public or private IP addresses of your nodes(If you are using the bare metal machines to setup the cluster), their respective regions, group affiliations, and more.
keyring-backend – The default choice is system, but we encountered errors during implementation. Upon investigation, it seems there might be compatibility issues with headless environments. Since this feature is newly introduced in TPA, its stability remains uncertain. The fundamental idea behind a keyring is to securely store sensitive information such as usernames, keys, and passwords on the system’s most secure keyring. However, because we used the legacy option, TPA will forgo the keyring and store credentials directly on the system instead.
cluster_bucket – Used to store SSH host key files for the cluster inside the AWS S3 bucket
architecture – This switch tells TPA to create an Always-On multi-location structure. Read more HERE.
redwood – Enables Oracle compatibility within PostgreSQL.
pgd-proxy-routing – Setting this option to local means that all write queries are routed to a local write leader in each zone. If this option is set to global, there is only one globally elected write leader, and all write operations from every data center go to it.
Provision the cluster
Execute the provision command to initiate the provisioning of all specified nodes on EC2, as defined in the test_cluster/config.yml file:
tpaexec provision test_cluster
This process may take some time, depending on the complexity of your architecture configurations.
Once the provision command completes its execution, you’ll observe the creation of additional files within the main test_cluster directory. These files consist of public and private keys, SSH configuration files facilitating passwordless SSH access between nodes, as well as certain Ansible configuration files.
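For a quick look at what provision generated, a plain directory listing is enough; this is only a sketch, and the exact file names can vary between TPA versions:
cd test_cluster
ls -la   # expect SSH key pairs, an ssh_config file, and Ansible configuration alongside config.yml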
As an additional check, go to your EC2 console; you will be able to see your instances being created by the provision command. Here we have a total of 10 nodes being created (the TPA node itself was created manually earlier), and the names for these machines are generated by default.
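If you prefer the CLI over the console, a generic listing such as the one below shows the instances with their Name tags and states; this is plain AWS CLI usage, not something specific to TPA’s tagging:
aws ec2 describe-instances --filters "Name=instance-state-name,Values=pending,running" --query "Reservations[].Instances[].[InstanceId,State.Name,Tags[?Key=='Name']|[0].Value]" --output table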
Deploy the cluster
After successful provisioning, it is time to deploy the cluster. The deploy command will install all the required tools on the nodes and make sure that our Multi-Master architecture is ready to use.
tpaexec deploy test_cluster
That will now take some time
Verify the cluster
Access any PGD Node within the cluster. You can identify PGD nodes in the test_cluster/config.yml file. Here’s an example entry:
- Name: kaput
  backup: karate
  location: datacenter2
  node: 5
  role:
  - bdr
  - pgd-proxy
  - pem-agent
In my case, the node named kaput serves as a data node since it has PGD (formerly named BDR) installed.
Go inside the test_cluster directory and run this command to connect to a node:
ssh -F ssh_config kaput
Switch to enterprisedb user
sudo -iu enterprisedb
Cluster health check commands
enterprisedb@kaolin:~ $ pgd show-nodes
Node    Node ID    Group                Type    Current State Target State Status Seq ID
----    -------    -----                ----    ------------- ------------ ------ ------
kaboom  2710197610 datacenter1_subgroup data    ACTIVE        ACTIVE       Up     1
kaftan  3490219809 datacenter1_subgroup data    ACTIVE        ACTIVE       Up     3
kaolin  2111777360 datacenter1_subgroup data    ACTIVE        ACTIVE       Up     4
kaput   2885150576 datacenter2_subgroup data    ACTIVE        ACTIVE       Up     6
karakul 2388816231 datacenter2_subgroup data    ACTIVE        ACTIVE       Up     2
karaoke 4049140347 datacenter2_subgroup data    ACTIVE        ACTIVE       Up     7
karma   2893937153 datacenter3_subgroup witness ACTIVE        ACTIVE       Up     5

enterprisedb@kaolin:~ $ pgd show-groups
Group                Group ID   Type   Parent Group Location    Raft  Routing Raft Leader Write Leader
-----                --------   ----   ------------ --------    ----  ------- ----------- ------------
test_cluster         1409411352 global                          true  false   kaboom
datacenter1_subgroup 1488060427 data   test_cluster datacenter1 true  true    kaolin      kaftan
datacenter2_subgroup 2299288850 data   test_cluster datacenter2 true  true    karaoke     kaput
datacenter3_subgroup 4066894510 data   test_cluster datacenter3 false false

enterprisedb@kaolin:~ $ pgd show-proxies
Proxy   Group                Listen Addrs Listen Port Read Listen Addrs Read Listen Port
-----   -----                ------------ ----------- ----------------- ----------------
kaboom  datacenter1_subgroup [0.0.0.0]    6432        []
kaftan  datacenter1_subgroup [0.0.0.0]    6432        []
kaolin  datacenter1_subgroup [0.0.0.0]    6432        []
kaput   datacenter2_subgroup [0.0.0.0]    6432        []
karakul datacenter2_subgroup [0.0.0.0]    6432        []
karaoke datacenter2_subgroup [0.0.0.0]    6432        []
Testing replication
Since kaftan is the write leader in Region A, connect with kaboom (through pgd-proxy on port 6432, which routes writes to kaftan) and create a table to test replication.
enterprisedb@kaboom:~ $ psql -h kaboom -p 6432 -d bdrdb
psql (16.3.0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

bdrdb=# create table testtable (id int);
CREATE TABLE
Switch to kaolin, and we can see the table created from kaboom is visible on kaolin as well.
enterprisedb@kaolin:~ $ psql -h kaolin -p 6432 -d bdrdb
psql (16.3.0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

bdrdb=# \d
                         List of relations
 Schema |              Name              | Type  |    Owner
--------+--------------------------------+-------+--------------
 public | pg_stat_statements             | view  | enterprisedb
 public | pg_stat_statements_info        | view  | enterprisedb
 public | query_advisor_qualstats        | view  | enterprisedb
 public | query_advisor_qualstats_all    | view  | enterprisedb
 public | query_advisor_qualstats_pretty | view  | enterprisedb
 public | testtable                      | table | enterprisedb
(6 rows)
Connect with any other node in Region B to make sure the table created in Region A is also replicated to Region B. We can connect with karaoke.
enterprisedb@karaoke:~ $ psql -h karaoke -p 6432 -d bdrdb
psql (16.3.0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

bdrdb=# \d
                         List of relations
 Schema |              Name              | Type  |    Owner
--------+--------------------------------+-------+--------------
 public | pg_stat_statements             | view  | enterprisedb
 public | pg_stat_statements_info        | view  | enterprisedb
 public | query_advisor_qualstats        | view  | enterprisedb
 public | query_advisor_qualstats_all    | view  | enterprisedb
 public | query_advisor_qualstats_pretty | view  | enterprisedb
 public | testtable                      | table | enterprisedb
(6 rows)
Let’s now try to insert data into this table from Region B and watch it replicate to Region A.
Connect with karaoke and insert some data
enterprisedb@karaoke:~ $ psql -h karaoke -p 6432 -d bdrdb
psql (16.3.0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

bdrdb=# insert into testtable values (1);
INSERT 0 1
Let’s see if the data in this table is visible within the same Region B, so let’s connect with karakul.
enterprisedb@karakul:~ $ psql -h karakul -p 6432 -d bdrdb
psql (16.3.0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

bdrdb=# select * from testtable ;
 id
----
  1
(1 row)
Yes, the data is replicated to the node in the same data center.
Now let’s connect with a node from Region A and see whether the data is present there as well.
So let’s connect with kaftan
enterprisedb@kaftan:~ $ psql -h kaftan -p 6432 -d bdrdb
psql (16.3.0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

bdrdb=# select * from testtable ;
 id
----
  1
(1 row)
Hence, our Multi-Master replication is working fine.
Testing Failover
Let’s try to bring down the write leader in Region A, which is kaftan.
[ec2-user@kaftan ~]$ sudo systemctl stop postgres
Let’s now check the group status.
enterprisedb@kaboom:~ $ pgd show-groups
Group                Group ID   Type   Parent Group Location    Raft  Routing Raft Leader Write Leader
-----                --------   ----   ------------ --------    ----  ------- ----------- ------------
test_cluster         1409411352 global                          true  false   kaboom
datacenter1_subgroup 1488060427 data   test_cluster datacenter1 true  true    kaolin      kaolin
datacenter2_subgroup 2299288850 data   test_cluster datacenter2 true  true    karaoke     kaput
datacenter3_subgroup 4066894510 data   test_cluster datacenter3 false false
PGD immediately promoted kaolin as the new write leader. Please note that because we performed the failover in Region A, the next write leader chosen by PGD was also from Region A. Similarly, if we perform a failover in Region B, PGD will select a new write leader from Region B only. Hence, our failover is also working fine.
NOTE: After the first failover if another node fails in Region A, PGD won’t be able to promote the third node as the next write leader. Consequently, the third node will hang, and we won’t be able to connect to it. This happens because PGD selects a new write leader through a voting process among the nodes. For the election to occur, a majority of nodes is required. With only one node left, PGD cannot conduct the election, causing the third node to stop functioning.
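Once you are done testing, the failed node can simply be started again and it will rejoin the cluster. A sketch, using the same systemd unit name we stopped above:
[ec2-user@kaftan ~]$ sudo systemctl start postgres
# afterwards, as the enterprisedb user, pgd show-nodes should report kaftan as ACTIVE/Up again after a short while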
Backups
Inside each region, we have a dedicated node for backups. The reason for having 2 backup nodes is to minimize the impact of a node failure: if our backup node in Region A fails, we will still have another node in Region B and we won’t lose our backups.
Let’s connect to the backup node in Region A.
For the backup node, I have an entry like this inside my config.yml:
- Name: kapok
  location: datacenter1
  node: 4
  role:
  - barman
  volumes:
  - device_name: /dev/sdf
    encrypted: true
    vars:
      volume_for: barman_data
    volume_size: 128
    volume_type: gp2
Now connect with kapok
ssh -F ssh_config kapok
Switch the user to barman
sudo su barman
Barman’s main configuration file is located at /etc/barman.conf, and the per-server configuration files are located in /etc/barman.d/.
In my case, I have the file /etc/barman.d/kaboom.conf, which means this node is taking backups of kaboom.
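To peek at that per-server configuration, something like the following can be used; the settings named in the comment are typical Barman options and may differ from what TPA actually writes out:
cat /etc/barman.d/kaboom.conf
# usually contains entries such as description, conninfo, streaming_conninfo, backup_method and slot_name for the kaboom server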
Check the backup server status
bash-4.4$ barman check kaboom
Server kaboom:
	PostgreSQL: OK
	superuser or standard user with backup privileges: OK
	PostgreSQL streaming: OK
	wal_level: OK
	replication slot: OK
	directories: OK
	retention policy settings: OK
	backup maximum age: OK (interval provided: 7 days, latest backup age: 27 minutes, 7 seconds)
	backup minimum size: OK (72.3 MiB)
	wal maximum age: OK (no last_wal_maximum_age provided)
	wal size: OK (0 B)
	compression settings: OK
	failed backups: OK (there are 0 failed backups)
	minimum redundancy requirements: OK (have 3 backups, expected at least 3)
	ssh: OK (PostgreSQL server)
	systemid coherence: OK
	pg_receivexlog: OK
	pg_receivexlog compatible: OK
	receive-wal running: OK
	archiver errors: OK
To take a new backup, run:
bash-4.4$ barman backup kaboom
Starting backup using rsync-concurrent method for server kaboom in /var/lib/barman/kaboom/base/20240522T164637
Backup start at LSN: 0/C000028 (00000001000000000000000C, 00000028)
Starting backup copy via rsync/SSH for 20240522T164637
Copy done (time: 1 second)
Asking PostgreSQL server to finalize the backup.
Backup size: 74.2 MiB. Actual size on disk: 15.7 MiB (-78.83% deduplication ratio).
Backup end at LSN: 0/C0140E8 (00000001000000000000000C, 000140E8)
Backup completed (start time: 2024-05-22 16:47:35.668097, elapsed time: 4 seconds)
Processing xlog segments from streaming for kaboom
	00000001000000000000000C
To list all the backups, run:
bash-4.4$ barman list-backup kaboom
kaboom 20240522T164637 - Wed May 22 16:47:37 2024 - Size: 74.3 MiB - WAL Size: 0 B
kaboom 20240522T161827 - Wed May 22 16:18:29 2024 - Size: 72.3 MiB - WAL Size: 1.1 MiB
kaboom 20240522T161819 - Wed May 22 16:18:21 2024 - Size: 72.3 MiB - WAL Size: 36.2 KiB
kaboom 20240522T161806 - Wed May 22 16:18:08 2024 - Size: 72.3 MiB - WAL Size: 36.2 KiB
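Although restores are beyond the scope of this post, recovering one of these backups would look roughly like this; barman recover is a standard Barman command, and the backup ID and destination path below are only illustrative:
bash-4.4$ barman recover kaboom 20240522T164637 /var/lib/barman/restore/kaboom
# copies the selected base backup (and the WAL files needed for consistency) into the target directory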
By default, Barman is set up to take 2 backups each week: 1 on Wednesday and 1 on Saturday.
Barman cron job files
bash-4.4$ cat /etc/cron.d/barman
# m h dom mon dow user command
* * * * * barman /etc/tpa/barman-home-ok && /usr/bin/barman -q cron
bash-4.4$ cat /etc/cron.d/kaboom
#Ansible: Backup from server kaboom
0 4 * * wed,sat barman /etc/tpa/barman-home-ok && /usr/bin/barman backup kaboom
Connecting with PEM
PEM (Postgres Enterprise Manager) is a management tool by EnterpriseDB for PostgreSQL databases, offering real-time monitoring, performance analysis, configuration management, automated backups, security management, and custom reporting to enhance database performance and administration efficiency. By default, TPA will install PEM on port 443.
To connect with PEM, open your browser and use the following link (update the IP address accordingly):
https://<IP_ADDRESS_OF_PEM_SERVER>/pem
Username = enterprisedb
Get the password from this command
tpaexec show-password test_cluster/ enterprisedb
Here, test_cluster is your main cluster directory created by the TPA configure command, and enterprisedb is the username for which you want to extract the password.