We recently helped a client switch from Heroku PostgreSQL to managing their own PostgreSQL on AWS EC2. In this blog post, we’ll look at what Heroku PostgreSQL offers and where it falls short, and outline the steps we followed during the migration.
What is Heroku PostgreSQL?
Heroku Postgres is a Database-as-a-Service (DBaaS) platform that simplifies managing your SQL databases. It offers a fully managed database experience, including automated backups, scaling, and maintenance, with plans ranging from small projects to large-scale applications. Plus, it integrates with Heroku’s platform, which makes it easy to deploy and manage your databases in the cloud. This means developers can focus on building their applications without worrying about managing the database infrastructure.
What are the limitations of Heroku PostgreSQL?
- Heroku PostgreSQL offers a managed environment, but it has restrictions on customization and control over database settings. Some organizations may need specific configurations or extensions not supported by Heroku PostgreSQL.
- As data volumes and usage grow, the pricing of Heroku PostgreSQL can increase. Migrating to self-managed PostgreSQL on EC2 has the potential to reduce costs for extensive deployments.
- Heroku does not allow the creation of replica instances outside of its cloud infrastructure. It also lacks features such as superuser access and the replication privilege for users.
- Heroku PostgreSQL does not support logical replication, which is crucial for executing major version upgrades with near-zero downtime.
Migration Process
After evaluating multiple options, we used the WAL log-shipping method to migrate away from Heroku PostgreSQL. This involved reaching out to Heroku’s support team to get access to the automated backups and WAL files placed on AWS S3. Here’s a diagram illustrating the setup:
- Primary Database: This refers to the primary PostgreSQL database on Heroku.
- WAL Files: These are transaction logs that record all the changes made to a database. They are needed for recovery in case of a database crash.
- Amazon S3: This is an object storage service provided by AWS that offers scalability, data availability, and security. The Heroku PostgreSQL backups and WAL archives are stored here.
- Standby Database: This refers to a standby or replica PostgreSQL database. In this configuration, the WAL files will be fetched from AWS S3 and applied to the standby to keep it synchronized with the primary database.
- WAL-E: This is the backup tool used by Heroku to preserve base backups and perform continuous archiving of PostgreSQL WAL files.
Log-Shipping using WAL-E
In this section, we’ll install the WAL-E backup tool and walk through the entire process of creating a replica of the Heroku PostgreSQL database on an EC2 PostgreSQL instance. The following steps are performed on the EC2 instance.
1: Install Lzop
LZOP is a file compression tool similar to gzip. It utilizes the LZO data compression library, offering significantly faster compression and decompression speeds compared to gzip, at the cost of a lower compression ratio. Heroku uses the LZOP tool for storing WAL files and backups on S3.
sudo apt-get install lzop
2: Wal-e Installation
We installed Python 3.5 to work with the WAL-E installation. This decision was made due to reported issues on GitHub indicating that WAL-E might hang or throw errors with newer Python versions. Further details can be found here: https://github.com/wal-e/wal-e/issues/322
sudo python3 -m pip install envdir
sudo python3 -m pip install wal-e[aws]
The following errors were encountered while installing WAL-E on the EC2 instance.
Error 1:
c/_cffi_backend.c:2:10: fatal error: Python.h: No such file or directory
2 | #include <Python.h>
| ^~~~~~~~~~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for cffi
Running setup.py clean for cffi
Failed to build cffi
ERROR: launchpadlib 1.10.13 requires testresources, which is not installed.
Installing collected packages: setuptools, Cython, pycparser, cffi, greenlet
Running setup.py install for cffi: started
Running setup.py install for cffi: finished with status 'error'
Resolution:
Install the following packages:
sudo apt-get install python3.5-dev
sudo apt install python3-testresources
sudo apt-get install libffi-dev
Error 2:
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 /usr/share/python-wheels/pep517-0.8.2-py2.py3-none-any.whl/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpbk17_dri
cwd: /tmp/pip-install-1ka0ajfk/gevent
Complete output (1 lines):
/usr/bin/python3: can't find '__main__' module in '/usr/share/python-wheels/pep517-0.8.2-py2.py3-none-any.whl/pep517/_in_process.py'
----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 /usr/share/python-wheels/pep517-0.8.2-py2.py3-none-any.whl/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpbk17_dri Check the logs for full command output.
Resolution:
Here we need to upgrade the pip version:
sudo pip install -U pip
Verify the WAL-E installation:
wal-e --help
WAL-E connectivity with AWS S3
The Heroku support team provided us with AWS S3 credentials for accessing the backups and WAL files. Let’s create secure variables containing the S3 credentials, one file per variable, within a directory.
mkdir -p /mnt/data/env
ls -l /mnt/data/env
-rw-rw-r-- 1 ec2-user ec2-user 21 Dec 20 17:22 AWS_ACCESS_KEY_ID
-rw-rw-r-- 1 ec2-user ec2-user 10 Dec 20 17:22 AWS_REGION
-rw-rw-r-- 1 ec2-user ec2-user 41 Dec 20 17:22 AWS_SECRET_ACCESS_KEY
-rw-rw-r-- 1 ec2-user ec2-user 92 Dec 20 17:21 WALE_S3_PREFIX
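For reference, the files can be populated along these lines (a sketch with placeholder values; use the exact credentials and S3 prefix provided by Heroku support — each envdir file holds only the value, nothing else):
printf '%s' 'AKIAXXXXXXXXXXXXXXXX' > /mnt/data/env/AWS_ACCESS_KEY_ID
printf '%s' 'us-east-1' > /mnt/data/env/AWS_REGION
printf '%s' 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' > /mnt/data/env/AWS_SECRET_ACCESS_KEY
printf '%s' 's3://example-heroku-backups/wal-prefix' > /mnt/data/env/WALE_S3_PREFIX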
To verify the above credentials, run the following command to list the backups inside the S3 bucket.
envdir /mnt/data/env wal-e backup-list
This command will return all the completed physical backups created by Heroku PostgreSQL.
3: Fetch Backups
Use the wal-e backup-fetch command to restore the latest base backup of your Heroku Postgres database into the PostgreSQL data directory volume of your EC2 instance.
envdir /mnt/data/env wal-e backup-fetch --blind-restore /mnt/data/database/ LATEST
Blind Restore: If you are unable to reproduce tablespace storage structures prior to running backup-fetch, you can set the --blind-restore flag. This directs WAL-E to skip the symlink verification process and place all data directly in the PG_DATABASE_DIRECTORY/pg_tblspc path.
Note: Make sure that your PostgreSQL service is stopped and the data directory is empty.
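A quick sanity check before fetching might look like this (a sketch; the service name assumes a default apt-based PostgreSQL install and may differ on your instance):
sudo systemctl stop postgresql    # stop any running cluster (service name is an assumption)
ls -A /mnt/data/database/         # should print nothing if the data directory is empty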
After fetching the backup, we’ll configure the standby by enabling WAL recovery and setting the parameters accordingly.
4: Configure Standby
postgresql.auto.conf
Check the contents of postgresql.auto.conf and remove any unnecessary parameters. In the case of Heroku, we found this:
cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
aiven.pg_security_agent = 'on'
Comment out the aiven.pg_security_agent parameter and save the file.
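A one-liner for this (a sketch assuming the data directory restored in the previous step; you can also edit the file by hand):
sed -i 's/^aiven.pg_security_agent/# aiven.pg_security_agent/' /mnt/data/database/postgresql.auto.conf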
Note: The files listed below are not included in Heroku backups. We must manually create and configure them.
postgresql.conf
Get the default PostgreSQL configuration file for your major version. We are using PostgreSQL 13 in our example, and the configuration file can be taken from the main PostgreSQL repository.
When creating a read-only standby, certain parameter values must match between the primary and the replica database, or else the standby will fail to apply the WAL logs. The following parameters need to be set:
| Parameter | Value |
| --- | --- |
| listen_addresses | '*' |
| wal_level | logical |
| shared_buffers | Set to at least 25% of available memory |
| effective_cache_size | Set to around 70-80% of available memory |
| max_worker_processes | 20 (must be set to the same or a higher value than on the master server) |
| max_connections | 500 (the maximum number of concurrent connections to the database server; this also needs to be the same or higher than on the primary server. At Heroku, this value is set to 500 as per the plan) |
| hot_standby | on |
| restore_command | 'envdir /database/env wal-e wal-fetch %f %p' |
| recovery_target_timeline | 'latest' |
restore_command is the main parameter here; it defines the command used to retrieve an archived WAL file. %f is replaced by the name of the desired WAL file, and %p is replaced by the path to copy the WAL file to (the data directory).
Note: The values of the parameters above may vary depending on the instance tier of Heroku PostgreSQL.
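Putting these together, a minimal standby-related excerpt of postgresql.conf might look like the following (a sketch: the memory values assume a 16 GB instance, and the restore_command points at the envdir created earlier; adjust both to your environment):
listen_addresses = '*'
wal_level = logical
shared_buffers = 4GB                  # assumption: ~25% of 16 GB RAM
effective_cache_size = 12GB           # assumption: ~75% of 16 GB RAM
max_worker_processes = 20
max_connections = 500
hot_standby = on
restore_command = 'envdir /mnt/data/env wal-e wal-fetch %f %p'
recovery_target_timeline = 'latest'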
In addition to the above parameters, you can set others such as:
- log_destination
- logging_collector
- work_mem
- effective_io_concurrency
- max_wal_size
- min_wal_size
- wal_compression
Note: Ensure that proper logging is enabled on the replica so that errors can be detected easily.
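For example, a basic logging setup on the replica could look like this (a sketch; paths and verbosity are assumptions you can tune):
logging_collector = on
log_destination = 'stderr'
log_directory = 'log'                 # relative to the data directory
log_filename = 'postgresql-%a.log'    # one file per weekday
log_min_messages = warning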
pg_hba.conf
The basic entries will look like this:
# "local" is for Unix domain socket connections only
local all postgres peer
local all all md5
# IPv4 local connections:
host all all 0.0.0.0/0 md5
# IPv6 local connections:
host all all ::1/128 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication postgres peer
host replication all 127.0.0.1/32 md5
host replication all ::1/128 md5
standby.signal
Create an empty standby.signal file inside the data directory to start log shipping. If the standby.signal file is present in the data directory, the server will start in standby mode and enter recovery. (This works for PostgreSQL v12 and above.)
touch standby.signal
5: Remove the old WAL folder
Delete the existing pg_wal folder and create a new one. On Heroku Postgres servers, WAL files are stored on a separate volume from the main data directory, so when restoring a backup with WAL-E, the WAL directory will be a symlink.
rm -rf /DATA_DIRECTORY/pg_wal    # removes the old symlink
mkdir /DATA_DIRECTORY/pg_wal
6: Remove old tablespace directory
Heroku has separate volumes for temporary tables and for operations such as disk-based hashing and sorting. We need to remove this tablespace directory, as our self-hosted PostgreSQL won’t be able to find the underlying volume.
rm -rf /database/pg_tblspc/16386/
Note that the temp tablespace folder might sometimes be named differently.
7: Start database
/usr/lib/postgresql/13/bin/pg_ctl -D /mnt/data/database/ start
The startup process will take some time as the server fetches WAL files from S3 and attempts to reach a consistent state. During this period, the database will not be accessible.
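To follow the recovery progress, you can tail the server log until it reports a consistent state (a sketch; the log path assumes the logging settings shown earlier):
tail -f /mnt/data/database/log/postgresql-*.log
# watch for a line similar to: "consistent recovery state reached at <LSN>"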
After a while, once PostgreSQL reaches a consistent state, you can log in using psql. The database will be in read-only mode, and you won’t be able to create any new objects. This can be verified using the following command:
select pg_is_in_recovery();
The value should be ‘t’.
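While the standby is running, you can also check how recent the replayed data is (a small sketch using standard recovery functions):
select pg_last_xact_replay_timestamp();   -- commit time of the last transaction replayed from WAL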
The log shipping process is now completed. In the next section, we will outline the steps for promoting the standby database.
Promotion Steps
This section is segmented into three parts.
Pre-check Steps
1: Stop the application.
2: Run the following command to check for active connections on the primary. Let the sessions complete or cancel them as required.
select * from pg_stat_activity where state = 'active';
3: Run the following command multiple times on the primary to check the current WAL file. We need to make sure that all WAL files are replicated to the standby.
select pg_walfile_name(pg_current_wal_lsn());
Once the WAL file name stays constant, switch the WAL file so that the contents of the current WAL file are archived and shipped to the standby, using the following command:
select pg_switch_wal();
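As a complementary check on the standby side (a sketch; pg_walfile_name() cannot be run during recovery, so compare LSNs instead):
-- Run on the standby: the replayed LSN should stop advancing and match
-- the primary's pg_current_wal_lsn() once all WAL has been applied.
select pg_last_wal_replay_lsn();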
Promotion
Once confirmed, promote the standby:
/usr/lib/postgresql/13/bin/pg_ctl -D /mnt/data/database/ promote
Post-check Steps
1: Check whether the promotion was successful by running this command on the new primary:
select pg_is_in_recovery();
The value should be ‘f’.
2: Now change the required parameters inside the postgresql.conf file as per the current hardware. We also need to enable archiving. Make sure that the archive directory exists with read/write permissions.
vi /mnt/data/database/postgresql.conf
archive_mode = on
archive_command = 'cp %p /mnt/data/archives/%f'
restore_command = ''
3: Restart the database for these settings to take effect:
/usr/lib/postgresql/13/bin/pg_ctl -D /mnt/data/database/ restart
4: Drop the temporary tablespace used by Heroku.
select * from pg_tablespace;    -- check for the correct tablespace name
Output:
  oid  |  spcname   | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
  1663 | pg_default |       10 |        |
  1664 | pg_global  |       10 |        |
 16386 | ephemeral  |       10 |        |
We are going to drop the ephemeral tablespace, as it is not required for our installation:
DROP TABLESPACE ephemeral;
5: Connect applications and start transactions.
6: After promotion, make sure to take a full physical database backup, including WAL files, using the pg_basebackup utility. A sample command looks like this:
/usr/lib/postgresql/13/bin/pg_basebackup -D /mnt/data/backups/fullbkp_$(date +"%d%m%y") --checkpoint=fast -Xs -P
This marks the completion of our migration process.