In the database landscape, connection pooling is a must for keeping everything running smoothly. It’s like having a savvy assistant that saves you time and resources by reusing connections instead of opening a new one for every query. This means less work for the database and quicker answers for your users, especially when many of them are accessing it simultaneously. In short, connection pooling in databases is a game-changer for keeping your systems quick and efficient.
Understanding connection pooling: A real-world example
Imagine a library where readers frequently borrow and return books. Connection pooling acts like a diligent librarian who, instead of shelving each book after every checkout, keeps a stack readily available on the counter. This way, when another reader requests the same book, it’s readily available, eliminating the need to search the entire library each time. In the database world, connection pooling functions similarly, drastically reducing overhead and accelerating the connection time.
PgBouncer: lightweight connection pooler for PostgreSQL
PgBouncer is a lightweight yet powerful connection pooling tool for PostgreSQL. It efficiently manages and reuses database connections, reducing the load on the server and improving performance. It acts as an intermediary between applications and the PostgreSQL database, optimizing connection usage and enhancing scalability.
How does PgBouncer actually work?
These are the steps PgBouncer takes when a client requests a connection:
- Authentication: Initially, PgBouncer checks the userlist.txt file for username and password/hash combinations when a connection request is made. If the username exists in the file and the password matches, the process proceeds to the next step. However, if the username is not listed in the file, PgBouncer establishes a connection to PostgreSQL using a predefined user (whose own username and password must be listed in userlist.txt) and executes the auth_query to retrieve the requested user’s password, matching it against the password the client provided. If authentication succeeds, the process moves on to the next step; otherwise, no further action is taken. (A sample of these authentication settings is shown after this list.)
- Connection Pooling: PgBouncer maintains a pool of database connections to PostgreSQL with the same user and database names. After completing the authentication process, PgBouncer scans its connection pool to find a matching user and database. If a connection is found, it is immediately returned to the client. However, if the requested connection is not found in the pool, PgBouncer initiates a new connection with PostgreSQL, which can introduce some overhead.
- Resetting Connections for Reuse: When a client finishes executing transactions and requests to disconnect from the server, PgBouncer doesn’t immediately return the connection to the pool. Instead, it first executes a reset query to safely clear any session state, ensuring a clean handover. Only after this process is complete is the connection returned to the pool and made available for the next request.
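For reference, the authentication-related settings described above live in pgbouncer.ini. The lines below are a minimal, hedged sketch; the file path shown and the auth_user name pgb_auth are assumptions for illustration, not values required by PgBouncer:
[pgbouncer]
; how client passwords are verified (scram-sha-256, md5, plain, trust, ...)
auth_type = scram-sha-256
; file holding "username" "password-or-hash" pairs
auth_file = /etc/pgbouncer/userlist.txt
; user PgBouncer logs in as to run auth_query (assumed name; must itself be in userlist.txt)
auth_user = pgb_auth
; query used to fetch the password of a user not found in userlist.txt
auth_query = SELECT usename, passwd FROM pg_shadow WHERE usename=$1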
Installing PgBouncer for PostgreSQL
On Ubuntu/Debian-based systems
sudo apt-get install pgbouncer -y
On CentOS/Fedora-based systems
sudo dnf install pgbouncer -y
Important PgBouncer setup files
When we install PgBouncer, it places two very important files in /etc/pgbouncer:
- pgbouncer.ini – This is the main configuration file for PgBouncer, used to configure the main settings, including database connections, authentication, and connection pooling parameters.
- userlist.txt – This file specifies the user and password combinations permitted to connect to PgBouncer, enhancing security and access control.
Important PgBouncer configuration options for connection pooling
Name: pool_mode
Default value: session
Expected values: session, transaction, statement
Description: Tells PgBouncer when to release a server connection back to the pool (see the sketch after this list)
- Session means that each client gets a dedicated database connection for the entire session. This ensures that a client’s database state and settings are maintained throughout the session. Once the session is closed, the connection is returned to the pool
- Transaction means that each client gets a dedicated database connection for the duration of a single transaction. Once the transaction completes, the connection is returned to the pool for reuse by other clients, promoting efficient resource utilization
- Statement means the connection is returned to the pool after every single statement, so transactions spanning multiple statements are not allowed in this mode
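A minimal, hedged pgbouncer.ini sketch for this setting (transaction mode is a common choice for web workloads, but pick the mode your application’s use of session state allows):
[pgbouncer]
; release the server connection back to the pool at the end of each transaction
pool_mode = transaction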
Name: max_client_conn
Default value: 100
Description: It is used to define the maximum number of client connections that PgBouncer will allow concurrently. Once this limit is reached, PgBouncer will not accept additional client connections until some existing connections are released or closed.
Name: default_pool_size
Default value: 20
Description: This parameter specifies how many server connections PgBouncer allows per user/database pair, i.e. the size of each connection pool. Clients then borrow and return connections from this pool as needed.
Name: min_pool_size
Default value: 0 (disabled by default)
Description: This parameter tells PgBouncer to add more server connections to the pool if the number of available connections falls below the specified value. For example, if you set min_pool_size to 5, PgBouncer will make sure that there are always at least 5 server connections open in the pool, even if there are no active clients. This is useful for ensuring that connections are readily available when clients suddenly start sending requests after a period of inactivity.
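A hedged pgbouncer.ini sketch combining the three sizing parameters above (the values are purely illustrative, not recommendations):
[pgbouncer]
; total number of client connections PgBouncer will accept
max_client_conn = 200
; server connections allowed per user/database pair (per pool)
default_pool_size = 20
; keep at least this many server connections open, even when idle
min_pool_size = 5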
Name: reserve_pool_size
Default value: 0 (disabled)
Description: This parameter specifies how many additional connections to allow in a connection pool when needed, particularly when the pool is temporarily exhausted due to high client demand. PgBouncer will allocate up to this many extra connections (as specified by reserve_pool_size) to handle the increased load.
Name: reserve_pool_timeout
Default value: 5 sec
Description: This parameter determines how long a client connection request waits when the regular pool is exhausted before PgBouncer dips into the reserve pool. If another client releases a connection within this time, the waiting client simply reuses it and PgBouncer does not open anything new. If the wait exceeds reserve_pool_timeout and no connection has become available, PgBouncer allocates an additional connection from reserve_pool_size to satisfy the client’s request.
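A hedged pgbouncer.ini sketch of the reserve-pool settings (values are illustrative):
[pgbouncer]
; extra connections that may be opened when the regular pool is exhausted
reserve_pool_size = 5
; seconds a client waits before a reserve connection is used
reserve_pool_timeout = 3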
Name: max_db_connections
Default value: 0 (unlimited)
Description: This parameter allows you to set a limit on the maximum number of server connections per database (not per user) that PgBouncer will allow. This limit is enforced on a per-database basis within PgBouncer.
NOTE: Reaching the limit means that closing a client connection in one pool doesn’t instantly free up a server connection for another pool. This is because the server connection for the first pool remains active. A new server connection is only created for the waiting pool once the existing server connection times out due to inactivity.
Name: max_user_connections
Default value: 0 (unlimited)
Description: This parameter sets a limit on the maximum number of server connections PgBouncer will open for a specific user, regardless of database. This limit is enforced on a per-user basis.
NOTE: Reaching the limit means that closing a client connection in one pool doesn’t instantly free up a server connection for another pool. This is because the server connection for the first pool remains active. A new server connection is only created for the waiting pool once the existing server connection times out due to inactivity
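A hedged pgbouncer.ini sketch of the per-database and per-user limits (values are illustrative):
[pgbouncer]
; never open more than this many server connections to a single database
max_db_connections = 50
; never open more than this many server connections for a single user
max_user_connections = 50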
Name: server_round_robin
Default value: 0
Description: This parameter controls the order in which PgBouncer reuses the server connections within a pool. By default, connections are reused in a last-in, first-out (LIFO) manner, so a small number of connections carry most of the load. With server_round_robin enabled, connections are handed out in round-robin order, spreading the load evenly across all server connections. This is mainly useful when the database address itself is load-balanced across several servers, because even use of the pooled connections translates into even load on the servers behind it
Imagine a pool that holds three idle server connections (connection A, connection B, and connection C). With the default LIFO behavior:
- Client 1 connects to PgBouncer and is handed connection A, then returns it
- Client 2 connects and is handed connection A again, because it was returned most recently
- Client 3, arriving while connection A is busy, is handed connection B
- Client 4 is handed whichever of A or B was returned last
- Connection C is rarely used at all
In this scenario, connection A handles most of the traffic while connection C sits mostly idle. That is fine for a single server, but if the pooled connections point at servers behind a load-balanced address, it can lead to uneven server load distribution.
Scenario with server_round_robin enabled (1):
Now, let’s consider the same pool, but with server_round_robin enabled.
- Client 1 connects to PgBouncer and is handed connection A
- Client 2 connects and is handed connection B
- Client 3 connects and is handed connection C
- Client 4 connects, and PgBouncer cycles back to connection A
- Client 5 connects, and PgBouncer cycles to connection B
With server_round_robin enabled, PgBouncer hands out server connections evenly in a round-robin manner, cycling through all the connections in the pool so that each one carries a similar share of the work
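The corresponding pgbouncer.ini setting, as a hedged one-line sketch:
[pgbouncer]
; reuse server connections round-robin instead of LIFO
server_round_robin = 1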
Name: auth_query
Default value: SELECT usename, passwd FROM pg_shadow WHERE usename=$1
Description: Specifies the query used to retrieve the password for a given user. It allows you to define a custom SQL query that PgBouncer runs (via auth_user) to authenticate users who are not listed in the auth_file and determine whether they are allowed to connect to the PostgreSQL database.
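To see what PgBouncer would receive from this query, you can run it yourself in psql, substituting a concrete user name for the $1 parameter (pgbouncer_user is the example user created in the setup below):
postgres=# SELECT usename, passwd FROM pg_shadow WHERE usename = 'pgbouncer_user';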
Setting up PgBouncer
Our environment and software versions
- 2 CPU cores, 4 GB RAM, 30 GB disk
- Ubuntu 22.04
- PostgreSQL 15.4
- PgBouncer 1.20.1
Prerequisites
Install PostgreSQL by setting up the PGDG repository:
sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install -y postgresql-15
Switch to the postgres user and connect with psql to create a new user for PgBouncer:
su postgres
postgres@7c41ed91fbea:~$ psql -d postgres -h localhost -p 5432
psql (15.4 (Ubuntu 15.4-2.pgdg22.04+1))
Type "help" for help.
postgres=# create user pgbouncer_user WITH PASSWORD '12345' SUPERUSER;
CREATE ROLE
Get the password hash for pgbouncer_user; it is needed to authenticate with PgBouncer:
postgres=# select rolname, rolpassword from pg_authid where rolname='pgbouncer_user';
In our case, the hash is:
SCRAM-SHA-256$4096:i80MMIi21AiT7BCTP0+YAw==$jmCx44v/P7UNt6kvyqWgNRNhiBBMQ7Fnxe4JyQj/PsQ=:4xnxiympME/cLjhV2qPGQdqGKwfaNmjAvj6Q86kkOM0=
Open the pgbouncer.ini file in your favorite editor:
vim /etc/pgbouncer/pgbouncer.ini
Under the [databases] section, place the following line and save the file
postgres = dbname=postgres host=localhost port=5432 user=pgbouncer_user
NOTE: The name on the left-hand side is the database name clients use when connecting through PgBouncer (the pgbench tests below connect to postgres). Remember to change the connection properties according to your needs
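While the file is open, it is also worth reviewing the [pgbouncer] section. The sketch below is a minimal, hedged example matching this setup; the listen address, port, auth settings, and pool mode are assumptions you should adapt to your environment:
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction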
Open the userlist.txt file:
vim /etc/pgbouncer/userlist.txt
Place the following line and save the file
"pgbouncer_user" "SCRAM-SHA-256$4096:i80MMIi21AiT7BCTP0+YAw==$jmCx44v/P7UNt6kvyqWgNRNhiBBMQ7Fnxe4JyQj/PsQ=:4xnxiympME/cLjhV2qPGQdqGKwfaNmjAvj6Q86kkOM0="
It’s time to launch PgBouncer. Run PgBouncer as a non-root user:
pgbouncer /etc/pgbouncer/pgbouncer.ini
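As a quick sanity check before benchmarking, you can connect through PgBouncer with psql, using port 6432, the postgres entry defined above, and the pgbouncer_user created earlier:
psql -h localhost -p 6432 -U pgbouncer_user -d postgres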
Calculate TPS without Connection Pooling
Initialize the database via pgbench
postgres@7c41ed91fbea:~$ pgbench -i postgres
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
100000 of 100000 tuples (100%) done (elapsed 0.06 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 0.16 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 0.07 s, vacuum 0.03 s, primary keys 0.05 s).
postgres@7c41ed91fbea:~$
Run tests
This test simulates 25 concurrent clients sending queries to the database. Each client executes 1500 transactions. The -C option tells pgbench to close and re-open a database connection for each transaction:
postgres@f3f5f19c098e:~$ pgbench postgres -c 25 -t 1500 -C -p 5432
pgbench (15.4 (Ubuntu 15.4-2.pgdg22.04+1))
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 25
number of threads: 1
maximum number of tries: 1
number of transactions per client: 1500
number of transactions actually processed: 37500/37500
number of failed transactions: 0 (0.000%)
latency average = 81.224 ms
average connection time = 2.242 ms
tps = 307.790892 (including reconnection times)
NOTE: We got 307.7 transactions per second without connection pooling
Calculate TPS with Connection Pooling
This time we connect through PgBouncer, using pgbouncer_user and PgBouncer’s default port, 6432:
postgres@f3f5f19c098e:~$ pgbench postgres -c 25 -t 1500 -C -p 6432 -U pgbouncer_user
pgbench (15.4 (Ubuntu 15.4-2.pgdg22.04+1))
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 25
number of threads: 1
maximum number of tries: 1
number of transactions per client: 1500
number of transactions actually processed: 37500/37500
number of failed transactions: 0 (0.000%)
latency average = 38.492 ms
average connection time = 0.371 ms
tps = 649.489814 (including reconnection times)
Wow! We got 649.4 transactions per second, which is roughly a 111% increase in performance, i.e. more than double the throughput.
Also, the average connection time dropped from 2.2 ms to 0.37 ms. So we see a significant improvement from using connection pooling in our cluster.
What does PgBouncer not support?
PgBouncer lacks load balancing and auto-failover features. For these capabilities, we suggest exploring our tool, pg_cirrus.
In conclusion, the implementation of connection pooling in your database cluster is a game-changer when it comes to enhancing performance and scalability. As we’ve seen throughout this journey, it optimizes resource utilization, reduces overhead, and ultimately unlocks the true potential of your database system.