
Learn how vacuum maintains your database's peak performance, prevents data bloat, and keeps your transactions running smoothly.

Vacuum in PostgreSQL

If you’re a PostgreSQL user, you’ve undoubtedly come across the term “vacuum”. This operation plays a pivotal role in maintaining the optimal performance of your database while preventing unnecessary data bloat. In this blog, we’ll look at how vacuum works at a high level, why it matters, its types, the server parameters that influence autovacuum, and some frequently asked questions about vacuum.

Understanding Vacuum

In PostgreSQL, when you delete or update rows, the old data isn’t immediately removed from the disk. Instead, it’s labeled as “dead”, and the space it occupies becomes “bloat”. Over time, these dead rows accumulate, potentially slowing down your database. This is where vacuuming comes into play.

Vacuuming is the process of cleaning up your database by removing dead rows and optimizing its structure. It ensures that your database remains efficient and performs well, even as your data evolves.

In the first step, PostgreSQL scans a target table to identify dead tuples and, if possible, freezes old tuples. The list of dead tuples is stored in maintenance_work_mem. PostgreSQL then removes the index tuples pointing to them by referencing this dead tuple list.

The second step involves the removal of dead tuples, reordering the remaining ones, and updating the Free Space Map (FSM) and Visibility Map (VM) on a page-by-page basis. Following this, PostgreSQL updates the statistics and system catalogs related to vacuum processing for each target table.
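
You can watch these steps happen by running vacuum manually with the VERBOSE option, which reports removed tuples, remaining tuples, and pages processed. A minimal sketch, assuming a hypothetical table named orders:

VACUUM (VERBOSE) orders;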

Why Vacuum Is Essential

  • Preventing Bloat: Without regular vacuuming, your database can become cluttered with dead rows, leading to performance degradation and wasted disk space.
  • Maintaining Performance: Frequent vacuuming ensures your queries run smoothly by preventing unnecessary scans of dead rows.
  • Transaction ID Wraparound: Vacuuming also plays a critical role in preventing transaction ID wraparound issues. The Transaction Wraparound Problem can manifest in PostgreSQL due to its handling of transaction IDs (XIDs). In PostgreSQL, every transaction is allocated a unique 32-bit transaction ID when it begins. These transaction IDs have a finite range of values. Once the maximum value is reached, PostgreSQL cycles back to the minimum value and reuses transaction IDs, which is where the issue arises.
    In cases where a cluster operates for an extended period, exceeding four billion transactions, it becomes susceptible to transaction ID wraparound: the XID counter resets to zero, and transactions from the past suddenly appear as if they belong to the future, making their output invisible and effectively causing data loss. To avoid this situation, it’s necessary to vacuum every table in every database at least once every two billion transactions.
    Note: PostgreSQL stops allowing WRITE operations and switches the database to READ ONLY mode if there are 2,000,000,000 unvacuumed transactions.

Types of Vacuum

PostgreSQL offers various types of vacuuming processes to address specific maintenance needs; example invocations for each follow the list:

  • VACUUM: This standard form of vacuuming can run in parallel with ongoing database operations, allowing normal functionality (SELECT, UPDATE, DELETE, INSERT, …) during the process. DDL commands such as ALTER TABLE can be blocked during vacuum execution.
  • VACUUM FULL: While a full vacuum reclaims more space by compacting tables and indexes, it requires an exclusive lock on the table, which means all other queries on that table are blocked while VACUUM FULL runs.
  • VACUUM FREEZE: This process deals with the Transaction Wraparound Problem by freezing old transactions, preventing them from being affected by transaction ID wraparound. It is normally managed by the autovacuum process but can also be run manually.
  • VACUUM PARALLEL: Introduced in PostgreSQL v13, vacuum can now perform the index vacuum and index cleanup phases with parallel workers. An individual worker is assigned to each index, and the number of workers is limited by max_parallel_maintenance_workers. The target index must be greater than or equal to min_parallel_index_scan_size.
    Parallel vacuum is always disabled in autovacuum. max_parallel_maintenance_workers sets the maximum number of parallel workers that can be started by a single utility command, such as CREATE INDEX for a B-tree index or VACUUM without the FULL option. Its value is taken from the pool of processes established by max_parallel_workers, which is in turn limited by max_worker_processes.
  • AUTOVACUUM: Autovacuum is an automated process that handles both regular vacuuming and analyze operations. It monitors the database for tables that require vacuuming or analysis based on predefined thresholds and settings.
  • REINDEX: While not a type of vacuuming per se, reindexing is essential for maintaining the performance of database indexes and reducing fragmentation.
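
For quick reference, here is how each form above is typically invoked; the table name orders is hypothetical:

-- Standard vacuum; runs alongside normal DML on the table
VACUUM orders;

-- Full vacuum; rewrites the table and blocks all other queries on it
VACUUM FULL orders;

-- Freeze old tuples to guard against transaction ID wraparound
VACUUM (FREEZE) orders;

-- Vacuum indexes in parallel with up to 4 workers (PostgreSQL v13+)
VACUUM (PARALLEL 4) orders;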

Server Parameters for Autovacuum

Configuring server parameters can fine-tune autovacuum to match your application’s requirements. Some critical parameters include:

autovacuum_max_workers

This parameter sets the maximum number of autovacuum processes (excluding the autovacuum launcher) that can run concurrently. The default value is three, meaning that at any given time, three autovacuum sessions can run simultaneously, i.e. three tables can be vacuumed concurrently. When all three autovacuum workers are occupied, other tables have to wait their turn to be vacuumed, which can lead to the accumulation of dead tuples. This parameter can only be changed at server start.

Increasing the number of worker threads decreases the cost limit for each thread. As each thread is assigned a lower share of autovacuum_vacuum_cost_limit, it enters sleep mode more frequently, based on autovacuum_vacuum_cost_delay, ultimately slowing down the entire vacuum process. A recommended approach is to increase autovacuum_vacuum_cost_limit to a higher value and then adjust autovacuum_max_workers accordingly.
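
A minimal sketch of that approach; the values below are illustrative, not recommendations:

ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
ALTER SYSTEM SET autovacuum_max_workers = 6;  -- takes effect only after a server restart
SELECT pg_reload_conf();  -- applies the cost limit change without a restart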

autovacuum_naptime

Specifies the minimum delay between autovacuum runs on a given database. During each round, the autovacuum daemon assesses the database and executes VACUUM and ANALYZE commands as needed for its tables. The default is one minute (1min); if the value is specified without units, it is taken as seconds.

autovacuum_vacuum_threshold

This parameter defines the minimum number of obsolete records required to trigger an autovacuum operation on a table. The default is set to 50 tuples. Configuration of this parameter is possible in the postgresql.conf file or through the server command line, although it can be overridden for individual tables by altering their storage parameters.

autovacuum_vacuum_scale_factor

It represents the fraction of table records added to the vacuum_threshold formula. For instance, a value of 0.2 corresponds to 20% of the table records. With very large tables, reaching this threshold can be time-consuming and might result in longer intervals between autovacuums, extended autovacuum durations, or even autovacuum being blocked if active transactions on the table are holding locks. This parameter interacts with autovacuum_vacuum_threshold. The formula for vacuum threshold is as follows:

vacuum_threshold = autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number of tuples

If the number of dead tuples since the last VACUUM surpasses the vacuum_threshold, the table undergoes vacuuming. Typically, adjustments to this parameter are unnecessary if you are using the scale factor. However, a useful trick is setting the scale factor to zero and then configuring a specific threshold. For example, by setting autovacuum_vacuum_scale_factor to 0 and autovacuum_vacuum_threshold to 100,000, a table will undergo autovacuum when the number of dead rows exceeds 100,000.
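
The same trick can be applied to an individual table through its storage parameters; a sketch, again assuming a hypothetical table named orders:

ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0,
autovacuum_vacuum_threshold = 100000
);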

autovacuum_analyze_threshold

This parameter determines the minimum number of inserted, deleted, or updated tuples in a table that must be reached before an autoanalyze operation begins. The default is set to 50 tuples. Similar to other autovacuum parameters, this can be adjusted in the postgresql.conf file or on the server command line, and it can be overridden for specific tables by modifying their storage parameters.

autovacuum_analyze_scale_factor

This parameter indicates the fraction of table records added to the analyze_threshold formula. The default is 0.1, equivalent to 10% of the table size.

analyze_threshold = autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * number of tuples
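
As a worked example with the defaults, a table holding 1,000,000 tuples is autoanalyzed once 50 + 0.1 * 1,000,000 = 100,050 tuples have been inserted, updated, or deleted since the last analyze.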

autovacuum_vacuum_insert_threshold

Specifies the number of inserted tuples necessary to trigger a VACUUM operation in a table. The default is set to 1,000 tuples. If -1 is specified, autovacuum will not initiate a VACUUM operation based on the number of inserts. This parameter, introduced in PostgreSQL 13, can be configured in the postgresql.conf file or on the server command line. It can also be overridden for specific tables by adjusting their storage parameters.

autovacuum_vacuum_insert_scale_factor

This parameter represents the fraction of table records added to the insert_vacuum_threshold formula. The default is 0.2, corresponding to 20% of the table size.

insert_vacuum_threshold = autovacuum_vacuum_insert_threshold + autovacuum_vacuum_insert_scale_factor * number of tuples

When the number of tuples inserted since the last vacuum exceeds the insert_vacuum_threshold, the table is subjected to vacuuming.
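
For example, with the defaults, a table holding 1,000,000 tuples is vacuumed once 1,000 + 0.2 * 1,000,000 = 201,000 tuples have been inserted since the last vacuum.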

autovacuum_freeze_max_age

The autovacuum_freeze_max_age parameter determines how old a table must be (in transactions) before autovacuum performs a freeze operation to mark applicable rows as permanently visible within the table. The default is 200,000,000 (200 million). Note that the system will launch autovacuum processes to prevent wraparound even when autovacuum is disabled.
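
The setting can also be lowered for an individual table via its storage parameters (autovacuum ignores per-table values larger than the system-wide setting); a sketch with a hypothetical table named orders:

ALTER TABLE orders SET (autovacuum_freeze_max_age = 100000000);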

autovacuum_vacuum_cost_limit

This parameter specifies the cost limit value for automatic VACUUM operations. If set to -1 (the default), it uses the regular vacuum_cost_limit value. Notably, this value is shared globally, and each worker process receives only a fraction (1/autovacuum_max_workers) of the total cost limit. Hence, increasing the number of workers alone doesn’t improve performance and can actually make each worker slower.

Single thread cost limit = autovacuum_vacuum_cost_limit / autovacuum_max_workers
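
For instance, with the default vacuum_cost_limit of 200 and the default three workers, each worker effectively operates with a cost budget of about 200 / 3 ≈ 66.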

Configuration of this parameter can be performed in the postgresql.conf file or on the server command line. It can also be overridden for specific tables by modifying their storage parameters.

autovacuum_vacuum_cost_delay

This parameter designates the cost delay value used in automatic VACUUM operations. If -1 is specified, it resorts to the regular vacuum_cost_delay value. If no units are specified, it’s assumed to be in milliseconds, with a default value of 2 milliseconds. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.

Whenever an autovacuum worker does work (deletes a tuple, reads a block, …), it accumulates cost, which is balanced across the parallel autovacuum workers. Once that accumulated cost exceeds the worker’s share of autovacuum_vacuum_cost_limit, the worker pauses for autovacuum_vacuum_cost_delay.

FAQs

1: How to make autovacuum more aggressive?

Enable more autovacuum workers, increase the cost limit for each worker, reduce the sleep time between autovacuum runs, and make autovacuum workers less interruptible by lowering the cost delay. (This can come at the cost of higher I/O and CPU consumption.)

autovacuum_naptime = 15s
autovacuum_max_workers = 6
autovacuum_vacuum_cost_limit = 1200
autovacuum_vacuum_cost_delay = 1ms

Lower the thresholds for auto-vacuum and auto-analyze to trigger sooner:

autovacuum_vacuum_threshold = 25
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_threshold = 10
autovacuum_analyze_scale_factor = 0.05

Also enable the log_autovacuum_min_duration parameter, which causes each action executed by autovacuum to be logged if it ran for at least the specified amount of time:

log_autovacuum_min_duration = 5s (this logs autovacuum operations whose runtime exceeds 5s)

Be cautious while setting these parameters, as wrong values can waste I/O and CPU resources. Note that autovacuum parameters can also be adjusted at the table level, which can be a good option for selective tables that get frequent UPDATEs and DELETEs.

2: How to monitor transaction wraparound for database and tables?

We can run the following query to check each database’s age, its percentage toward emergency autovacuum (where autovacuum will forcefully freeze old tuples), and its wraparound_risk_percent (where PostgreSQL can set the database to read-only mode in order to run vacuum operations):

SELECT datname,
age(datfrozenxid),
100 * (age(datfrozenxid) / current_setting('autovacuum_freeze_max_age')::float) AS emergency_autovacuum_percent,
(age(datfrozenxid)::numeric / 2000000000 * 100)::numeric(4,2) AS wraparound_risk_percent
FROM pg_database ORDER BY 2 DESC;

To check the age of individual tables, run the following query:

SELECT c.oid::regclass,
age(c.relfrozenxid),
100 * (age(c.relfrozenxid) / current_setting('autovacuum_freeze_max_age')::float) AS emergency_autovacuum_percent,
pg_size_pretty(pg_total_relation_size(c.oid))
FROM pg_class c
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE relkind IN ('r', 't', 'm')
AND n.nspname NOT IN ('pg_toast', 'pg_catalog', 'information_schema')
ORDER BY 2 DESC LIMIT 100;

The built-in view pg_stat_all_tables enables you to find out the last time a vacuum or autovacuum process successfully ran on each of your tables:

SELECT schemaname,
relname,
n_live_tup,
n_dead_tup,
last_autovacuum,
last_vacuum
FROM pg_stat_all_tables
WHERE schemaname != 'pg_catalog'
AND n_live_tup > 0
ORDER BY last_autovacuum ASC NULLS FIRST;
3: Vacuum is not cleaning up dead rows. What to check?

Long-running transactions

If you have a transaction that’s been running for several hours or days, it might be holding onto old row versions, preventing vacuum from cleaning them up. You can find long-running transactions by running:

SELECT pid, 
datname, 
usename,
state, 
backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

The xmin horizon (backend_xmin) is also called the oldest xmin, and describes up to which point (in terms of transactions) the vacuum process can clean up dead rows. In other words, VACUUM can’t clean up dead rows beyond this xmin horizon. When you see a verbose vacuum log entry like “58500 rows are dead but not yet removable”, it means there were 58,500 dead rows, but VACUUM wasn’t able to clean them up because their transaction IDs are above the xmin horizon.

A good blog on exploring backend_xmin: Details of PostgreSQL Bloat Point

To deal with long-running transactions, assign them limits using the statement_timeout and idle_in_transaction_session_timeout parameters in postgresql.conf. You can also terminate an offending session with the pg_terminate_backend(pid) function.
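
A sketch of both approaches; the timeout values and the pid 12345 are purely illustrative:

ALTER SYSTEM SET statement_timeout = '1h';
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();

-- Terminate a specific session identified via pg_stat_activity
SELECT pg_terminate_backend(12345);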

hot_standby_feedback

Typically, PostgreSQL can clean up a row version as soon as it isn’t visible to any transaction. If you’re running PostgreSQL with a replica, it’s possible for a vacuum to clean up a row version on the primary which is needed by a query on the standby. This situation is called a “replication conflict” and when it’s detected, the query on the standby node will be cancelled.

To prevent queries on the standby node from being cancelled due to replication conflicts, you can set hot_standby_feedback = on in the postgresql.conf file, which periodically informs the primary about the oldest transaction running on the standby (the minimum backend_xmin is sent to the primary over the replication connection). If the primary knows about old running transactions on the standby, it can make VACUUM keep rows until the standbys are done. However, setting hot_standby_feedback = on also means that long-running queries on the standby can block rows from getting cleaned up on the primary.

To get the xmin horizon of all your standbys, you can run:

SELECT pid, datname, usename,
state, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

If a streaming replication slot is in use with hot_standby_feedback = on, the xmin recorded in pg_replication_slots is also taken into account when deciding whether dead tuples can be removed:

SELECT slot_name, slot_type,
database, active, xmin
FROM pg_replication_slots
ORDER BY age(xmin);

For logical replication, you would see bloat only for catalog tables (catalog_xmin). You can drop inactive or unneeded replication slots using the pg_drop_replication_slot('slotname') function.

Abandoned prepared transactions

A transaction prepared for a two-phase commit will prevent vacuum from cleaning up tuples until it is either committed or rolled back.

You can find the prepared transaction by running the following command:

SELECT gid, prepared, owner, database, transaction AS xmin
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;

Once identified, you can resolve it with a COMMIT PREPARED or ROLLBACK PREPARED statement.
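
For example, assuming a hypothetical prepared-transaction identifier 'my_tx' taken from the gid column:

COMMIT PREPARED 'my_tx';
-- or, to discard it:
ROLLBACK PREPARED 'my_tx';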

4: What are the best practices for vacuum?

  • Monitor the age of your database/tables to identify potential issues proactively.
  • Keep an eye on long-running transactions and terminate them when necessary.
  • Remove unused indexes, close database sessions properly, and remove unused replication slots as they can block vacuum.
  • Regularly assess your database transaction “burn rate” by monitoring transaction ID usage using the following query:
    SELECT * FROM txid_current();
  • Schedule a periodic VACUUM/VACUUM FREEZE of your database during off-peak hours, as this will reduce the workload of autovacuum.

References:

https://www.cybertec-postgresql.com/en/reasons-why-vacuum-wont-remove-dead-rows/

https://www.citusdata.com/blog/2022/07/28/debugging-postgres-autovacuum-problems-13-tips/

How to simulate the deadly, infamous, misunderstood & complex ‘Transaction Wraparound Problem’ in PostgreSQL …
