Database monitoring, performance tuning, and query optimization are critical for maintaining an efficient database system. In PostgreSQL, a key tool for this purpose is the pg_stat_all_tables view.
pg_stat_all_tables exposes cumulative, near-real-time statistics on table activity, such as the number of sequential and index scans performed on a table, inserts, updates, deletes, and more. It also reports estimated dead tuple counts along with vacuum and analyze statistics, which helps database administrators make data-driven decisions. The following table describes the columns of pg_stat_all_tables:
Column Name | Description |
relid | The Object Identifier (OID) of the table. |
schemaname | The name of the schema that contains the table. |
relname | The name of the table. |
seq_scan | The number of sequential scans initiated on this table. |
last_seq_scan (introduced in PG16) | The time of the last sequential scan on this table, based on the most recent transaction stop time. |
seq_tup_read | The number of tuples read by sequential scans. |
idx_scan | The number of index scans initiated on this table. |
last_idx_scan (introduced in PG16) | The time of the last index scan on this table, based on the most recent transaction stop time. It does not indicate which index was used during that scan. |
idx_tup_fetch | The number of tuples fetched by index scans. |
n_tup_ins | The number of tuples inserted into the table. |
n_tup_upd | The number of tuples updated in the table. |
n_tup_del | The number of tuples deleted from the table. |
n_tup_hot_upd | The number of tuples updated through HOT (Heap-Only Tuple) updates, that is, with no separate index update required. |
n_live_tup | The estimated number of live tuples in the table. |
n_dead_tup | The estimated number of dead tuples in the table. |
n_mod_since_analyze | The number of tuples modified since the last analyze operation. |
n_ins_since_vacuum | The estimated number of tuples inserted since this table was last vacuumed. |
last_vacuum | The timestamp of the last vacuum operation performed on this table. |
last_autovacuum | The timestamp of the last automatic vacuum operation performed on this table. |
last_analyze | The timestamp of the last analyze operation performed on this table. |
last_autoanalyze | The timestamp of the last automatic analyze operation performed on this table. |
vacuum_count | The number of times this table has been manually vacuumed. |
autovacuum_count | The number of times this table has been auto-vacuumed. |
analyze_count | The number of times this table has been manually analyzed. |
autoanalyze_count | The number of times this table has been auto-analyzed. |
For more detailed information, you can refer to PostgreSQL’s official documentation on monitoring statistics views: Monitoring Stats Views.
1: How to identify tables with the highest frequency of sequential scans in a PostgreSQL database?
-- Top 10 tables by average number of tuples read per sequential scan
SELECT
schemaname,
relname,
seq_scan,
idx_scan,
seq_tup_read,
seq_tup_read / seq_scan AS avg_seq_read
FROM
pg_stat_all_tables
WHERE
seq_scan > 0
AND
schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY
avg_seq_read DESC
LIMIT 10;
This query lists the top 10 tables by the average number of tuples read per sequential scan (avg_seq_read). You can also change the ORDER BY clause to sort by seq_scan instead. A high number of sequential scans combined with a low number of index scans could indicate that the table would benefit from an index, especially if the queries executed on it frequently filter with WHERE clauses.
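As a rough sketch of that variation, the query below sorts by seq_scan and keeps only tables where sequential scans outnumber index scans. The COALESCE is there because idx_scan is NULL for a table that has no indexes; the LIMIT of 10 is an arbitrary choice, not a recommendation.

```sql
-- Tables that are scanned sequentially more often than through an index.
SELECT
    schemaname,
    relname,
    seq_scan,
    COALESCE(idx_scan, 0) AS idx_scan,               -- NULL means the table has no indexes
    seq_tup_read / NULLIF(seq_scan, 0) AS avg_seq_read  -- NULLIF guards against division by zero
FROM
    pg_stat_all_tables
WHERE
    seq_scan > COALESCE(idx_scan, 0)
    AND schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY
    seq_scan DESC
LIMIT 10;
```

Small tables that show up here are usually not a concern, since a sequential scan over a few pages is often cheaper than an index lookup. Large tables with a high avg_seq_read deserve a closer look.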
2: How to identify unused or infrequently accessed tables in PostgreSQL?
-- Tables with fewer than 10 scans in total (idx_scan is NULL for tables without indexes)
SELECT
schemaname,
relname,
seq_scan,
idx_scan,
(COALESCE(seq_scan, 0) + COALESCE(idx_scan, 0)) AS total_scans_performed
FROM
pg_stat_all_tables
WHERE
(COALESCE(seq_scan, 0) + COALESCE(idx_scan, 0)) < 10
AND schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY
total_scans_performed DESC;
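This query identifies tables whose total scan count is below a threshold of 10. Because the counters are cumulative since the last statistics reset, a low count is only meaningful if the statistics have been collected for long enough. In PostgreSQL 16 and later, the last_seq_scan and last_idx_scan columns also let us determine when a table was last accessed.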
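On PostgreSQL 16 or later, the last access time can be derived from those two columns. The following is a minimal sketch; the last_access alias is just an illustrative name, and the query fails on older versions where the columns do not exist.

```sql
-- Requires PostgreSQL 16+: last_seq_scan and last_idx_scan are not available earlier.
SELECT
    schemaname,
    relname,
    last_seq_scan,
    last_idx_scan,
    GREATEST(last_seq_scan, last_idx_scan) AS last_access  -- GREATEST ignores NULL arguments
FROM
    pg_stat_all_tables
WHERE
    schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY
    last_access ASC NULLS FIRST;  -- tables with no recorded scan are listed first
```

A NULL last_access means no scan has been recorded since the statistics were last reset, which is not necessarily the same as the table never having been used.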
3: How to check the write activity of tables in PostgreSQL?
-- DML activity per table, largest tables first
SELECT
st.schemaname, st.relname,
pg_size_pretty(pg_total_relation_size(st.relid)) AS total_size,
st.seq_scan,
st.idx_scan,
st.n_tup_ins,
st.n_tup_upd,
st.n_tup_del,
st.n_tup_hot_upd,
st.n_tup_hot_upd * 100 / (CASE WHEN st.n_tup_upd > 0 THEN st.n_tup_upd ELSE 1 END) AS hot_percentage
FROM pg_stat_all_tables st
WHERE st.schemaname NOT IN ('pg_catalog', 'information_schema')
-- Sort on the numeric size: total_size is formatted text and would sort alphabetically
ORDER BY pg_total_relation_size(st.relid) DESC;
This data helps us understand the DML activity on each table. The n_tup_upd column counts all updates, while n_tup_hot_upd counts the subset that were HOT (Heap-Only Tuple) updates. Focus on tables that combine a low hot_percentage with a high volume of writes, as indicated by n_tup_ins and n_tup_upd, because every non-HOT update must also add new entries to the table's indexes. Monitoring these statistics regularly helps in understanding the write pattern.
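To narrow the output to the tables worth a closer look, a filter along these lines can be applied. This is only a sketch: the floor of 1000 updates and the 50 percent cut-off are illustrative assumptions that should be tuned to the workload.

```sql
-- Update-heavy tables where fewer than half of the updates were HOT.
-- The thresholds (1000 updates, 50 percent) are arbitrary example values.
SELECT
    schemaname,
    relname,
    n_tup_upd,
    n_tup_hot_upd,
    round(n_tup_hot_upd * 100.0 / NULLIF(n_tup_upd, 0), 1) AS hot_percentage
FROM
    pg_stat_all_tables
WHERE
    n_tup_upd >= 1000
    AND n_tup_hot_upd * 100.0 / NULLIF(n_tup_upd, 0) < 50  -- NULLIF avoids division by zero
    AND schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY
    n_tup_upd DESC;
```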
4: How to determine the number of live and dead tuples in a table and check their vacuum status?
-- Live and dead tuple estimates with the most recent autovacuum activity
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
n_dead_tup * 100 / (CASE WHEN n_live_tup > 0 THEN n_live_tup ELSE 1 END) AS dead_rows_percent,
last_autovacuum,
last_autoanalyze
FROM
pg_stat_all_tables
WHERE
schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY
n_dead_tup DESC;
A high n_dead_tup count typically indicates extensive update or delete activity on a table. Excessive dead tuples can cause performance problems: they bloat the table so that scans read more pages, and outdated statistics may lead the query planner to make inaccurate estimates, potentially resulting in suboptimal plans. Dead tuples are cleaned up by the autovacuum daemon. Monitoring the last_autovacuum and last_autoanalyze timestamps shows when autovacuum last processed the table. This information is useful for adjusting autovacuum configuration and planning routine maintenance.
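To judge how close a table is to its next autovacuum, n_dead_tup can be compared with the documented trigger point, autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. The sketch below assumes the global settings apply; it ignores any per-table storage parameters, and the approx_vacuum_trigger alias is an illustrative name.

```sql
-- Compare dead tuples with the approximate autovacuum trigger point.
-- Assumes global settings; per-table autovacuum overrides are not considered.
SELECT
    st.schemaname,
    st.relname,
    st.n_dead_tup,
    current_setting('autovacuum_vacuum_threshold')::bigint
      + current_setting('autovacuum_vacuum_scale_factor')::float8
        * GREATEST(c.reltuples, 0)      -- reltuples can be -1 for never-analyzed tables
      AS approx_vacuum_trigger,
    st.last_autovacuum
FROM
    pg_stat_all_tables st
    JOIN pg_class c ON c.oid = st.relid
WHERE
    st.schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY
    st.n_dead_tup DESC;
```

Tables whose n_dead_tup stays well above approx_vacuum_trigger for long periods may indicate that autovacuum is not keeping up or is being blocked.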
The pg_stat_all_tables view is a valuable resource for PostgreSQL database administrators.
By analyzing the data from this view, admins can improve performance by identifying tables that need maintenance, tables that can be removed to reclaim storage, and tables whose queries would benefit from tuning.