Understand Table Statistics Using pg_stat_all_tables

Database monitoring, performance tuning and query optimization are critical operations for maintaining an efficient database system. A key component in PostgreSQL for this purpose is the pg_stat_all_tables view.

pg_stat_all_tables offers real time statistics on table activities such as number of sequential and index scans performed on a table, updates, deletes, inserts, and more. It also offers information on dead tuples along with vacuum and analyze stats which enables DB admins to make data-driven decisions. Here’s a table detailing the columns in the pg_stat_all_tables:

Column Name	Description
relid	The Object Identifier (OID) of the table.
schemaname	The name of the schema that contains the table.
relname	The name of the table.
seq_scan	Total no of sequential scans count on this table
last_seq_scan(Introduced in PG16)	The time of the last sequential scan on this table, based on the most recent transaction stop time.
seq_tup_read	The number of tuples read by sequential scans.
idx_scan	The number of index scans initiated on this table.
last_idx_scan(Introduced in PG16)	The time of the last index scan on this table, based on the most recent transaction stop time. It doesn’t provide information of which index was used during the latest scan.
idx_tup_fetch	The number of tuples fetched by index scans.
n_tup_ins	The number of tuples inserted into the table.
n_tup_upd	The number of tuples updated in the table.
n_tup_del	The number of tuples deleted from the table.
n_tup_hot_upd	The number of tuples ‘HOT’ updated (Heap-Only Tuples).
n_live_tup	The estimated number of live tuples in the table.
n_dead_tup	The estimated number of dead tuples in the table.
n_mod_since_analyze	The number of tuples modified since the last analyze operation.
n_ins_since_vacuum	Estimated number of rows inserted since this table was last vacuumed
last_vacuum	The timestamp of the last vacuum operation performed on this table.
last_autovacuum	The timestamp of the last automatic vacuum operation performed on this table.
last_analyze	The timestamp of the last analyze operation performed on this table.
last_autoanalyze	The timestamp of the last automatic analyze operation performed on this table.
vacuum_count	The number of times this table has been manually vacuumed.
autovacuum_count	The number of times this table has been auto-vacuumed.
analyze_count	The number of times this table has been manually analyzed.
autoanalyze_count	The number of times this table has been auto-analyzed.

For more detailed information, you can refer to PostgreSQL’s official documentation on monitoring statistics views: Monitoring Stats Views.

1: How to identify tables with the highest frequency of sequential scans in a PostgreSQL database?

SELECT

schemaname,

relname,

Seq_scan,

idx_scan

seq_tup_read,

seq_tup_read / seq_scan as avg_seq_read

FROM

pg_stat_all_tables

WHERE

seq_scan > 0

AND

schemaname not in (‘pg_catalog’,’information_schema’)

ORDER BY

Avg_seq_read DESC

LIMIT 10;

This query will list the top 10 tables based on the average number of tuples read in sequential scans (avg_seq_read). We can also change the orderby clause to seq_scan parameter. Observing a high number of sequential scans with a low count of index scans on a table could indicate that the table may benefit from indexing, especially if the queries executed on this table frequently use WHERE clauses.

2: How to identify unused or infrequently accessed tables in postgresql?

SELECT

schemaname,

relname,

seq_scan,

Idx_scan,

(COALESCE(seq_scan, 0) + COALESCE(idx_scan, 0)) as total_scans_performed

FROM

pg_stat_all_tables

WHERE

(COALESCE(seq_scan, 0) + COALESCE(idx_scan, 0)) < 10

AND schemaname not in (‘pg_catalog’, ‘information_schema’)

ORDER BY

5 DESC;

This query will identify tables that have a total scan count of less than 10(threshold). In PostgreSQL v16, the addition of the last_seq_scan and last_idx_scan columns enables us to determine the Last Access Time of the tables.

More from the Blog: Database Concurrency: Two phase Locking (2PL) to MVCC – Part 1

3: How to check the write activity of tables in PostgreSQL?

SELECT

st.schemaname,st.relname,

pg_size_pretty(pg_total_relation_size(st.relid)) as Total_Size,

st.seq_scan,

st.idx_scan,

st.n_tup_ins,

st.n_tup_upd,

st.n_tup_del,

st.n_tup_hot_upd,

st.n_tup_hot_upd * 100 / (case when st.n_tup_upd > 0 then st.n_tup_upd else 1 end) as hot_percentage

from pg_stat_all_tables st

WHERE st.schemaname not in (‘pg_catalog’,’information_schema’)

order by Total_Size DESC;

This data helps us grasp the DML activity on the table. The n_tup_upd and n_tup_hot_upd columns in the view indicate the total counts of regular and HOT(Heap-Only Tuple) updates for each table. It’s important to focus on tables with a low hot_percentage and a high frequency of write operations indicated by n_tup_ins, n_tup_upd. Regular monitoring of these statistics is beneficial in order to understand the write pattern.

4: How to determine the number of live and dead tuples in a table and check their vacuum status?

select

schemaname,

relname,

n_live_tup,

n_dead_tup,

n_dead_tup * 100 / (case when n_live_tup > 0 then n_live_tup else 1 end) as dead_rows_percent,

last_autovacuum,

last_autoanalyze,

n_dead_tup,

relname

from

pg_stat_all_tables

WHERE

schemaname not in (‘pg_catalog’,’information_schema’)

ORDER BY

n_dead_tup DESC;

A high count of n_dead_tup typically indicates extensive update or delete operation on a table. Excessive dead tuples can cause performance problems, as it may lead the query planner to make inaccurate estimations, potentially resulting in suboptimal plans. These dead tuples are cleaned up by the autovacuum daemon. Monitoring the last_autovacuum and last_autoanalyze timestamps helps in understanding when the autovacuum last operated on this table. This information is useful for adjusting autovacuum configurations and planning routine maintenance activities.

More from the Blog: Vacuum Best Practices

The pg_stat_all_tables view is indeed a valuable resource for PostgreSQL database administrators.

By effectively analyzing and interpreting the data from this view, admins can significantly improve the performance by identifying tables that require maintenance or removal for storage reclamation, as well as by fine-tuning query performance.