pg_stat_statements is a built-in PostgreSQL extension that keeps track of the SQL statements executed by the database server. It records details such as query execution counts, total execution time, and I/O-related information. By capturing this information, database administrators can identify bottlenecks and optimize queries for better overall performance. In this blog, we'll take a look at how the pg_stat_statements module can be used to enhance the performance of your PostgreSQL database.
Enable pg_stat_statements for PostgreSQL
pg_stat_statements is included with standard PostgreSQL, but it is not enabled automatically. Because the module requires additional shared memory, it must first be loaded by adding pg_stat_statements to the shared_preload_libraries parameter in postgresql.conf, followed by a server restart:
shared_preload_libraries = 'pg_stat_statements'
After the restart, create the extension by running the following command as a superuser:
create extension pg_stat_statements;
To verify that the extension is working correctly, we can run the following command:
SELECT * from pg_stat_statements;
In order to reset the contents of pg_stat_statements view, run the following function:
SELECT pg_stat_statements_reset();
Note: Only superusers and roles with the pg_read_all_stats privilege are permitted to view the SQL text and queryid of queries executed by other users in the pg_stat_statements view.
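As a quick sanity check, the following sketch (assuming psql access as a superuser) confirms that the library is loaded and the extension is installed:

```sql
-- Confirm the module is preloaded (the output should include pg_stat_statements)
SHOW shared_preload_libraries;

-- Confirm the extension is installed in the current database
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_stat_statements';
```

If the first command does not list pg_stat_statements, querying the view will fail with an error telling you to load the module via shared_preload_libraries.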
Configuration Parameters
The following parameters can be modified inside postgresql.conf.
| Parameter | Data type | Default value | Description |
|---|---|---|---|
| pg_stat_statements.max | integer | 5000 | Maximum number of statements tracked by the module. When the threshold is reached, information about the least-executed statements is discarded. |
| pg_stat_statements.track | enum | top | Controls which statements the module tracks. Valid values are top (track only top-level statements, those issued directly by clients), all (also track nested statements, such as those invoked within functions), and none (disable statement statistics collection). |
| pg_stat_statements.track_utility | boolean | on | Enables tracking of [utility commands](https://pgpedia.info/categories/utility-command.html). |
| pg_stat_statements.track_planning | boolean | off | Controls whether planning operations and their duration are tracked by the module. Enabling this parameter may incur a noticeable performance penalty. |
| pg_stat_statements.save | boolean | on | Specifies whether to save statement statistics across server shutdowns. |
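Putting these together, a postgresql.conf fragment might look like the following. The values are illustrative assumptions rather than recommendations, and changing shared_preload_libraries requires a server restart:

```
# Illustrative postgresql.conf fragment -- tune for your own workload
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000        # track more distinct statements
pg_stat_statements.track = all        # include nested statements
pg_stat_statements.track_planning = off
pg_stat_statements.save = on
```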
Key columns
The following table provides a description of the key columns in the pg_stat_statements view.
| Column | Data type | Description |
|---|---|---|
| userid | oid | OID of the user who executed the statement |
| dbid | oid | OID of the database in which the statement was executed |
| queryid | bigint | Internal hash code, computed from the statement's parse tree |
| query | text | Text of a representative statement |
| calls | bigint | Number of times the statement was executed |
| total_exec_time | double precision | Total time spent executing the statement, in milliseconds |
| min_exec_time | double precision | Minimum time spent executing the statement, in milliseconds |
| max_exec_time | double precision | Maximum time spent executing the statement, in milliseconds |
| mean_exec_time | double precision | Mean time spent executing the statement, in milliseconds |
| stddev_exec_time | double precision | Standard deviation of the time spent executing the statement, in milliseconds |
| rows | bigint | Total number of rows retrieved or affected by the statement |
| shared_blks_hit | bigint | Total number of shared blocks found in the buffer cache (cache hits) |
| shared_blks_read | bigint | Total number of shared blocks read from disk |
| shared_blks_dirtied | bigint | Total number of shared blocks dirtied (modified) by the statement |
| shared_blks_written | bigint | Total number of shared blocks written to disk, typically because of cache pressure |
| blk_read_time | double precision | Total time the statement spent reading blocks, in milliseconds (if track_io_timing is enabled, otherwise zero) |
| blk_write_time | double precision | Total time the statement spent writing blocks, in milliseconds (if track_io_timing is enabled, otherwise zero) |
| wal_records | bigint | Total number of WAL records generated by the statement |
| wal_fpi | bigint | Total number of WAL full page images generated by the statement |
| wal_bytes | numeric | Total amount of WAL generated by the statement, in bytes |
The local_blks_hit, local_blks_read, local_blks_dirtied and local_blks_written columns show the same block activity for temporary tables and indexes.
temp_blks_read and temp_blks_written refer to temporary files, i.e. files used for external sorts, hashes, Materialize plan nodes, and similar operations.
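As an example of putting these columns to work, the following sketch ranks statements by their share of total execution time across the whole server; the percentage calculation is an illustration, not part of the view itself:

```sql
-- Top 10 statements by share of total execution time
SELECT round(total_exec_time::numeric, 2) AS total_ms,
       round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct_of_total,
       calls,
       query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```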
Find Long Running Queries
The average execution time of each query is one of the simplest ways to identify slow Postgres queries that require your attention. Since the statistics are cumulative, this is not a time-weighted average, but it still helps frame an appropriate context for where to begin.
```sql
SELECT userid::regrole,
       dbid,
       mean_exec_time / 1000 AS mean_exec_time_secs,
       max_exec_time / 1000 AS max_exec_time_secs,
       min_exec_time / 1000 AS min_exec_time_secs,
       stddev_exec_time,
       calls,
       rows,
       query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC;
```

```
  userid  | dbid  | mean_exec_time_secs | max_exec_time_secs | min_exec_time_secs |  stddev_exec_time  | calls |   rows   |                       query
----------+-------+---------------------+--------------------+--------------------+--------------------+-------+----------+----------------------------------------------------
 postgres | 16402 |   11.19041728359315 |        71.79477529 |         0.00580726 | 1628.6522214021536 | 10945 | 83922491 | SELECT min("account_move_line".id) AS id, count("a
 postgres | 16402 |   8.441143214310346 |        8.662522655 |        8.060434353 | 133.49689719844346 |    29 |   292120 | SELECT max("account_move_line".id) AS id, count("a
 postgres | 16402 |   6.374394403519453 |        9.831685138 |        6.120206754 | 226.79499038949734 |   797 |      797 | DELETE FROM account_move WHERE id IN (23750178)
 postgres | 16402 |   6.331272233807015 |        6.623289872 |        6.142401752 | 100.66318493612958 |    57 |       57 | SELECT min("account_move_line".id) AS id, count("a
```
The first query has been called 10945 times with an average execution time of around 11 seconds. We can also evaluate the stddev_exec_time column, which shows how widely individual execution times vary around the mean. If the value is small, all executions take a similar amount of time. If the value is large, the execution time of the query is variable and we should check the I/O consumption of that query.
We can calculate coefficient of variation by this formula:
coeff_of_variance = stddev_exec_time/mean_exec_time
If this number is near 0, the statement takes roughly the same amount of time to complete every time it is executed; the larger the number, the more the execution time varies from run to run.
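The formula above can be computed directly from the view. This is a sketch; NULLIF guards against division by zero for statements that have never accumulated any execution time:

```sql
SELECT queryid,
       calls,
       mean_exec_time,
       stddev_exec_time,
       stddev_exec_time / NULLIF(mean_exec_time, 0) AS coeff_of_variance
FROM pg_stat_statements
WHERE calls > 1
ORDER BY coeff_of_variance DESC NULLS LAST;
```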
Find I/O Intensive Queries
If PostgreSQL is reading data from disk, it indicates that other operations and data are taking precedence and "pushing" the blocks your query needs out of the buffer cache. To run the following query, make sure the track_io_timing GUC is enabled in postgresql.conf.
```sql
SELECT mean_exec_time / 1000 AS mean_exec_time_secs,
       calls,
       rows,
       shared_blks_hit,
       shared_blks_read,
       shared_blks_hit / (shared_blks_hit + shared_blks_read)::numeric * 100 AS hit_ratio,
       (blk_read_time + blk_write_time) / calls AS average_io_time_ms,
       query
FROM pg_stat_statements
WHERE shared_blks_hit > 0
ORDER BY (blk_read_time + blk_write_time) / calls DESC;
```

```
 mean_exec_time_secs | calls |  rows  | shared_blks_hit | shared_blks_read |        hit_ratio        | average_io_time_ms |                                    query
---------------------+-------+--------+-----------------+------------------+-------------------------+--------------------+-----------------------------------------------------------------------------
   47.15276140405882 |    17 |  15063 |       102154092 |         16124529 | 86.36733429619542148700 | 1987.1274318235292 | SELECT account_move_line.account_id as groupby where a=$1
   15.67629799178571 |    28 |     84 |        91257431 |         17777969 | 83.69523200721967361100 | 1268.3345261785714 | SELECT account_move_line.account_id as id where b.id = $1 left
   8.260415101000001 |    61 |     61 |       702231111 |         23702045 | 96.73495489163192320100 |  776.7368886229513 | SELECT min("account_move_line".id) AS id, count("account_move_line".id) AS
   6.331272233807015 |    57 |     57 |       129728949 |         22799900 | 85.05207365722664045000 |  760.4856407017546 | SELECT min("account_move_line".id) AS id, count("account_move_line".id) AS
   8.441143214310346 |    29 | 292120 |       333631668 |         11586119 | 96.64382328017182961700 |  750.1903752413795 | SELECT min("account_move_line".id) AS id, count("account_move_line".id) AS
```
By analyzing the first query, we can see that roughly 14% of the time PostgreSQL has to retrieve data from disk to satisfy it. We can also use the average_io_time_ms column to check which query spent the most time on I/O on average. Look out for missing indexes, and consider partitioning large tables, which can help avoid large sequential scans.
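To follow up on suspected missing indexes, one option (a sketch using the standard pg_stat_user_tables view rather than pg_stat_statements) is to look for tables that are read mostly via sequential scans:

```sql
-- Tables with heavy sequential-scan activity are indexing candidates
SELECT relname,
       seq_scan,
       seq_tup_read,
       idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;
```

A table with a large seq_tup_read and few or no index scans is worth examining with EXPLAIN on the queries that touch it.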
Find Temporary I/O Intensive Queries
```sql
SELECT userid::regrole,
       dbid,
       mean_exec_time / 1000 AS mean_exec_time_secs,
       calls,
       rows,
       temp_blks_read,
       temp_blks_written,
       query
FROM pg_stat_statements
ORDER BY (temp_blks_read + temp_blks_written) / calls DESC;
```

```
  userid  | dbid  | mean_exec_time_secs | calls |   rows   | temp_blks_read | temp_blks_written |                       query
----------+-------+---------------------+-------+----------+----------------+-------------------+----------------------------------------------------
 postgres | 16402 |  26.774439807683848 |  5201 |  2685972 |     1509041060 |        1520966260 | SELECT account_move_line.accou
 odoouser | 16402 |  11.193412404985823 |  8527 | 84698343 |      255130928 |         255343779 | SELECT min("account_move_line".id) AS id, count("a
 odoouser | 16402 |  7.2981765913771435 | 17622 |    17622 |      251619630 |         258266960 | SELECT min("account_move_line".id) AS id, count("a
 odoouser | 16402 |   91.93665896475237 |   525 |    43722 |      243110227 |         459557199 | SELECT max("account_move_line".id) AS id, count("a
 odoouser | 16402 |   27.09240567426149 |   784 |   414980 |      230130830 |         231604980 | SELECT account_move_line.accou
```
If you are looking for good performance, it makes sense to consider temporary I/O as a potential factor. temp_blks_read and temp_blks_written are the important columns here; they are populated by external sorts, hashes, Materialize plan node operations, and similar cases. For high temp_* values, consider increasing the work_mem parameter, and also check for missing indexes.
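As a sketch of how to test a work_mem change, you can raise it for a single session and re-check the plan of a suspect query. The table and query below are hypothetical examples in the spirit of the statistics above, not taken from them:

```sql
-- Session-local change only; does not affect other connections
SET work_mem = '256MB';

-- Look for "Sort Method: external merge" (spills to temp files)
-- vs. "quicksort" (fits in memory) in the plan output
EXPLAIN (ANALYZE, BUFFERS)
SELECT account_id, count(*)
FROM account_move_line
GROUP BY account_id
ORDER BY account_id;

RESET work_mem;
```

If the spill disappears at a given work_mem value, that is a reasonable starting point, keeping in mind that work_mem is allocated per sort or hash operation, not per connection.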
The temp_blk_read_time and temp_blk_write_time columns were added in PostgreSQL 15.
Conclusion
PostgreSQL users can benefit greatly from the use of pg_stat_statements. It offers thorough, query-level statistics that can be applied to boost database and application performance. For instance, an expensive query that is rarely executed might not be worth the effort to tune right now; instead, a moderately slow query that is frequently called will probably be a better candidate for optimization. If you want to find out more about performance, we have tons of other useful tips for you. You can check out our posts about PostgreSQL Memory Allocation Best Practices and Understand PostgreSQL Parallel Query.