PostgreSQL Internals Part 2: Understanding Page Structure

Welcome to the second part of our PostgreSQL internals blog series. In the first part, we delved into database clusters and their physical structure.

In part two, we’ll explore the internal structure of a page in PostgreSQL. If you still need to check out part one, you can find it here.

PostgreSQL Page Layout Overview

When a table is created, a corresponding data file is generated. Within this file, data is organized into fixed-length pages, typically set at 8KB by default. Each page is sequentially numbered starting from 0, known as block numbers. PostgreSQL appends a new empty page at the end of a file when it reaches capacity, That is how PostgreSQL increases the size of a data file.

While the default size of a page is 8KB we can change it. When compiling PostgreSQL from source, you can adjust the page size using the –with-blocksize=BLOCKSIZE option. The BLOCKSIZE value should be a power of 2 between 1 and 32 kilobytes. If you modify the block size from the default value during compilation, the existing data stored on disk becomes incompatible with a PostgreSQL server built with a different block size. Since changing the block size alters how data is stored, pg_upgrade won’t be able to understand the format of the data on disk if you’ve compiled a new server with a different block size.

This effectively prevents you from using pg_upgrade to perform a smooth upgrade to the new server version.

Source: medium

The above diagram represents the internal layout of a page in the data file. Here are the important components of a page.

PostgreSQL Page Header

A header is 24 bytes long and contains metadata information about the page. Let’s deep dive into this…

Source: medium

The diagram above illustrates the breakdown of the 24-byte header into its constituent parts.

pd_lsn

This is an 8-byte number AKA a log sequence number that stores the LSN of the last change made to that page. This information is crucial for ensuring data consistency and recovery.

By comparing pd_lsn with the WAL, PostgreSQL can determine whether a page needs to be updated or recovered during crash recovery or replication processes.

pd_checksum

pd_checksum is a 2-byte flag stored on the page. Checksums are a form of error detection used to ensure data integrity stored on disk. When pd_checksum is set, it means that a checksum is calculated and stored alongside the page data. During read operations, PostgreSQL can verify the integrity of the page by recalculating the checksum and comparing it with the stored value. If they don’t match, it indicates a potential data corruption issue.

In PostgreSQL version 12, a new built-in tool called pg_checksums was introduced. This tool enables checksums on existing databases that were previously created without checksums, Before this version, users had to reinitialize the cluster using the –data-checksums or -k flag during the initdb process to enable checksums.

pd_flags

This 2-byte field holds a set of flags for page management, offering hints about the page’s status and its occupancy level.

PD_HAS_FREE_LINES This flag suggests there might be some empty space on the page that could be used to store new data. However, it’s not a guarantee. This information isn’t written in the transaction log (WAL), so it might not always be completely accurate.

PD_PAGE_FULL This flag indicates that the page is likely full, and there might not be enough space to update existing data on the page. When this happens, PostgreSQL might need to do some cleanup (pruning) to free up space for the update.

pd_lower and pd_upper pointers

The lower pointer marks the endpoint of line pointers, while the upper pointer denotes the start of the most recent heap tuple. With the help of pd_lower and pd_upper pointers PostgreSQL determines the available free space in a page.

pd_special

It’s a 2-byte integer serving as a pointer, marking the starting byte location of the special section within a page. Not all data types utilize this section; it’s mainly employed by certain index types such as B-tree indexes to store structure-specific information. For example, in B-tree indexes, the special section might contain links to the page’s left and right sibling pages and other metadata pertinent to the index structure.

These links help optimize search operations by providing quick access to neighboring pages. Regular tables, which store actual data rows, usually don’t need a special section and utilize the entire page for storing data.

pd_pagesize_version

This 2 bytes field is used to store the page size and a version indicator number Starting from PostgreSQL 8.3 the version number is 4 and the default page size is 8kb

pd_prune_xid

This is the last field of 4 bytes that gives a clue about whether it’s worth cleaning up the page: it keeps track of the oldest transaction ID (XID) that hasn’t been cleaned up yet. If this value is relatively recent, it suggests that there may still be active transactions referencing data on the page. In this case, pruning might not be beneficial as it could potentially remove data still needed by active transactions.

Conversely, if the “oldest un-pruned XMAX” value is sufficiently old, it implies that there are no longer any active transactions referencing the data on the page. Pruning becomes more likely to be profitable in this scenario, as the data marked for deletion can be safely removed without affecting ongoing transactions.

Line pointers and Tuples

AKA item pointers i.e., l1, l2, and l3 a 4-byte long that serves as a reference to each heap tuple within PostgreSQL. These pointers collectively form an array akin to an index for the tuples, each assigned a sequential offset number starting from 1. Whenever a new tuple(T1, T2, and T3) is inserted into the page, a corresponding line pointer is appended to the array to reference the new tuple. Line pointers grow from top to bottom and left to right. Tuples grow from bottom to top and right to left.

PostgreSQL Fill Factor Optimization

Each page in PostgreSQL has a fill factor, usually set to 100% by default. This means PostgreSQL tries to fill each page completely with data until the lower and upper pointers meet. When they meet, a new page is created. This continues until each 8-kilobyte block is full, leading to the creation of more blocks and expanding the table.

The fill factor can be set with the following query for an existing table

ALTER TABLE <TABLE_NAME> SET (FILLFACTOR = 50);

Or if you are creating a new table

CREATE TABLE <TABLE_NAME> ( COL1 INTEGER, COL2 TEXT) WITH (FILLFACTOR = 50);

For the scenario where you often update the data more frequently. We ca improve the performance of the UPDATE queries by setting fill factor to 60 % and leaving rest of the 40% for future use by UPDATE queries with fill factor 100% When a row is updated, PostgreSQL might need to relocate the updated data if the existing page is full. With a 60% fillfactor, there’s a higher chance of having enough space on the same page to accommodate the update, avoiding the overhead of moving data to a different page.

Source: medium

This approach can help optimize performance for UPDATE operations over time. However, this approach comes with trade-off Setting the fill factor to 60% essentially reserves half the space on each page for future updates. This can lead to increased storage requirements compared to the default 100% fill factor, where all available space is initially utilized.

Also while updates might benefit, other operations like inserts or scans might see a performance decline due to the wasted space. With less data stored per page, more I/O operations might be required to retrieve the same amount of data.

How Many Pages Does My Table Have?

Replace <your_table_name> with the actual name of your table. This query retrieves the table name and the relpages value, which represents the estimated number of pages currently allocated to the table.

SELECT relname AS table_name, relpages AS table_pages FROM pg_class WHERE relname = 'test'; 
table_name | table_pages 
------------+------------- 
test       |      442478 
(1 row)

NOTE: As the page number starts with 0 therefore there are a total number of 442477 pages available

The query returns 0 for the number of pages, despite the table containing data.

it could be due to a delay in updating the statistics. To address this discrepancy, you can try running a VACUUM ANALYZE command on the table to update its statistics

postgres=# VACUUM ANALYZE test; 
VACUUM

And re-run the above query…

How To Check the Header Information of a Page?

postgres=# CREATE EXTENSION pageinspect;
CREATE EXTENSION

postgres=# SELECT * FROM page_header(get_raw_page(‘test’,0));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
————+———-+——-+——-+——-+———+———-+———+———–
0/16FC9DB0 | 0 | 4 | 928 | 960 | 8192 | 8192 | 4 | 0
(1 row)

Where the test is the table name and 0 is the page number we wanted to examine.

Stay tuned for the next part where we will deep dive into the structure of tuples in a table.