
Understanding PostgreSQL Parallel Query

Q1: How are CPU cores used by PostgreSQL?

PostgreSQL creates a new backend process for every connection you establish. By default, all of the work for that connection's queries is done by that single process, and therefore one core at a time. Additional connections and their queries can run on the other cores, so the number of cores needed depends on how many concurrent connections and queries you expect.

A system with 4-8 cores is a decent starting point for modest applications; it can handle several hundred connections before becoming overloaded. Adding cores helps the system scale as the workload grows.
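As a small illustrative fragment (run from two separate psql sessions against any database, PostgreSQL 10 or later), you can observe that each connection is served by its own backend process:

```sql
-- Each connection gets its own server process; the PID differs per session.
SELECT pg_backend_pid();

-- List all processes currently serving client connections:
SELECT pid, usename, state
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```

Running the first query from two different sessions returns two different PIDs, one per backend process.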

Q2: How does parallel query work in PostgreSQL?

When the PostgreSQL planner determines that parallel query is the fastest execution strategy for a statement, it divides the work into smaller parts that can be executed simultaneously by multiple background workers, improving overall query performance. The workers communicate with each other through shared memory, and once they have completed their work, the results are passed to the leader process for accumulation. Running EXPLAIN on such a statement shows a query plan that includes a Gather or Gather Merge node, for example:

    Gather  (cost=0.00..46010.00 rows=1000000 width=43)
      Workers Planned: 4
      Workers Launched: 4
      ->  Parallel Seq Scan on employees  (cost=0.00..44010.00 rows=250000 width=43)

The Workers Planned value indicates the number of processes that PostgreSQL plans to use for executing a particular part of the query, based on the estimated cost of the operation and the available system resources. The number of background workers that the planner will consider using is limited to at most max_parallel_workers_per_gather.

Q3: What is the difference between a Gather and a Gather Merge node?

When the optimizer uses a Gather Merge node, each process executing the parallel portion of the plan produces tuples in sorted order, and the leader process performs a final merge over those sorted streams. It appears mostly in queries with an ORDER BY clause. With a Gather node, no sorting is involved, and the leader process reads tuples in whatever order the parallel portion provides them.
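As a sketch (table and column names here are hypothetical), the same table can produce either node depending on whether the query needs sorted output:

```sql
-- Hypothetical large table; any table big enough for a parallel scan will do.
EXPLAIN SELECT * FROM employees WHERE salary > 50000;
-- Typically shows a Gather node: rows arrive in arbitrary order.

EXPLAIN SELECT * FROM employees WHERE salary > 50000 ORDER BY salary;
-- May show a Gather Merge node: each worker produces its share in sorted
-- order, and the leader merges the sorted streams.
```

Whether the planner actually picks these nodes depends on table size, statistics, and the cost settings discussed below.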

Q4: When is query level parallelism used in PostgreSQL?

If you are running lots of concurrent, fast, low-latency queries that use indexes heavily and select very small data sets, parallelism won't be needed at all.

However, if you run longer analytical queries that include large table scans, aggregation, and lengthy joins, query parallelism can help you use the hardware efficiently.
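A sketch of the kind of analytical query that typically benefits (table and column names are hypothetical):

```sql
-- Large scan plus aggregation: a good candidate for a parallel plan.
EXPLAIN ANALYZE
SELECT department, avg(salary)
FROM employees
GROUP BY department;
-- With parallelism enabled, the plan typically shows partial aggregation
-- performed inside the workers, with the results combined above the
-- Gather node by the leader.
```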

Q5: Which PostgreSQL parameters contribute to query level parallelism?


max_worker_processes

Controls the total number of background worker processes available system-wide. It is usually set to the number of cores available to the PostgreSQL instance.


max_parallel_workers

This parameter sets the maximum number of parallel workers that can be active at the same time across the whole database cluster, over all queries. It must be less than or equal to max_worker_processes.


max_parallel_workers_per_gather

This parameter limits the number of parallel worker processes that can serve a single Gather or Gather Merge node, and hence, in the common case, a single query.

For example, suppose we've set the following parameters:

max_parallel_workers = 8
max_parallel_workers_per_gather = 2

and run the following query on a table with 600000 rows:

explain analyze
select *
from tbl
where value != -1

    Gather  (cost=1000.00..1136714.86 rows=580941 width=78) (actual time=0.495..3057.813 rows=587886 loops=1)
      Workers Planned: 2
      Workers Launched: 2
      ->  Parallel Seq Scan on tbl  (cost=0.00..10776.76 rows=242059 width=78) (actual time=0.095..2968.77 rows=195962 loops=3)
            Filter: (value <> '-1'::integer)
            Rows Removed by Filter: 5389091
    Planning Time: 0.175 ms
    Execution Time: 3086.243 ms

In the above example, only two workers were launched because max_parallel_workers_per_gather is set to 2.


parallel_setup_cost

Represents the planner's estimate of the cost of launching parallel workers and setting up their shared memory. The default value is 1000.


parallel_tuple_cost

Represents the planner's estimate of the cost of transferring one tuple from a parallel worker to the leader. The default value is 0.1.


min_parallel_table_scan_size

Represents the minimum size of table data that must be scanned before a parallel table scan is considered. The default value is 8MB; tables smaller than this will not be considered for a parallel scan.


min_parallel_index_scan_size

Represents the minimum size of index data that must be scanned before a parallel index scan is considered. The default value is 512kB.
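To inspect the current values of the parallelism-related settings on a running instance (an illustrative fragment; works on PostgreSQL 10 or later):

```sql
-- Show the parallelism-related settings in one query.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_workers_per_gather',
               'parallel_setup_cost',
               'parallel_tuple_cost',
               'min_parallel_table_scan_size',
               'min_parallel_index_scan_size');
```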

Q6: In the above example, when 2 parallel workers are launched, why loop section shows 3?

When operations are executed in parallel, PostgreSQL reports the number of participating processes as loops. Here there were 3 processes (loops): the 2 workers, i.e. Worker 0 and Worker 1, plus the leader process, which also participates in the scan and appends or merges the result set.

Q7: What is the relation between work_mem and max_parallel_workers?

The relation between work_mem and max_parallel_workers is that work_mem should be set high enough to accommodate the memory requirements of each individual worker process, but not so high that the combined allocations consume too much memory overall and force you to limit the number of worker processes.

For example, if a query spawns 4 worker processes for parallel execution and work_mem is set to 32MB, then each worker process can allocate up to 32MB for each sort or hash operation. If you don't have enough memory to accommodate this, lower the value of max_parallel_workers or work_mem to avoid resource contention.
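A rough sketch of the worst-case arithmetic for the figures above (the number of sort/hash operations in the plan is an assumption):

```sql
-- work_mem applies per sort/hash operation per process. With 4 workers
-- plus the leader, and 2 such operations in the plan (assumed), the
-- worst case is roughly 32MB * 5 * 2 = 320MB for this one query.
SELECT 32 * (4 + 1) * 2 AS worst_case_mb;
```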


Some additional notes:

1: If you have more running processes than CPU cores, your CPU is overloaded. Similarly, if processes are doing more I/O than the disk can handle, performance will suffer.

2: max_worker_processes is the cluster-wide limit on the number of custom background workers and parallel workers, where custom background tasks include vacuuming, analyzing, backups, etc.

3: A parallel plan is not generated if a query contains a data-modifying operation. For example, if your query is a DELETE or an UPDATE statement rather than a SELECT, no parallel query plan is generated.

4: To force parallel query execution, decrease the value of parallel_setup_cost or parallel_tuple_cost. Make sure max_parallel_workers_per_gather is greater than 0.
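A sketch of session-level settings that nudge the planner toward a parallel plan (for testing only; the values are illustrative):

```sql
-- Make parallel plans look cheap to the planner (testing only).
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET max_parallel_workers_per_gather = 4;

-- Then re-check the plan (tbl from the earlier example):
EXPLAIN SELECT count(*) FROM tbl;
```

These SET statements affect only the current session, so they are a safe way to experiment before changing anything in postgresql.conf.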

5: Always monitor CPU load using top/htop or tools such as Zabbix or Grafana for resource consumption. Excessive use of parallel queries can create system-wide contention.
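Inside the database itself, currently active parallel workers can be observed via pg_stat_activity (PostgreSQL 10 or later):

```sql
-- Count currently running parallel worker processes.
SELECT count(*)
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';
```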

6: For understanding how parallel scans, joins, and aggregates are performed at the page level, see the Parallel Query chapter of the PostgreSQL documentation.

