合 PG中的并行
简介
官网:http://postgres.cn/docs/13/parallel-query.html
PG从9.6开始支持并行查询,在10版本中,默认开启并行执行;在负载重的oltp系统上,建议关闭并行执行。
PostgreSQL能设计出利用多 CPU 让查询更快的查询计划。这种特性被称为并行查询。由于现有实现的限制或者因为没有比连续查询计划更快的查询计划存在,很多查询并不能从并行查询获益。不过,对于那些可以从并行查询获益的查询来说,并行查询带来的速度提升是显著的。很多查询在使用并行查询时比之前快了超过两倍,有些查询是以前的四倍甚至更多的倍数。那些访问大量数据但只返回其中少数行给用户的查询最能从并行查询中获益。
postgresql 11 对parallel又有了进一步的加强 。
- 并行创建btree索引(Parallel Index Builds)
- 使用共享hash table时可以并行执行hash join
- 单个选择如果不能并行化,则允许UNION并行运行每个SELECT
- 并行扫描分区表
- 允许 limit 传递给并行进程
- 允许并行进程使用索引扫描式减少返回结果
- 允许并行化单个计算查询、where子句聚合查询和目标列表中的函数
- 新加参数 parallel_leader_participation 控制执行计划中的领导者,默认启用。
- 并行执行CREATE TABLE … AS, CREATE MATERIALIZED VIEW, certain queries using UNION
- 并行hash join、并行顺序扫描在多并行进程下得到加强
- 在EXPLAIN中添加并行进程排序活动的报告
常用参数
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | postgres=# select * postgres-# from pg_settings ps postgres-# where 1=1 postgres-# and ps.name in ( postgres(# 'force_parallel_mode', postgres(# 'max_worker_processes', postgres(# 'max_parallel_workers', postgres(# 'max_parallel_maintenance_workers', postgres(# 'max_parallel_workers_per_gather', postgres(# --'min_parallel_relation_size',-- add 9.6,remove from 10 postgres(# 'min_parallel_index_scan_size', postgres(# 'min_parallel_table_scan_size', postgres(# 'parallel_tuple_cost', postgres(# 'parallel_setup_cost', postgres(# 'parallel_leader_participation' postgres(# ) postgres-# ; name | setting | unit | category | short_desc | extra_desc | context | vartype | source | min_val | max_val | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart ----------------------------------+---------+------+----------------------------------------+----------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+------------+---------+---------+---------+--------------+------------------+----------+-----------+------------+------------+----------------- force_parallel_mode | off | | Query Tuning / Other Planner Options | Forces use of parallel query facilities. | If possible, run query using a parallel worker and with parallel restrictions. | user | enum | default | | | {off,on,regress} | off | off | | | f max_parallel_maintenance_workers | 2 | | Resource Usage / Asynchronous Behavior | Sets the maximum number of parallel processes per maintenance operation. | | user | integer | default | 0 | 1024 | | 2 | 2 | | | f max_parallel_workers | 8 | | Resource Usage / Asynchronous Behavior | Sets the maximum number of parallel workers that can be active at one time. | | user | integer | default | 0 | 1024 | | 8 | 8 | | | f max_parallel_workers_per_gather | 2 | | Resource Usage / Asynchronous Behavior | Sets the maximum number of parallel processes per executor node. | | user | integer | default | 0 | 1024 | | 2 | 2 | | | f max_worker_processes | 8 | | Resource Usage / Asynchronous Behavior | Maximum number of concurrent worker processes. | | postmaster | integer | default | 0 | 262143 | | 8 | 8 | | | f min_parallel_index_scan_size | 64 | 8kB | Query Tuning / Planner Cost Constants | Sets the minimum amount of index data for a parallel scan. | If the planner estimates that it will read a number of index pages too small to reach this limit, a parallel scan will not be considered. | user | integer | default | 0 | 715827882 | | 64 | 64 | | | f min_parallel_table_scan_size | 1024 | 8kB | Query Tuning / Planner Cost Constants | Sets the minimum amount of table data for a parallel scan. | If the planner estimates that it will read a number of table pages too small to reach this limit, a parallel scan will not be considered. | user | integer | default | 0 | 715827882 | | 1024 | 1024 | | | f parallel_leader_participation | on | | Resource Usage / Asynchronous Behavior | Controls whether Gather and Gather Merge also run subplans. | Should gather nodes also run subplans, or just gather tuples? | user | bool | default | | | | on | on | | | f parallel_setup_cost | 1000 | | Query Tuning / Planner Cost Constants | Sets the planner's estimate of the cost of starting up worker processes for parallel query. | | user | real | default | 0 | 1.79769e+308 | | 1000 | 1000 | | | f parallel_tuple_cost | 0.1 | | Query Tuning / Planner Cost Constants | Sets the planner's estimate of the cost of passing each tuple (row) from worker to master backend. | | user | real | default | 0 | 1.79769e+308 | | 0.1 | 0.1 | | | f (10 rows) |
- max_work_processes(integer)
这个设置是当前系统中支持的最大后台进程数,默认为8
如果备库设置,备库上此参数必须大于或等于主库上的此参数配置值
此参数调整,需要重启数据库生效
- max_parallel_workers
设置系统支持的并行查询进程数,默认8
这个参数值不能高于max_worker_processes
调整这个参数,建议同时调整max_parallel_workers_per_gather参数值
- max_parallel_workers_per_gather
设置允许启用的并行进程的进程数,默认2
设置成0,表示禁用并行进程。
设置单个Gather或Gather Merge节点能够启用的最大并行进程数,默认值为2,并行查询(Parallel Query)的并行度受此参数的影响,CREATE INDEX 命令的并行度不受此参数影响。
max_parallel_maintenance_workers
支持内建的过程使用并行的方式工作,例如建立索引,默认设置为2。
该参数设置维护命令(例如 CREATE INDEX) 命令允许的最大并行进程数,默认值为2。
- parallel_setup_cost(floating point)
设置优化器启动并行进程的成本,默认为1000
- parallel_tuple_cost(floating point)
设置优化器通过并行进程处理一行数据的成本,默认为0.1
- min_parallel_table_scan_size(integer)
设置开启并行的条件之一,表占用空间小于此值将不会开启并行
并行顺序扫描场景下扫描的数据大小通常等于表大小,默认值8M
- min_parallel_index_scan_size(integer)
设置开启并行的条件之一,实际上并行扫描不会扫描索引所有数据块
只是扫描索引相关数据块,默认值为512kb
- force_parallel_mode(enum)
强制开启并行,一般作为测试目的
其中的关系应该是
max_work_processes > max_parallel_workers > max_parallel_workers_per_gather >= max_parallel_maintenance_workers
以上参数可能并不容易理解,进一步解释如下:
- max_worker_processes 参数设置的是数据库允许的最大后台进程数,并行进程属于后台进程的一种;
- max_parallel_workers 参数设置数据库允许的最大并行进程数,这个值小于或等于 max_worker_processes。
- 并行进程数设置分为两类,第一类是并行查询,并行查询的并行度由 max_parallel_workers_per_gather 参数控制,第二类是维护命令(例如 CREATE INDEX),维护命令的并行度由 max_parallel_maintenance_workers 参数控制。
- max_parallel_workers_per_gather+max_parallel_maintenance_workers值应小于或等于 max_parallel_workers。
配置示例
1 2 3 4 5 6 7 8 | max_worker_processes = 16 max_parallel_workers_per_gather = 4 max_parallel_workers = 8 parallel_tuple_cost = 0.1 parallel_setup_cost = 1000.0 min_parallel_table_scan_size = 8MB min_parallel_index_scan_size = 512kB force_parallel_mode = off |
并行查询如何工作
当优化器判断对于某一个特定的查询,并行查询是最快的执行策略时,优化器将创建一个查询计划。该计划包括一个 Gather或者Gather Merge节点。下面是一个简单的例子:
1 2 3 4 5 6 7 8 | EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%'; QUERY PLAN ------------------------------------------------------------------------------------- Gather (cost=1000.00..217018.43 rows=1 width=97) Workers Planned: 2 -> Parallel Seq Scan on pgbench_accounts (cost=0.00..216018.33 rows=1 width=97) Filter: (filler ~~ '%x%'::text) (4 rows) |