合 PostgreSQL高可用之repmgr(1主2从+1witness)+Pgpool-II实现主从切换+读写分离+负载均衡
Tags: PG高可用Pgpool-IIpgpoolAdmin读写分离负载均衡repmgrwitness主从切换主从流复制
五、检查repmgr集群
1 2 3 4 5 | repmgr -f /pg13/pg13/repmgr.conf cluster matrix repmgr -f /pg13/pg13/repmgr.conf cluster crosscheck repmgr -f /pg13/pg13/repmgr.conf node status repmgr -f /pg13/pg13/repmgr.conf node check |
一些结果:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | [pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster matrix INFO: connecting to database Name | ID | 1 | 2 | 3 | 4 ----------------+----+---+---+---+--- lhrrepmgr64361 | 1 | * | * | * | * lhrrepmgr64362 | 2 | * | * | * | * lhrrepmgr64363 | 3 | * | * | * | * lhrrepmgr64364 | 4 | * | * | * | * [pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf node status Node "lhrrepmgr64361": PostgreSQL version: 13.2 Total data size: 37 MB Conninfo: host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2 Role: primary WAL archiving: enabled Archive command: test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f WALs pending archiving: 0 pending files Replication connections: 2 (of maximal 10) Replication slots: 0 physical (of maximal 10; 0 missing) Replication lag: n/a [pg13@lhrrepmgr64361 ~]$ [pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf node check Node "lhrrepmgr64361": Server role: OK (node is primary) Replication lag: OK (N/A - node is primary) WAL archiving: OK (0 pending archive ready files) Upstream connection: OK (N/A - node is primary) Downstream servers: OK (2 of 2 downstream nodes attached) Replication slots: OK (node has no physical replication slots) Missing physical replication slots: OK (node has no missing physical replication slots) Configured data directory: OK (configured "data_directory" is "/pg13/pgdata") [pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf node status Node "lhrrepmgr64362": PostgreSQL version: 13.2 Total data size: 37 MB Conninfo: host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2 Role: standby WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective) Archive command: test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f WALs pending archiving: 0 pending files Replication connections: 0 (of maximal 10) Replication slots: 0 physical (of maximal 10; 0 missing) Upstream node: lhrrepmgr64361 (ID: 1) Replication lag: 0 seconds Last received LSN: 0/8001480 Last replayed LSN: 0/8001480 [pg13@lhrrepmgr64364 ~]$ repmgr -f /pg13/pg13/repmgr.conf node status Node "lhrrepmgr64364": PostgreSQL version: 13.2 Total data size: 29 MB Conninfo: host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2 Role: witness WAL archiving: enabled Archive command: test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f WALs pending archiving: 0 pending files Replication connections: 0 (of maximal 10) Replication slots: 0 physical (of maximal 10; 0 missing) Replication lag: n/a |
六、主从切换
6.1、switchover正常主从切换
switchover切换即主变备,备变主。
官网:https://repmgr.org/docs/current/repmgr-standby-switchover.html
6.1.1、把备库62变为主库
在62上操作:
1 2 3 4 5 6 7 8 | repmgr -f /pg13/pg13/repmgr.conf cluster show repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind repmgr -f /pg13/pg13/repmgr.conf cluster show -- 可以debug打印详细的切换过程 repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind --log-level DEBUG --verbose |
--siblings-follow 表示所有的从库同步源自动改成最新的主库节点
--force-rewind 如果repmgr检测到需要执行pg_rewind(同步)的时候,在执行pg_rewind之前,在新主节点执行checkpoint
执行过程:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 | [pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+----------------+---------+-----------+----------------+----------+----------+----------+--------------------------------------------------------------------------- 1 | lhrrepmgr64361 | primary | * running | | default | 100 | 3 | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2 2 | lhrrepmgr64362 | standby | running | lhrrepmgr64361 | default | 100 | 3 | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2 3 | lhrrepmgr64363 | standby | running | lhrrepmgr64361 | default | 100 | 3 | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2 4 | lhrrepmgr64364 | witness | * running | lhrrepmgr64361 | default | 0 | n/a | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2 [pg13@lhrrepmgr64362 ~]$ [pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind NOTICE: checking switchover on node "lhrrepmgr64362" (ID: 2) in --dry-run mode INFO: prerequisites for using pg_rewind are met INFO: SSH connection to host "172.72.6.61" succeeded INFO: able to execute "repmgr" on remote host "172.72.6.61" INFO: all sibling nodes are reachable via SSH INFO: 3 walsenders required, 10 available INFO: demotion candidate is able to make replication connection to promotion candidate INFO: 0 pending archive files INFO: replication lag on this standby is 0 seconds NOTICE: local node "lhrrepmgr64362" (ID: 2) would be promoted to primary; current primary "lhrrepmgr64361" (ID: 1) would be demoted to standby INFO: following shutdown command would be run on node "lhrrepmgr64361": "/pg13/pg13/bin/pg_ctl -D '/pg13/pgdata' -W -m fast stop" INFO: parameter "shutdown_check_timeout" is set to 60 seconds INFO: prerequisites for executing STANDBY SWITCHOVER are met [pg13@lhrrepmgr64362 ~]$ [pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind NOTICE: executing switchover on node "lhrrepmgr64362" (ID: 2) NOTICE: local node "lhrrepmgr64362" (ID: 2) will be promoted to primary; current primary "lhrrepmgr64361" (ID: 1) will be demoted to standby NOTICE: stopping current primary node "lhrrepmgr64361" (ID: 1) NOTICE: issuing CHECKPOINT on node "lhrrepmgr64361" (ID: 1) DETAIL: executing server command "/pg13/pg13/bin/pg_ctl -D '/pg13/pgdata' -W -m fast stop" INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout") INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout") NOTICE: current primary has been cleanly shut down at location 0/9000028 NOTICE: promoting standby to primary DETAIL: promoting server "lhrrepmgr64362" (ID: 2) using pg_promote() NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete NOTICE: STANDBY PROMOTE successful DETAIL: server "lhrrepmgr64362" (ID: 2) was successfully promoted to primary NOTICE: issuing CHECKPOINT on node "lhrrepmgr64362" (ID: 2) ERROR: unable to execute CHECKPOINT INFO: local node 1 can attach to rejoin target node 2 DETAIL: local node's recovery point: 0/9000028; rejoin target node's fork point: 0/90000A0 NOTICE: setting node 1 upstream to node 2 WARNING: unable to ping "host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2" DETAIL: PQping() returned "PQPING_NO_RESPONSE" NOTICE: starting server using "/pg13/pg13/bin/pg_ctl -w -D '/pg13/pgdata' start" NOTICE: NODE REJOIN successful DETAIL: node 1 is now attached to node 2 NOTICE: node "lhrrepmgr64362" (ID: 2) promoted to primary, node "lhrrepmgr64361" (ID: 1) demoted to standby NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes NOTICE: switchover was successful DETAIL: node "lhrrepmgr64362" is now primary and node "lhrrepmgr64361" is attached as standby NOTICE: STANDBY SWITCHOVER has completed successfully [pg13@lhrrepmgr64362 ~]$ [pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+----------------+---------+-----------+----------------+----------+----------+----------+--------------------------------------------------------------------------- 1 | lhrrepmgr64361 | standby | running | lhrrepmgr64362 | default | 100 | 3 | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2 2 | lhrrepmgr64362 | primary | * running | | default | 100 | 4 | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2 3 | lhrrepmgr64363 | standby | running | lhrrepmgr64362 | default | 100 | 3 | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2 4 | lhrrepmgr64364 | witness | * running | lhrrepmgr64362 | default | 0 | n/a | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2 C:\Users\lhrxxt>psql -U repmgr -h192.168.66.35 -p64361 -d repmgr Password for user repmgr: psql (13.2) Type "help" for help. repmgr=# select * from nodes; node_id | upstream_node_id | active | node_name | type | location | priority | conninfo | repluser | slot_name | config_file ---------+------------------+--------+----------------+---------+----------+----------+---------------------------------------------------------------------------+----------+-----------+------------------------ 2 | | t | lhrrepmgr64362 | primary | default | 100 | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2 | repmgr | | /pg13/pg13/repmgr.conf 1 | 2 | t | lhrrepmgr64361 | standby | default | 100 | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2 | repmgr | | /pg13/pg13/repmgr.conf 3 | 2 | t | lhrrepmgr64363 | standby | default | 100 | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2 | repmgr | | /pg13/pg13/repmgr.conf 4 | 2 | t | lhrrepmgr64364 | witness | default | 0 | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2 | repmgr | | /pg13/pg13/repmgr.conf (4 rows) |
可以看到,主库变为lhrrepmgr64362,而lhrrepmgr64361变为了从库。
6.1.2、把新备库61切换为主库
1 2 3 4 5 6 7 8 | -- 在61上做操作 repmgr -f /pg13/pg13/repmgr.conf cluster show repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind -- 可以debug打印详细的切换过程 repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind --log-level DEBUG --verbose |
执行过程:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | [pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+----------------+---------+-----------+----------------+----------+----------+----------+--------------------------------------------------------------------------- 1 | lhrrepmgr64361 | standby | running | lhrrepmgr64362 | default | 100 | 4 | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2 2 | lhrrepmgr64362 | primary | * running | | default | 100 | 4 | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2 3 | lhrrepmgr64363 | standby | running | lhrrepmgr64362 | default | 100 | 4 | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2 4 | lhrrepmgr64364 | witness | * running | lhrrepmgr64362 | default | 0 | n/a | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2 [pg13@lhrrepmgr64361 ~]$ [pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind NOTICE: checking switchover on node "lhrrepmgr64361" (ID: 1) in --dry-run mode INFO: prerequisites for using pg_rewind are met INFO: SSH connection to host "172.72.6.62" succeeded INFO: able to execute "repmgr" on remote host "172.72.6.62" INFO: all sibling nodes are reachable via SSH INFO: 3 walsenders required, 10 available INFO: demotion candidate is able to make replication connection to promotion candidate INFO: 0 pending archive files INFO: replication lag on this standby is 0 seconds NOTICE: local node "lhrrepmgr64361" (ID: 1) would be promoted to primary; current primary "lhrrepmgr64362" (ID: 2) would be demoted to standby INFO: following shutdown command would be run on node "lhrrepmgr64362": "/pg13/pg13/bin/pg_ctl -D '/pg13/pgdata' -W -m fast stop" INFO: parameter "shutdown_check_timeout" is set to 60 seconds INFO: prerequisites for executing STANDBY SWITCHOVER are met [pg13@lhrrepmgr64361 ~]$ [pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind NOTICE: executing switchover on node "lhrrepmgr64361" (ID: 1) NOTICE: local node "lhrrepmgr64361" (ID: 1) will be promoted to primary; current primary "lhrrepmgr64362" (ID: 2) will be demoted to standby NOTICE: stopping current primary node "lhrrepmgr64362" (ID: 2) NOTICE: issuing CHECKPOINT on node "lhrrepmgr64362" (ID: 2) DETAIL: executing server command "/pg13/pg13/bin/pg_ctl -D '/pg13/pgdata' -W -m fast stop" INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout") INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout") NOTICE: current primary has been cleanly shut down at location 0/A000028 NOTICE: promoting standby to primary DETAIL: promoting server "lhrrepmgr64361" (ID: 1) using pg_promote() NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete NOTICE: STANDBY PROMOTE successful DETAIL: server "lhrrepmgr64361" (ID: 1) was successfully promoted to primary NOTICE: issuing CHECKPOINT on node "lhrrepmgr64361" (ID: 1) ERROR: unable to execute CHECKPOINT INFO: local node 2 can attach to rejoin target node 1 DETAIL: local node's recovery point: 0/A000028; rejoin target node's fork point: 0/A0000A0 NOTICE: setting node 2's upstream to node 1 WARNING: unable to ping "host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2" DETAIL: PQping() returned "PQPING_NO_RESPONSE" NOTICE: starting server using "/pg13/pg13/bin/pg_ctl -w -D '/pg13/pgdata' start" NOTICE: NODE REJOIN successful DETAIL: node 2 is now attached to node 1 NOTICE: node "lhrrepmgr64361" (ID: 1) promoted to primary, node "lhrrepmgr64362" (ID: 2) demoted to standby NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes NOTICE: switchover was successful DETAIL: node "lhrrepmgr64361" is now primary and node "lhrrepmgr64362" is attached as standby NOTICE: STANDBY SWITCHOVER has completed successfully [pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+----------------+---------+-----------+----------------+----------+----------+----------+--------------------------------------------------------------------------- 1 | lhrrepmgr64361 | primary | * running | | default | 100 | 5 | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2 2 | lhrrepmgr64362 | standby | running | lhrrepmgr64361 | default | 100 | 4 | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2 3 | lhrrepmgr64363 | standby | running | lhrrepmgr64361 | default | 100 | 4 | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2 4 | lhrrepmgr64364 | witness | * running | lhrrepmgr64361 | default | 0 | n/a | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2 |