Detecting and Recovering a Failed Master Instance in Greenplum
Tags: gpactivatestandby, gpinitstandby, Greenplum, master failure handling, detection and recovery, high availability
A Greenplum cluster is built from multiple servers, any of which can suffer a software or hardware failure. In this article we simulate node and server failures and walk through Greenplum's fault tolerance and recovery procedures.
This article focuses on how to handle a failure of the master or the standby master.
Checking master status
Run gpstate -f, or query the catalog directly:
select * from gp_segment_configuration order by 2,1;
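If you only care about the coordinator pair, a minimal sketch of the same catalog query (column list assumes the Greenplum 6 gp_segment_configuration layout; verify on your version). content = -1 identifies the master and its standby; role 'p' marks the acting master and status 'u'/'d' shows whether each is up:

-- Hedged sketch: show only the master/standby pair (content = -1).
SELECT dbid, content, role, preferred_role, mode, status, port, hostname, datadir
FROM gp_segment_configuration
WHERE content = -1
ORDER BY dbid;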
The command for recovering a failed master in Greenplum is gpactivatestandby:
https://www.xmmup.com/greenplumzhonghuifuguzhangdemasterminglinggpactivatestandby.html
Master recovery example
Environment details: https://www.xmmup.com/mppjiagouzhigreenplumdeanzhuangpeizhigaojiban.html#huan_jing_shen_qing
Healthy cluster state
Check the current state of the database on the master:
[gpadmin@mdw1 ~]$ gpstate
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-Starting gpstate with args:
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-Obtaining Segment details from master...
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-Gathering data from segments...
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-Greenplum instance status summary
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Master instance = Active
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Master standby = mdw2
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Standby master state = Standby host passive
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total segment instance count from metadata = 32
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Primary Segment Status
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total primary segments = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total primary segment valid (at master) = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total primary segment failures (at master) = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files found = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files found = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes found = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Mirror Segment Status
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total mirror segments = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total mirror segment valid (at master) = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total mirror segment failures (at master) = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files found = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files found = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes found = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number mirror segments acting as primary segments = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number mirror segments acting as mirror segments = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
Failure scenario 1: master server failure
When the master node fails, we need to activate the standby node as the new master (if the cluster uses a VIP, switch the VIP over to the standby server as well).
When activating the standby you can immediately designate a new standby node, or wait until the original master server is repaired and then add it back as the standby.
Stop the master node
[root@lhrxxt ~]# docker stop mdw1
mdw1
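This lab runs in Docker, so stopping the container is the failure injection. On a bare-metal or VM deployment, a comparable (and equally brutal) simulation is to kill the master postmaster directly; a hedged sketch assuming the data directory used throughout this article, for test environments only:

# Simulate a master crash on mdw1 (illustrative only; never do this on a production cluster).
# The first line of postmaster.pid holds the postmaster PID.
kill -9 $(head -1 /opt/greenplum/data/master/gpseg-1/postmaster.pid)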
Activate the standby node
[root@mdw2 /]# su - gpadmin
Last login: Tue Jan 31 16:05:54 CST 2023 on pts/1
[gpadmin@mdw2 ~]$
[gpadmin@mdw2 ~]$ gpactivatestandby -d /opt/greenplum/data/master/gpseg-1
20230201:13:47:23:016042 gpactivatestandby:mdw2:gpadmin-[CRITICAL]:-PGPORT environment variable not set.
[gpadmin@mdw2 ~]$ export PGPORT=5432
[gpadmin@mdw2 ~]$ gpactivatestandby -d /opt/greenplum/data/master/gpseg-1
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Running Command: echo 'START_CMD_OUTPUT';ps -ef | grep postgres | grep -v grep | awk '{print $2}' | grep \`cat /opt/greenplum/data/master/gpseg-1/postmaster.pid | head -1\` || echo -1
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Standby data directory = /opt/greenplum/data/master/gpseg-1
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Standby port = 5432
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Standby running = yes
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Force standby activation = no
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Running Command: cat /tmp/.s.PGSQL.5432.lock
Do you want to continue with standby master activation? Yy|Nn (default=N):
> y
20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Running Command: ps -ef | grep postgres | grep -v grep | awk '{print $2}' | grep `cat /opt/greenplum/data/master/gpseg-1/postmaster.pid | head -1` || echo -1
20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-found standby postmaster process
20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Promoting standby...
20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Running Command: pg_ctl promote -D /opt/greenplum/data/master/gpseg-1
20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Waiting for connection...
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Standby master is promoted
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Reading current configuration...
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Connecting to dbname='postgres'
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-The activation of the standby master has completed successfully.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-mdw2 is now the new primary master.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-You will need to update your user access mechanism to reflect
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-the change of master hostname.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Do not re-start the failed master while the fail-over master is
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-operational, this could result in database corruption!
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-MASTER_DATA_DIRECTORY is now /opt/greenplum/data/master/gpseg-1 if
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-this has changed as a result of the standby master activation, remember
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-to change this in any startup scripts etc, that may be configured
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-to set this value.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-MASTER_PORT is now 5432, if this has changed, you
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-may need to make additional configuration changes to allow access
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-to the Greenplum instance.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Refer to the Administrator Guide for instructions on how to re-activate
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-the master to its previous state once it becomes available.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Query planner statistics must be updated on all databases
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-following standby master activation.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-When convenient, run ANALYZE against all user databases.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
[gpadmin@mdw2 ~]$
Bring up the VIP on the standby master:
[root@mdw2 ~]$ ip addr add 172.72.6.56/16 dev eth0 label eth0:1
[root@mdw2 /]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.72.6.51 netmask 255.255.0.0 broadcast 172.72.255.255
        ether 02:42:ac:48:06:33 txqueuelen 0 (Ethernet)
        RX packets 94731 bytes 7030183 (6.7 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 37837 bytes 166369481 (158.6 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.72.6.56 netmask 255.255.0.0 broadcast 0.0.0.0
        ether 02:42:ac:48:06:33 txqueuelen 0 (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        loop txqueuelen 1000 (Local Loopback)
        RX packets 1682 bytes 463712 (452.8 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1682 bytes 463712 (452.8 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@mdw2 /]#

[root@lhrxxt ~]# psql -U gpadmin -h 172.72.6.56 -d postgres
Password for user gpadmin:
psql (13.9, server 9.4.26)
Type "help" for help.

postgres=# \l
                                  List of databases
   Name    |  Owner  | Encoding |  Collate   |   Ctype    |  Access privileges
-----------+---------+----------+------------+------------+---------------------
 lhrgpdb   | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres  | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/gpadmin         +
           |         |          |            |            | gpadmin=CTc/gpadmin
 template1 | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/gpadmin         +
           |         |          |            |            | gpadmin=CTc/gpadmin
 testdb    | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 |
(5 rows)

postgres=#
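If the VIP previously lived on mdw1, the recovered host may come back with the stale alias still configured, which would conflict with mdw2. A hedged cleanup sketch (same address and interface as above) to run on mdw1 before re-adding it to the cluster:

# On the recovered mdw1, drop the stale VIP alias so it does not clash with mdw2 (illustrative).
ip addr del 172.72.6.56/16 dev eth0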
At this point the cluster is usable again.
Repair the mdw1 node, then designate it as the new standby
Here we make the original master node the new standby server.
First delete the original master's data directory, then re-run the standby initialization.
Wrong approach: starting the whole Greenplum cluster from mdw1 fails, and trying to repair the cluster from mdw1 fails as well:
[gpadmin@mdw1 ~]$ gpstate
20230201:13:56:07:000419 gpstate:mdw1:gpadmin-[INFO]:-Starting gpstate with args:
20230201:13:56:07:000419 gpstate:mdw1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:13:56:07:000419 gpstate:mdw1:gpadmin-[CRITICAL]:-gpstate failed. (Reason='could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting TCP/IP connections on port 5432?
could not connect to server: Cannot assign requested address
Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5432?
') exiting...
[gpadmin@mdw1 ~]$ gpstart
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Starting gpstart with args:
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Gathering information and validating the environment...
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Greenplum Catalog Version: '301908232'
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Starting Master instance in admin mode
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Setting new master era
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Master Started...
20230201:13:56:35:000462 gpstart:mdw1:gpadmin-[INFO]:-Master Stopped...
20230201:13:56:35:000462 gpstart:mdw1:gpadmin-[ERROR]:-gpstart error: Standby activated, this node no more can act as master.
[gpadmin@mdw1 ~]$ gpinitstandby -a -s mdw1
20230201:13:57:06:000532 gpinitstandby:mdw1:gpadmin-[ERROR]:-Failed to retrieve configuration information from the master.
20230201:13:57:06:000532 gpinitstandby:mdw1:gpadmin-[ERROR]:-Failed to create standby
20230201:13:57:06:000532 gpinitstandby:mdw1:gpadmin-[ERROR]:-Error initializing standby master: could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting TCP/IP connections on port 5432?
could not connect to server: Cannot assign requested address
Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5432?
Instead, we should run the repair on mdw2:
[gpadmin@mdw2 ~]$ gpinitstandby -a -s mdw1
20230201:13:57:16:016792 gpinitstandby:mdw2:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20230201:13:57:16:016792 gpinitstandby:mdw2:gpadmin-[INFO]:-Checking for data directory /opt/greenplum/data/master/gpseg-1 on mdw1
20230201:13:57:16:016792 gpinitstandby:mdw2:gpadmin-[ERROR]:-Data directory already exists on host mdw1
20230201:13:57:16:016792 gpinitstandby:mdw2:gpadmin-[ERROR]:-Failed to create standby
20230201:13:57:16:016792 gpinitstandby:mdw2:gpadmin-[ERROR]:-Error initializing standby master: master data directory exists

-- Note: this fails because the old data directory still exists; delete it on mdw1 first:
[gpadmin@mdw1 master]$ cd /opt/greenplum/data/master/
[gpadmin@mdw1 master]$ rm -rf gpseg-1/
[gpadmin@mdw1 master]$ ll

-- Then continue the repair from mdw2:
[gpadmin@mdw2 ~]$ gpinitstandby -a -s mdw1
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Checking for data directory /opt/greenplum/data/master/gpseg-1 on mdw1
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master hostname = mdw2
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master data directory = /opt/greenplum/data/master/gpseg-1
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master port = 5432
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master hostname = mdw1
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master port = 5432
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master data directory = /opt/greenplum/data/master/gpseg-1
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum update system catalog = On
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-The packages on mdw1 are consistent.
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Adding standby master to catalog...
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Database catalog updated successfully.
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Updating pg_hba.conf file...
20230201:13:58:31:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-pg_hba.conf files updated successfully.
20230201:13:58:38:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Starting standby master
20230201:13:58:38:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Checking if standby master is running on host: mdw1 in directory: /opt/greenplum/data/master/gpseg-1
20230201:13:58:39:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...
20230201:13:58:39:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
20230201:13:58:39:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Successfully created standby master on mdw1
Status after the repair:
[gpadmin@mdw2 ~]$ gpconfig -s max_connections
Values on all segments are consistent
GUC : max_connections
Master value: 500
Segment value: 1500
[gpadmin@mdw2 ~]$ gpstate
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-Starting gpstate with args:
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-Gathering data from segments...
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-Greenplum instance status summary
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-----------------------------------------------------
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Master instance = Active
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Master standby = mdw1
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Standby master state = Standby host passive
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total segment instance count from metadata = 32
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-----------------------------------------------------
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Primary Segment Status
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-----------------------------------------------------
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total primary segments = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total primary segment valid (at master) = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total primary segment failures (at master) = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number of postmaster.pid files found = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number of /tmp lock files found = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number postmaster processes found = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-----------------------------------------------------
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Mirror Segment Status
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-----------------------------------------------------
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total mirror segments = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total mirror segment valid (at master) = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total mirror segment failures (at master) = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number of postmaster.pid files found = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number of /tmp lock files found = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number postmaster processes found = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number mirror segments acting as primary segments = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:- Total number mirror segments acting as mirror segments = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-----------------------------------------------------
[gpadmin@mdw2 ~]$ gpstate -f
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-Starting gpstate with args: -f
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-Standby master details
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-----------------------
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:- Standby address = mdw1
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:- Standby data directory = /opt/greenplum/data/master/gpseg-1
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:- Standby port = 5432
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:- Standby PID = 1061
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:- Standby status = Standby host passive
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--pg_stat_replication
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--WAL Sender State: streaming
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--Sync state: sync
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--Sent Location: 0/2C000000
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--Flush Location: 0/2C000000
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--Replay Location: 0/2C000000
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
Failure scenario 2: standby server failure
Case 1: the standby can be started normally
[gpadmin@mdw2 ~]$ gpstate -f
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-Starting gpstate with args: -f
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-Standby master details
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-----------------------
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:- Standby address = mdw1
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:- Standby data directory = /opt/greenplum/data/master/gpseg-1
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:- Standby port = 5432
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[WARNING]:-Standby PID = 1061 No socket connection or lock file (/tmp/.s.PGSQL.5432.lock) found for port 5432 <<<<<<<<
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[WARNING]:-Standby status = Status could not be determined <<<<<<<<
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:--pg_stat_replication
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-No entries found.
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------

[gpadmin@mdw1 ~]$ nohup /usr/local/greenplum-db-6.23.0/bin/postgres -D /opt/greenplum/data/master/gpseg-1 -p 5432 -E &
[1] 2385

[gpadmin@mdw2 ~]$ gpstate -f
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-Starting gpstate with args: -f
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-Standby master details
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-----------------------
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:- Standby address = mdw1
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:- Standby data directory = /opt/greenplum/data/master/gpseg-1
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:- Standby port = 5432
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:- Standby PID = 2385
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:- Standby status = Standby host passive
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--pg_stat_replication
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--WAL Sender State: streaming
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--Sync state: sync
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--Sent Location: 0/2C0000A8
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--Flush Location: 0/2C0000A8
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--Replay Location: 0/2C0000A8
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
[gpadmin@mdw2 ~]$
Case 2: database files are corrupted and the standby cannot start
If the database files are corrupted and the standby cannot start, delete the standby node's data and then re-initialize the standby server.
[gpadmin@mdw1 master]$ ps -ef|grep green
gpadmin 2385 2338 0 15:09 pts/1 00:00:00 /usr/local/greenplum-db-6.23.0/bin/postgres -D /opt/greenplum/data/master/gpseg-1 -p 5432 -E
gpadmin 2740 2338 0 15:13 pts/1 00:00:00 grep --color=auto green
[gpadmin@mdw1 master]$ kill -9 2385
[gpadmin@mdw1 master]$
[gpadmin@mdw1 ~]$ rm -rf /opt/greenplum/data/master/gpseg-1
[gpadmin@mdw1 ~]$ cd /opt/greenplum/data/master/
[gpadmin@mdw1 master]$ ll
total 0

[gpadmin@mdw2 ~]$ gpstate -f
20230201:15:13:58:021233 gpstate:mdw2:gpadmin-[INFO]:-Starting gpstate with args: -f
20230201:15:13:58:021233 gpstate:mdw2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:15:13:58:021233 gpstate:mdw2:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230201:15:13:58:021233 gpstate:mdw2:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:-Standby master details
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:-----------------------
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:- Standby address = mdw1
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:- Standby data directory = /opt/greenplum/data/master/gpseg-1
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:- Standby port = 5432
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[WARNING]:-Standby PID = 0 <<<<<<<<
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[WARNING]:-Standby status = Standby process not running <<<<<<<<
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:--pg_stat_replication
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:-No entries found.
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
[gpadmin@mdw2 ~]$
Repair: remove the standby first, then add it back:
[gpadmin@mdw2 ~]$ gpinitstandby -s mdw1 -S /opt/greenplum/data/master/gpseg-1 -P 5432
20230201:15:14:41:021292 gpinitstandby:mdw2:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20230201:15:14:41:021292 gpinitstandby:mdw2:gpadmin-[ERROR]:-Standby master already configured
20230201:15:14:41:021292 gpinitstandby:mdw2:gpadmin-[INFO]:-If you want to start the stopped standby master, use the -n option
20230201:15:14:41:021292 gpinitstandby:mdw2:gpadmin-[ERROR]:-Failed to create standby
20230201:15:14:41:021292 gpinitstandby:mdw2:gpadmin-[ERROR]:-Error initializing standby master: standby master already configured
[gpadmin@mdw2 ~]$ gpinitstandby -r
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Warm master standby removal parameters
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master hostname = mdw2
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master data directory = /opt/greenplum/data/master/gpseg-1
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master port = 5432
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master hostname = mdw1
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master port = 5432
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master data directory = /opt/greenplum/data/master/gpseg-1
Do you want to continue with deleting the standby master? Yy|Nn (default=N):
> y
20230201:15:15:32:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Removing standby master from catalog...
20230201:15:15:32:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Database catalog updated successfully.
20230201:15:15:32:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Removing data directory on standby master...
20230201:15:15:33:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Successfully removed standby master
[gpadmin@mdw2 ~]$ gpinitstandby -s mdw1 -S /opt/greenplum/data/master/gpseg-1 -P 5432
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Checking for data directory /opt/greenplum/data/master/gpseg-1 on mdw1
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master hostname = mdw2
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master data directory = /opt/greenplum/data/master/gpseg-1
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master port = 5432
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master hostname = mdw1
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master port = 5432
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master data directory = /opt/greenplum/data/master/gpseg-1
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum update system catalog = On
Do you want to continue with standby master initialization? Yy|Nn (default=N):
> y
20230201:15:15:42:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
20230201:15:15:42:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-The packages on mdw1 are consistent.
20230201:15:15:42:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Adding standby master to catalog...
20230201:15:15:42:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Database catalog updated successfully.
20230201:15:15:42:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Updating pg_hba.conf file...
20230201:15:15:43:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-pg_hba.conf files updated successfully.
20230201:15:15:50:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Starting standby master
20230201:15:15:50:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Checking if standby master is running on host: mdw1 in directory: /opt/greenplum/data/master/gpseg-1
20230201:15:15:51:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...
20230201:15:15:51:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
20230201:15:15:51:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Successfully created standby master on mdw1
[gpadmin@mdw2 ~]$ gpstate -f
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-Starting gpstate with args: -f
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-Standby master details
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-----------------------
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:- Standby address = mdw1
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:- Standby data directory = /opt/greenplum/data/master/gpseg-1
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:- Standby port = 5432
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:- Standby PID = 3328
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:- Standby status = Standby host passive
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--pg_stat_replication
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--WAL Sender State: streaming
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--Sync state: sync
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--Sent Location: 0/34000000
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--Flush Location: 0/34000000
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--Replay Location: 0/34000000
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
[gpadmin@mdw2 ~]$
Summary
1. If the master fails, run the following command on the standby master to promote it to the primary master:
gpactivatestandby -d /opt/greenplum/data/master/gpseg-1
If a VIP is configured, switch it over as well.
Then, after its data directory has been deleted, the original master node can be added back as the new standby:
gpinitstandby -a -s mdw1
2. If the standby master fails, it depends on the case. If the host merely went down, the standby can usually be started again with:
nohup /usr/local/greenplum-db-6.23.0/bin/postgres -D /opt/greenplum/data/master/gpseg-1 -p 5432 -E &
If the database files are corrupted and it cannot start, delete the standby node's data and re-initialize the standby: remove it first, then add it back:
gpinitstandby -r
gpinitstandby -s mdw1 -S /opt/greenplum/data/master/gpseg-1 -P 5432
3. The master/standby master pair is essentially PostgreSQL primary/standby streaming replication, and failover is not automatic. To get automatic switchover you need an external HA tool such as keepalived, repmgr, Patroni, or Pgpool-II.
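As an illustration only (not a tested configuration; the VIP, interface, and check command are assumptions carried over from this environment), a minimal keepalived sketch that holds the VIP on the acting master and releases it when the master postmaster disappears might look like this:

# /etc/keepalived/keepalived.conf on mdw1 (hypothetical sketch)
vrrp_script chk_gpmaster {
    # Exits non-zero when no master postmaster process is found, marking the node faulty
    script "pgrep -f 'postgres -D /opt/greenplum/data/master/gpseg-1'"
    interval 5
    fall 3
}
vrrp_instance VI_GP {
    state MASTER
    interface eth0
    virtual_router_id 56
    priority 100
    virtual_ipaddress {
        172.72.6.56/16 dev eth0 label eth0:1
    }
    track_script {
        chk_gpmaster
    }
}

Note that keepalived only moves the VIP; promoting the standby still requires running gpactivatestandby (for example from a keepalived notify script or by an operator), which is why tools like repmgr that drive the promotion themselves are often preferred.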
References
https://blog.csdn.net/weixin_33724570/article/details/89720415
https://www.xmmup.com/greenplumshujukugaokeyongxinggaishu.html#Master_jing_xiang