原 Greenplum中检测和恢复故障的master实例(gpinitstandby命令)
Tags: 原创故障处理GreenPlum高可用检测和恢复mastergpinitstandbygpactivatestandby
简介
greenplum整个集群是由多台服务器组合而成,任何一台服务都有可能发生软件或硬件故障,我们一起来模拟一下任何一个节点或服务器故障后,greenplumn的容错及恢复方法.
本文主要说明master或standby master故障后的处理办法。
master状态检测
使用gpstate -f
或查询select * from gp_segment_configuration order by 2,1;
即可。
Greenplum中切换故障的Master命令gpactivatestandby
https://www.xmmup.com/greenplumzhonghuifuguzhangdemasterminglinggpactivatestandby.html
master恢复示例
环境说明:https://www.xmmup.com/mppjiagouzhigreenplumdeanzhuangpeizhigaojiban.html#huan_jing_shen_qing
正常集群状态
在master查看数据库当前的状态:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | [gpadmin@mdw1 ~]$ gpstate 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-Starting gpstate with args: 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source' 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23' 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-Obtaining Segment details from master... 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-Gathering data from segments... 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-Greenplum instance status summary 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:----------------------------------------------------- 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Master instance = Active 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Master standby = mdw2 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Standby master state = Standby host passive 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total segment instance count from metadata = 32 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:----------------------------------------------------- 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Primary Segment Status 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:----------------------------------------------------- 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total primary segments = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total primary segment valid (at master) = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total primary segment failures (at master) = 0 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files found = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files found = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes missing = 0 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes found = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:----------------------------------------------------- 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Mirror Segment Status 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:----------------------------------------------------- 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total mirror segments = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total mirror segment valid (at master) = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total mirror segment failures (at master) = 0 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files found = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files found = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes missing = 0 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes found = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number mirror segments acting as primary segments = 0 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:- Total number mirror segments acting as mirror segments = 16 20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:----------------------------------------------------- |
故障情况1:master 服务器故障
当master节点故障后,我们需要激活standby节点作为新的master节点(如果服务器配置有VIP,那么把vip也切换到standby服务器)
在激活standby节点的可以直接指定新的standby节点,也可以等原master服务器恢复后,指定原master节点为standby节点
关闭master节点
1 2 | [root@lhrxxt ~]# docker stop mdw1 mdw1 |
激活standby节点
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | [root@mdw2 /]# su - gpadmin Last login: Tue Jan 31 16:05:54 CST 2023 on pts/1 [gpadmin@mdw2 ~]$ [gpadmin@mdw2 ~]$ gpactivatestandby -d /opt/greenplum/data/master/gpseg-1 20230201:13:47:23:016042 gpactivatestandby:mdw2:gpadmin-[CRITICAL]:-PGPORT environment variable not set. [gpadmin@mdw2 ~]$ export PGPORT=5432 [gpadmin@mdw2 ~]$ gpactivatestandby -d /opt/greenplum/data/master/gpseg-1 20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Running Command: echo 'START_CMD_OUTPUT';ps -ef | grep postgres | grep -v grep | awk '{print $2}' | grep \`cat /opt/greenplum/data/master/gpseg-1/postmaster.pid | head -1\` || echo -1 20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:------------------------------------------------------ 20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Standby data directory = /opt/greenplum/data/master/gpseg-1 20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Standby port = 5432 20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Standby running = yes 20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Force standby activation = no 20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:------------------------------------------------------ 20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Running Command: cat /tmp/.s.PGSQL.5432.lock Do you want to continue with standby master activation? Yy|Nn (default=N): > y 20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Running Command: ps -ef | grep postgres | grep -v grep | awk '{print $2}' | grep `cat /opt/greenplum/data/master/gpseg-1/postmaster.pid | head -1` || echo -1 20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-found standby postmaster process 20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Promoting standby... 20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Running Command: pg_ctl promote -D /opt/greenplum/data/master/gpseg-1 20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Waiting for connection... 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Standby master is promoted 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Reading current configuration... 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Connecting to dbname='postgres' 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:------------------------------------------------------ 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-The activation of the standby master has completed successfully. 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-mdw2 is now the new primary master. 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-You will need to update your user access mechanism to reflect 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-the change of master hostname. 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Do not re-start the failed master while the fail-over master is 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-operational, this could result in database corruption! 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-MASTER_DATA_DIRECTORY is now /opt/greenplum/data/master/gpseg-1 if 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-this has changed as a result of the standby master activation, remember 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-to change this in any startup scripts etc, that may be configured 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-to set this value. 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-MASTER_PORT is now 5432, if this has changed, you 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-may need to make additional configuration changes to allow access 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-to the Greenplum instance. 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Refer to the Administrator Guide for instructions on how to re-activate 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-the master to its previous state once it becomes available. 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Query planner statistics must be updated on all databases 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-following standby master activation. 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-When convenient, run ANALYZE against all user databases. 20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:------------------------------------------------------ [gpadmin@mdw2 ~]$ |