When Does GreenPlum Fail Over Automatically, and How to Investigate (OOM)
Tags: GreenPlum, OOM, segment fault handling, automatic failover
Introduction

When an OOM occurs, the gpcc alert notifications will contain an "[Alert] Out of memory errors" message.

Case 1 (most common): OOM

The typical trigger for an automatic failover is an OOM event. The master log file will contain entries such as "FTS: cannot establish libpq connection (content=0, dbid=11): could not fork new process for connection: Cannot allocate memory", "FATAL: Out of memory. Failed on request of size 144 bytes. (context 'GPORCA memory pool')", or "FATAL: the database system is in recovery mode".

If no swap is configured, OOM becomes much more likely, and in severe cases it causes an automatic segment failover:
```
FTS: cannot establish libpq connection (content=0, dbid=11): could not fork new process for connection: Cannot allocate memory
The previous session was reset because its gang was disconnected (session id = 6072). The new session id = 109485
FATAL: Out of memory. Failed on request of size 144 bytes. (context 'GPORCA memory pool')
FATAL: the database system is in recovery mode
gang was lost due to cluster reconfiguration (cdbgang_async.c:97)
rejecting TCP connection to master using internal connection protocol
Any temporary tables for this session have been dropped because the gang was disconnected (session id = 85341)
failed to acquire resources on one or more segments
```
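A quick host-level check helps confirm the memory posture behind these errors. This is a generic Linux sketch; the recommended overcommit values follow common Greenplum installation guidance and are an assumption to adapt, not output captured from this incident:

```shell
# Swap totals: with no swap configured, allocation failures surface as
# the "Cannot allocate memory" fork errors shown above.
free -g

# Overcommit policy: Greenplum installs typically use strict accounting
# (vm.overcommit_memory=2) with vm.overcommit_ratio sized to RAM/swap.
cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio
```

If `overcommit_memory` is 0 (heuristic overcommit) and there is no swap, the kernel OOM killer can take out a postmaster directly, which is exactly the failover scenario described here.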
Case 2: Resource exhaustion under heavy CPU or host load

A failover can also be triggered when a node runs out of system resources under heavy CPU or overall host load.
For example, when the maximum number of processes is exceeded, the log reports:
```
could not fork new process for connection: Resource temporarily unavailable
could not fork new process for connection: Resource temporarily unavailable (seg0 129.10.25.26:6000)
FATAL: InitMotionLayerIPC: failed to create thread (ic_udpifc.c:1488)
DETAIL: pthread_create() failed with err 11 (seg11 129.10.25.26:7003)
```
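"err 11" in `pthread_create()` is EAGAIN ("Resource temporarily unavailable"): the per-user process/thread limit has been exhausted. A minimal sketch to compare current usage against the soft limit, using generic Linux commands rather than anything specific to this cluster:

```shell
# Soft limit on user processes/threads (nproc); fork()/pthread_create()
# fail with EAGAIN (err 11) once this is exhausted.
echo "nproc soft limit : $(ulimit -u)"

# Approximate count of processes owned by the current user right now.
echo "processes in use : $(ps -u "$(id -un)" --no-headers | wc -l)"
```

If the second number is close to the first on a segment host, the fix below (raising `nproc`/`nofile`) is the right direction.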
This error usually means the kernel and ulimit parameters were never tuned. Fix as follows:
```shell
ll /lib64/security/pam_limits.so
echo "session required /lib64/security/pam_limits.so" >> /etc/pam.d/login

cat >> /etc/security/limits.conf <<"EOF"
* soft nofile 655350
* hard nofile 655350
* soft nproc 655350
* hard nproc 655350
gpadmin soft priority -20
EOF

sed -i 's/4096/655350/' /etc/security/limits.d/20-nproc.conf
cat /etc/security/limits.d/20-nproc.conf

cat >> /etc/sysctl.conf <<"EOF"
fs.file-max=9000000
fs.inotify.max_user_instances = 1000000
fs.inotify.max_user_watches = 1000000
kernel.pid_max=4194304
EOF
sysctl -p
```
Reboot the host for everything to take effect (`sysctl -p` applies the kernel parameters immediately; the PAM limits only apply to new login sessions).
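After the reboot (or a fresh login, for the PAM limits), the new values can be verified. This is a generic check; the expected numbers are simply the values written above:

```shell
# Per-session limits written via limits.conf / 20-nproc.conf
ulimit -n   # open files; expected 655350 after the change
ulimit -u   # max user processes; expected 655350 after the change

# Kernel-wide values written via sysctl.conf
cat /proc/sys/fs/file-max     # expected 9000000 after the change
cat /proc/sys/kernel/pid_max  # expected 4194304 after the change
```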
Case 3: A killed instance

Manually killing a PG instance process (e.g. the postmaster of a primary segment) also triggers an automatic failover.

Troubleshooting SQL
```sql
-- Find the time of the failover
select * from gp_configuration_history ORDER BY time desc limit 100;

-- Check the time range covered by the log history views
select 'gp_log_master_ext' tb, min(logtime), max(logtime) from gpmetrics.gp_log_master_ext
union all
select 'gpcc_pg_log_history' tb, min(logtime), max(logtime) from gpmetrics.gpcc_pg_log_history;

-- Detailed log around the failover (sometimes empty; then query the log files directly)
SELECT d.logtime, d.loguser, d.logdatabase, d.loghost, d.logsessiontime,
       d.logseverity, d.logmessage, d.logdebug, d.logdetail, logcontext
from gpmetrics.gpcc_pg_log_history d
where d.logseverity not in ('LOG')
  and d.logstate not in ('58P01')
  -- and d.logtime >= now() - interval '7 day'
  and d.logtime >= '2023-07-12 02:10'
  and d.logtime <= '2023-07-12 12:18'
  and (   d.logmessage like 'Out of memory%'
       or d.logmessage like 'failed to acquire resources on one or more segments%'
       or d.logmessage like 'FTS detected connection lost during dispatch to%'
       or d.logmessage like 'ERROR: FTS double fault detected%'
       or d.logmessage like 'Canceling query because of high VMEM usage%'
       or d.logdetail like '%Cannot allocate memory%'
       or d.logdetail like '%FATAL: out of memory%'
       or d.logdetail like '%System memory limit reached, failed to allocate %'
       or d.logdetail like 'FTS detected one or more segments are down'
       or d.logdetail like '%Vmem limit reached, failed to allocate%'
       or d.logdetail like 'FATAL: InitMotionLayerIPC: failed to create thread%'
       or d.logdetail like 'could not fork new process for connection%'
       or d.logdetail like '%Resource temporarily unavailable%')
order by d.logtime
limit 500;

SELECT d.logtime, d.loguser, d.logdatabase, d.loghost, d.logsessiontime,
       d.logseverity, d.logmessage, d.logdebug, d.logdetail, logcontext
FROM gpmetrics.gp_log_master_ext d
where d.logseverity not in ('LOG')
  and d.logstate not in ('58P01')
  -- and d.logtime >= now() - interval '7 day'
  and d.logtime >= '2023-07-10 10:00'
  and d.logtime <= '2023-07-10 11:01'
order by d.logtime;

-- To read the log files directly, locate them via the postgres processes:
--   ps -ef | grep post | grep bin
--   cd /opt/greenplum/data/master/gpseg-1/pg_log

-- System metrics for all hosts, sampled every 15 seconds
SELECT ctime::date || ' '
       || CASE WHEN extract(hour from ctime) < 10 THEN '0' ELSE '' END || extract(hour from ctime)
       || ':'
       || CASE WHEN extract(MINUTE from ctime) < 10 THEN '0' ELSE '' END || extract(MINUTE from ctime) AS time1
     , hostname
     , ROUND(AVG(cpu_sys::numeric),2) AS cpu_sys
     , ROUND(avg(cpu_user::numeric),2) AS cpu_user
     , ROUND(avg((cpu_sys+cpu_user)::numeric),2) AS cpu_use
     , ROUND(avg((cpu_iowait)::numeric),2) AS cpu_iowait
     , ROUND(avg(gsh.cpu_idle::numeric),2) AS cpu_idle
     , ROUND(avg(mem_total::numeric/1024/1024/1024)) AS mem_total_GB
     , ROUND(avg(mem_used::numeric/1024/1024/1024),2) AS mem_used_GB
     -- , ROUND(avg(mem_actual_used::numeric/1024/1024/1024),2) AS mem_actual_GB
     , ROUND(avg((mem_buffers+mem_cached)::numeric/1024/1024/1024),2) AS mem_buffer_cache_GB
     -- , ROUND(avg(((gsh.mem_total-mem_used)::numeric)/1024/1024/1024),2) AS mem_available_GB
     , ROUND(avg(gsh.swap_total::numeric/1024/1024/1024)) as swap_total
     , ROUND(avg(gsh.swap_used::numeric/1024/1024/1024),2) as swap_used
     , ROUND(avg(gsh.swap_page_in::numeric),2) as swap_page_in
     , ROUND(avg(gsh.swap_page_out::numeric),2) as swap_page_out
     , ROUND(avg(load0::numeric),2) AS load0
     , ROUND(avg(load1::numeric),2) AS load1
     , ROUND(avg(load2::numeric),2) AS load2
     , ROUND((avg(disk_rb_rate::numeric) / 1024 / 1024),2) AS disk_R_MBs
     , ROUND((avg(disk_wb_rate::numeric) / 1024 / 1024),2) AS disk_W_MBs
     , ROUND((avg(net_rb_rate::numeric) / 1024 / 1024),2) AS net_I_MBs
     , ROUND((avg(net_wb_rate::numeric) / 1024 / 1024),2) AS net_O_MBs
FROM gpmetrics.gpcc_system_history gsh
WHERE hostname LIKE 'mdw%'
  AND ctime >= '2023-09-14 16:30:00'
  AND ctime < '2023-09-14 19:00:00'
GROUP BY 1,2
ORDER BY 1;

-- Total memory and swap per host
SELECT DISTINCT gsh.hostname,
       ROUND(gsh.mem_total::numeric/1024/1024/1024) AS mem_total_G,
       ROUND(gsh.swap_total::numeric/1024/1024/1024) AS swap_total_G
FROM gpmetrics.gpcc_system_history gsh
where ctime >= NOW() - INTERVAL '1 minute'
  and gsh.mem_total > 0
order by 1 limit 10;
```
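When the gpmetrics views come back empty for the window of interest, grep the master's pg_log files directly. The directory below is the one from this environment (found via `ps -ef`); `PGLOG` is a placeholder variable to adjust:

```shell
# Master log directory; override PGLOG if your data directory differs.
PGLOG=${PGLOG:-/opt/greenplum/data/master/gpseg-1/pg_log}

# Files that mention the failover-related errors
grep -rlE 'Out of memory|Cannot allocate memory|in recovery mode' "$PGLOG" 2>/dev/null || true

# The matching lines themselves, FTS messages included
grep -rhE 'FTS detected|double fault|could not fork' "$PGLOG" 2>/dev/null | head -20
```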
Simulating an Automatic Failover

Initial state
```
postgres=# select * from gp_segment_configuration d ORDER BY 8,4 desc,2 ;
 dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 5432 | gpdb6    | gpdb6   | /opt/greenplum/data/master/gpseg-1
    2 |       0 | p    | p              | s    | u      | 6000 | gpdb6    | gpdb6   | /opt/greenplum/data/primary/gpseg0
    3 |       1 | p    | p              | s    | u      | 6001 | gpdb6    | gpdb6   | /opt/greenplum/data/primary/gpseg1
    6 |      -1 | m    | m              | s    | u      | 5433 | gpdb6    | gpdb6   | /opt/greenplum/data/master_standby/gpseg-1
    4 |       0 | m    | m              | s    | u      | 7000 | gpdb6    | gpdb6   | /opt/greenplum/data/mirror/gpseg0
    5 |       1 | m    | m              | s    | u      | 7001 | gpdb6    | gpdb6   | /opt/greenplum/data/mirror/gpseg1
(6 rows)

[gpadmin@gpdb6 gpseg1]$ gpstate -e
20230721:12:36:58:004252 gpstate:gpdb6:gpadmin-[INFO]:-Starting gpstate with args: -e
20230721:12:36:58:004252 gpstate:gpdb6:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.1 build commit:2731a45ecb364317207c560730cf9e2cbf17d7e4 Open Source'
20230721:12:36:58:004252 gpstate:gpdb6:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.1 build commit:2731a45ecb364317207c560730cf9e2cbf17d7e4 Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Feb 7 2023 22:54:40'
20230721:12:36:58:004252 gpstate:gpdb6:gpadmin-[INFO]:-Obtaining Segment details from master...
20230721:12:36:58:004252 gpstate:gpdb6:gpadmin-[INFO]:-Gathering data from segments...
.
20230721:12:36:59:004252 gpstate:gpdb6:gpadmin-[INFO]:-----------------------------------------------------
20230721:12:36:59:004252 gpstate:gpdb6:gpadmin-[INFO]:-Segment Mirroring Status Report
20230721:12:36:59:004252 gpstate:gpdb6:gpadmin-[INFO]:-----------------------------------------------------
20230721:12:36:59:004252 gpstate:gpdb6:gpadmin-[INFO]:-All segments are running normally
[gpadmin@gpdb6 gpseg1]$
```
Simulating the failure
```
[gpadmin@gpdb6 pg_log]$ ps -ef|grep bin | grep post
gpadmin  9643    1 0 Jul20 ? 00:00:08 /usr/local/greenplum-db-6.23.1/bin/postgres -D /opt/greenplum/data/primary/gpseg0 -p 6000
gpadmin  9645    1 0 Jul20 ? 00:00:09 /usr/local/greenplum-db-6.23.1/bin/postgres -D /opt/greenplum/data/primary/gpseg1 -p 6001
gpadmin  9647    1 0 Jul20 ? 00:00:00 /usr/local/greenplum-db-6.23.1/bin/postgres -D /opt/greenplum/data/mirror/gpseg1 -p 7001
gpadmin  9648    1 0 Jul20 ? 00:00:00 /usr/local/greenplum-db-6.23.1/bin/postgres -D /opt/greenplum/data/mirror/gpseg0 -p 7000
gpadmin  9686    0 0 Jul20 ? 00:00:08 /usr/local/greenplum-db-6.23.1/bin/postgres -D /opt/greenplum/data/master/gpseg-1 -p 5432 -E
gpadmin  9806    1 0 Jul20 ? 00:00:00 /usr/local/greenplum-db-6.23.1/bin/postgres -D /opt/greenplum/data/master_standby/gpseg-1 -p 5433 -E
[gpadmin@gpdb6 pg_log]$ kill -9 9643
[gpadmin@gpdb6 pg_log]$ ps -ef|grep bin | grep post
gpadmin  9645    1 0 Jul20 ? 00:00:09 /usr/local/greenplum-db-6.23.1/bin/postgres -D /opt/greenplum/data/primary/gpseg1 -p 6001
gpadmin  9647    1 0 Jul20 ? 00:00:00 /usr/local/greenplum-db-6.23.1/bin/postgres -D /opt/greenplum/data/mirror/gpseg1 -p 7001
gpadmin  9648    1 0 Jul20 ? 00:00:00 /usr/local/greenplum-db-6.23.1/bin/postgres -D /opt/greenplum/data/mirror/gpseg0 -p 7000
gpadmin  9686    0 0 Jul20 ? 00:00:08 /usr/local/greenplum-db-6.23.1/bin/postgres -D /opt/greenplum/data/master/gpseg-1 -p 5432 -E
gpadmin  9806    1 0 Jul20 ? 00:00:00 /usr/local/greenplum-db-6.23.1/bin/postgres -D /opt/greenplum/data/master_standby/gpseg-1 -p 5433 -E

postgres=# select * from gp_segment_configuration d ORDER BY 8,4 desc,2 ;
 dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 5432 | gpdb6    | gpdb6   | /opt/greenplum/data/master/gpseg-1
    2 |       0 | m    | p              | n    | d      | 6000 | gpdb6    | gpdb6   | /opt/greenplum/data/primary/gpseg0
    3 |       1 | p    | p              | s    | u      | 6001 | gpdb6    | gpdb6   | /opt/greenplum/data/primary/gpseg1
    6 |      -1 | m    | m              | s    | u      | 5433 | gpdb6    | gpdb6   | /opt/greenplum/data/master_standby/gpseg-1
    4 |       0 | p    | m              | n    | u      | 7000 | gpdb6    | gpdb6   | /opt/greenplum/data/mirror/gpseg0
    5 |       1 | m    | m              | s    | u      | 7001 | gpdb6    | gpdb6   | /opt/greenplum/data/mirror/gpseg1
(6 rows)
```
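After the failover, content 0 is served by its former mirror (dbid=4, role p) while the old primary (dbid=2) sits in status d. A sketch of the usual recovery sequence, to be run as gpadmin on the master host once the failed host is healthy again (cluster-specific, so not runnable outside the cluster):

```shell
# Incrementally recover the downed segment so it rejoins as a mirror:
gprecoverseg -a

# Watch until resynchronization finishes (mode returns to 's'):
gpstate -e

# Then swap segments back to their preferred roles:
gprecoverseg -r
```

`gprecoverseg -F` performs a full recovery instead of an incremental one when the downed segment's data directory is damaged.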