视图DBA_GOLDENGATE_SUPPORT_MODE引起的集成模式OGG不同步问题排查过程
Tags: OracleOGG故障处理bug11.2.0.4故障排查不同步慢
现象
使用OGG 21.3可以远程同步Oracle 11.2.0.4的数据库,这个我之前已经测试过,参考:https://www.xmmup.com/shiyongogg-21-3yuanchengshishihuxiangtongbuoracle-11-2-0-4shuangzhu.html
然后,客户这边有个需求,需要同步Oracle 11.2.0.4的rac到单机环境,我也是按照这个过程配置的,但是配置完成后,extract进程不能抽取数据,很是奇怪,也不报错,只是“Lag at Chkpt”延迟一直在增大,执行stats exta
报错“No active extraction maps.”,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 | GGSCI (ogg) 1> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING JAGENT STOPPED PMSRVR STOPPED EXTRACT RUNNING EXTA 16:22:31 00:00:06 EXTRACT RUNNING EXTB 16:09:25 00:00:05 REPLICAT RUNNING REPA 00:00:00 00:00:02 REPLICAT RUNNING REPB 00:00:00 00:00:04 GGSCI (ogg) 86> stats exta Sending STATS request to Extract group EXTA ... No active extraction maps. DDL replication statistics (for all trails): *** Total statistics since extract started *** Operations 0.00 Mapped operations 0.00 Unmapped operations 0.00 Other operations 0.00 Excluded operations 0.00 GGSCI (ogg) 48> info exta Extract EXTA Last Started 2022-11-03 09:15 Status RUNNING Checkpoint Lag 16:39:15 (updated 00:00:00 ago) Process ID 15884 Log Read Checkpoint Oracle Integrated Redo Logs 2022-11-02 16:37:03 SCN 7.3280852740 (33345623812) GGSCI (ogg) 49> info exta,showch Extract EXTA Last Started 2022-11-03 09:15 Status RUNNING Checkpoint Lag 16:39:15 (updated 00:00:08 ago) Process ID 15884 Log Read Checkpoint Oracle Integrated Redo Logs 2022-11-02 16:37:03 SCN 7.3280852740 (33345623812) Current Checkpoint Detail: Read Checkpoint #1 Oracle Integrated Redo Log Startup Checkpoint (starting position in the data source): Timestamp: 2022-11-02 16:36:25.000000 SCN: 0.0 (0) Recovery Checkpoint (position of oldest unprocessed transaction in the data source): Timestamp: 2022-11-02 16:37:03.000000 SCN: 7.3280852739 (33345623811) Current Checkpoint (position of last record read in the data source): Timestamp: 2022-11-02 16:37:03.000000 SCN: 7.3280852740 (33345623812) Write Checkpoint #1 GGS Log Trail Current Checkpoint (current write position): Sequence #: 2 RBA: 1478 Timestamp: 2022-11-03 09:16:18.549673 Extract Trail: ./dirdat/ea Seqno Length: 9 Flip Seqno Length: No Trail Type: EXTTRAIL Header: Version = 2 Record Source = A Type = 13 # Input Checkpoints = 1 # Output Checkpoints = 1 Configuration: Data Source = 3 Transaction Integrity = 1 Task Type = 0 Status: Start Time = 2022-11-03 09:15:18 Last Update Time = 2022-11-03 09:16:18 Stop Status = A Last Result = 520 GGSCI (ogg) 51> send exta showtrans Sending SHOWTRANS request to Extract group EXTA ... Extract is currently in recovery mode (reading transactions from trail file). Please try again in a few minutes. GGSCI (ogg) 66> send extract exta status Sending STATUS request to Extract group EXTA ... EXTRACT EXTA (PID 16108) Current status: In recovery[1]: Processing data Current read position: Redo thread #: 1 Sequence #: 257 RBA: 922376 Timestamp: 2022-11-02 16:37:03.000000 SCN: 7.3280852740 (33345623812) Current write position: Sequence #: 3 RBA: 1478 Timestamp: 2022-11-03 09:23:39.027185 Extract Trail: ./dirdat/ea |
另外,自己新建的环境都没有问题,可以实时同步,就客户的环境不能同步!!!
分析过程
1、用以下这些命令查询分析并没找到错误的原因
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | kill exta START EXTRACT exta BRRESET START EXTRACT exta BRINTERVAL 20M info exta,showch info exta,detail send exta showtrans send extract exta status GGSCI (ogg) 3> send exta showtrans Sending SHOWTRANS request to Extract group exta ... ------------------------------------------------------------ XID: 0.5.29.7834 Items: 0 Extract: ext8 Redo Thread: 1 Start Time: 2022-11-03:10:08:09 SCN: 7.3281664446 (33346435518) Redo Seq: 1707 Redo RBA: 20690864 Status: Running |
这里其实有个很奇怪的现象,就是send exta showtrans
查询出来的是数据库未提交的事务,但是gv$transaction
视图却查不出来结果,找不到该事务XID:
1 | SELECT * FROM gv$transaction; |
2、数据库的告警日志有如下输出:
1 2 3 | setting IGNORE_UNSUPPORTED_TABLE for table (*) GoldenGate Capture:OGG$CAP_EXT1 setting _FILTER_PARTIAL_ROLLBACK: Setting XOUT_CLIENT_EXISTS to Y for Capture: OGG$CAP_EXT1 |
怀疑是否是这个问题,于是配置:
1 2 | exec DBMS_CAPTURE_ADM.SET_PARAMETER('OGG$CAP_EXTA','IGNORE_UNSUPPORTED_TABLE','-'); exec DBMS_CAPTURE_ADM.SET_PARAMETER('OGG$CAP_EXT1','IGNORE_UNSUPPORTED_TABLE','-'); |
仍然不行。参考:https://docs.oracle.com/database/121/ARPLS/d_cap_a.htm#ARPLS306
3、考虑到之前用的docker环境可以,于是对比一下2个库的差异,发现字符集和redo大小不一样。我之前用的环境是AL32UTF8字符集,而客户环境是ZHS16GBK环境,
1 2 3 4 | export NLS_LANG=AMERICAN_AMERICA.ZHS16GBK export NLS_LANG=AMERICAN_AMERICA.AL32UTF8 setenv(NLS_LANG="AMERICAN_AMERICA.ZHS16GBK") |
仍然不行。
一般来说,字符集不一样,只会导致数据传输过来后是乱码,不会导致进程hang住。