一个PG主键重复的案例
前言
昨天早上同事转了一封邮件过来,发现某张表的主键重复了!😵 印象中这是第二次遇到了,让我们一起分析下这个罕见的案例。根因很重要呀!耐心看完
分析
出问题的表结构不复杂,主要字段如下(部分信息做了脱敏)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | postgres=# \d fpp_xxx Table "public.fpp_xxx" Column | Type | Modifiers --------------------------+-----------------------------+------------------------ id_message_info | character varying(32) | not null ... ... date_created | timestamp without time zone | not null default now() updated_by | character varying(32) | not null date_updated | timestamp without time zone | not null default now() ... Indexes: "fpp_xxx_pkey" PRIMARY KEY, btree (id_message_info) ---主键 ... Foreign-key constraints: ... |
昨天开发反馈更新的时候提示主键重复,于是上去看了一下
1 2 3 4 5 6 7 8 | postgres=# select id_message_info from fpp_xxx where id_message_info='1E0443A700002A9F0000149948DA5E77'; id_message_info ---------------------------------- 1E0443A700002A9F0000149948DA5E77 1E0443A700002A9F0000149948DA5E77 (2 rows) Time: 206.009 ms |
果然主键重复了,DBA 表示惊呆了!主键的功能就是非空 + 唯一,那是哪里出了问题呢?
先看下这两条数据的事务状态,是否是可见性导致的
1 2 3 4 5 6 | postgres=# select xmin,xmax,id_message_info from fpp_xxx where id_message_info='1E0443A700002A9F0000149948DA5E77'; xmin | xmax | id_message_info ------------+------------+---------------------------------- 1837555476 | 1847923208 | 1E0443A700002A9F0000149948DA5E77 1837555517 | 0 | 1E0443A700002A9F0000149948DA5E77 (2 rows) |
第一条数据 xmax 字段有值,说明这条数据要么被更新过,要么获取过行锁,这点很好验证
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | postgres=# create table t1(id int); CREATE TABLE postgres=# insert into t1 values(1); INSERT 0 1 postgres=# insert into t1 values(2); INSERT 0 1 postgres=# select id from t1 where id = 1 for update; id ---- 1 (1 row) postgres=# select xmin,xmax from t1; ---xmax + infomask 实现行锁 xmin | xmax -------+------- 18801 | 18803 18802 | 0 (2 rows) postgres=# begin; BEGIN postgres=*# update t1 set id = 10 where id = 2; UPDATE 1 postgres=*# rollback ; ROLLBACK postgres=# select xmin,xmax from t1; xmin | xmax -------+------- 18801 | 18803 18802 | 18804 (2 rows) |
再看看 ctid
1 2 3 4 5 6 7 8 9 10 11 12 13 | postgres=# select xmin,xmax,ctid,id_message_info from fpp_xxx where id_message_info='1E0443A700002A9F0000149948DA5E77'; -[ RECORD 1 ]---+--------------------------------- xmin | 1837555476 xmax | 1847923208 ctid | (9789,8) id_message_info | 1E0443A700002A9F0000149948DA5E77 -[ RECORD 2 ]---+--------------------------------- xmin | 1837555517 xmax | 0 ctid | (9790,3) id_message_info | 1E0443A700002A9F0000149948DA5E77 Time: 2.298 ms |
根据过往的一些经验,初步怀疑可能是索引出了问题,比如 index only scan 仅在索引中取数据,索引和存储上的数据不一致,这个是有具体的案例的,可以翻下之前的文章。看下执行计划:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 | postgres=# set enable_indexscan to off; SET postgres=# set enable_indexonlyscan to off; SET postgres=# set enable_bitmapscan to off; SET postgres=# explain select xmin,xmax,ctid,id_message_info from fpp_xxx where id_message_info='1E0443A700002A9F0000149948DA5E77'; QUERY PLAN -------------------------------------------------------------------------------- Seq Scan on fpp_xxx (cost=0.00..595465.29 rows=1 width=47) Filter: ((id_message_info)::text = '1E0443A700002A9F0000149948DA5E77'::text) (2 rows) postgres=# select xmin,xmax,ctid,id_message_info from fpp_xxx where id_message_info='1E0443A700002A9F0000149948DA5E77'; xmin | xmax | ctid | id_message_info ------------+------------+----------+---------------------------------- 1837555476 | 1847923208 | (9789,8) | 1E0443A700002A9F0000149948DA5E77 1837555517 | 0 | (9790,3) | 1E0443A700002A9F0000149948DA5E77 (2 rows) Time: 124054.147 ms postgres=# set enable_indexscan to on; SET postgres=# explain select xmin,xmax,ctid,id_message_info from fpp_xxx where id_message_info='1E0443A700002A9F0000149948DA5E77'; QUERY PLAN ----------------------------------------------------------------------------------------------- Index Scan using fpp_xxx_pkey on fpp_xxx (cost=0.56..5.57 rows=1 width=47) Index Cond: ((id_message_info)::text = '1E0443A700002A9F0000149948DA5E77'::text) (2 rows) postgres=# select xmin,xmax,ctid,id_message_info from fpp_xxx where id_message_info='1E0443A700002A9F0000149948DA5E77'; xmin | xmax | ctid | id_message_info ------------+------------+----------+---------------------------------- 1837555476 | 1847923208 | (9789,8) | 1E0443A700002A9F0000149948DA5E77 1837555517 | 0 | (9790,3) | 1E0443A700002A9F0000149948DA5E77 (2 rows) Time: 0.384 ms |
顺序扫描和索引扫描返回的数据一致,那不是索引的问题。不过为了确保索引是正确的状态(属实踩雷踩多了),还是使用插件 amcheck 检查了一下。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | postgres=# SELECT bt_index_check(index => c.oid, heapallindexed => i.indisunique), c.relname, c.relpages FROM pg_index i JOIN pg_opclass op ON i.indclass[0] = op.oid JOIN pg_am am ON op.opcmethod = am.oid JOIN pg_class c ON i.indexrelid = c.oid JOIN pg_namespace n ON c.relnamespace = n.oid WHERE am.amname = 'btree' AND n.nspname = 'public' -- Don't check temp tables, which may be from another session: AND c.relpersistence != 't' AND c.relname = 'fpp_xxx_pkey' -- Function may throw an error when this is omitted: AND c.relkind = 'i' AND i.indisready AND i.indisvalid ORDER BY c.relpages DESC LIMIT 10; bt_index_check | relname | relpages ----------------+-----------------------+---------- | fpp_xxx_pkey | 48302 (1 row) Time: 529784.373 ms |
em,索引也是正确的状态,说明有别于以往的案例。继续分析一下
1 2 3 4 5 6 7 8 9 10 11 | postgres=# select ctid,id_message_info from fpp_xxx where ctid='(9790,3)'; ctid | id_message_info ----------+---------------------------------- (9790,3) | 1E0443A700002A9F0000149948DA5E77 (1 row) postgres=# select ctid,id_message_info from fpp_xxx where ctid='(9789,8)'; ctid | id_message_info ----------+---------------------------------- (9789,8) | 1E0443A700002A9F0000149948DA5E77 (1 row) |
CTID 表示这两行数据分别位于 9789 和 9790 这两个数据块,使用 pageinspect 窥探一下底层文件
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | postgres=# select lp, t_xmin, t_xmax, t_ctid, infomask(t_infomask, 1) as infomask, infomask(t_infomask2, 2) as infomask2 from heap_page_items(get_raw_page('fpp_xxx', 9789)) where lp = 8; lp | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+------------+------------+------------+----------------------------------------------------------------------+----------------- | 8 | 1837555476 | 1847923208 | (106234,2) | XMAX_INVALID|XMIN_INVALID|XMIN_COMMITTED|HASVARWIDTH|HASNULL | (1 row) [postgres:6635@xxx] [10-25.12:38:50]=# select lp, t_xmin, t_xmax, t_ctid, infomask(t_infomask, 1) as infomask, infomask(t_infomask2, 2) as infomask2 from heap_page_items(get_raw_page('fpp_xxx', 9790)) where lp = 3; lp | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+------------+--------+-----------+----------------------------------------------------------------------+----------------- 3 | 1837555517 | 0 | (9790,3) | UPDATED|XMAX_INVALID|XMIN_INVALID|XMIN_COMMITTED|HASVARWIDTH|HASNULL | (1 row) |
这里可以看到一些差别,第一行数据的 ctid 指向了 '(106234,2)',同时表示该行已经冻结过了 XMIN_INVALID|XMIN_COMMITTED
1 2 3 4 5 6 7 8 9 | #define HEAP_XMIN_FROZEN (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID) postgres=# select lp, t_xmin, t_xmax, t_ctid, infomask(t_infomask, 1) as infomask, infomask(t_infomask2, 2) as infomask2 from heap_page_items(get_raw_page('fpp_xxx', 106234)) where lp = 2; lp | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+------------+--------+------------+----------------------------------------------------------------------+----------- 2 | 1847923208 | 0 | (106234,2) | UPDATED|XMAX_INVALID|XMIN_INVALID|HASVARWIDTH|HASNULL | (1 row) |
通过 pageinspect 查看,发现有这行数据,但是通过指定查询是没有数据的
1 2 | postgres=# select ctid,id_message_info from fpp_xxx where ctid='(106234,2)'; (0 rows) |
假如查询的时候第一条数据通过 ctid '(9789,8)' 到 ctid '(106234,2)' 这个 ctid 链进行查找的话,数据的确无法可见,也就不会有第一条数据了,主键也就不会重复了。难道是数据不一致了?ctid 链出问题了?内存中的数据和底层的数据不一致?
后面仔细一想,发现是我想多了,这个其实是正常的机制,也是我最开始被绕进去的地方 👇🏻 做个小实验:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | postgres=# create table t1(id int); CREATE TABLE postgres=# insert into t1 values(1); INSERT 0 1 postgres=# select xmin,ctid from t1; xmin | ctid -------+------- 18807 | (0,1) (1 row) postgres=# begin; BEGIN postgres=*# update t1 set id = 99 where id = 1; UPDATE 1 postgres=*# rollback ; ROLLBACK postgres=# select xmin,ctid from t1; --- ctid是(0,1) xmin | ctid -------+------- 18807 | (0,1) (1 row) postgres=# select lp, t_xmin, t_xmax, t_ctid, t_data, --- ctid是(0,2) infomask(t_infomask, 1) as infomask, infomask(t_infomask2, 2) as infomask2 from heap_page_items(get_raw_page('t1', 0)); lp | t_xmin | t_xmax | t_ctid | t_data | infomask | infomask2 ----+--------+--------+--------+------------+-----------------------------------+----------------- 1 | 18807 | 18808 | (0,2) | \x01000000 | XMAX_INVALID|XMIN_COMMITTED | HOT_UPDATED 2 | 18808 | 0 | (0,2) | \x63000000 | UPDATED|XMAX_INVALID|XMIN_INVALID | HEAP_ONLY_TUPLE (2 rows) |
相信各位也看明白了,第一行数据依旧在哪里,但是通过 pageinspect 看的话,ctid 均是指向的第二行数据,即使第二行数据已经回滚了,通过 MVCC 和行上的 infomask(XMIN_INVALID
) 来确保第二条数据不可见。所以,这个是正常的现象。这个不起眼的地方属实把我绕进去了,各种计算偏移量去看底层文件。