PG做update操作引起的表膨胀情况
Tags: PGPostgreSQLUPDATE膨胀表膨胀
前言
PostgreSQL数据库的MVCC机制中,一个明显的特点是,在执行DELETE操作,它一般并不立即擦除要删的数据,通常只是做一个标记,留给VACUUM进程去做相应的清理,并且是有条件的清理。
而UPDATE操作,则更是一个DELETE操作和一个INSERT操作的组合。INSERT操作则通常会APPEND一条记录在后边。这样的结果是,通常情况下,这种INSERT, UPDATE操作应该是很快的。
但如果用的不当,也有一些反例。
示例
下面看一则很普通的示例:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | create table t3 (id int); insert into t3 values(1); begin; \set VERBOSE verbose do language plpgsql $$ declare tcount int = 100000; begin for i in 1..tcount loop update t3 set id = id+1; end loop; end; $$; commit; DO Time: 120762.618 ms (02:00.763) mydb=*# commit; |
我们可以明显的看到,一条普通的记录,在更新100000次的操作下,居然花了整整两分钟。
我们用下边的SQL来看看该表的元组分布情况:(将其中的public.t3改为你想要的表名,就可以查目标表的相应情况)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | WITH cteTableInfo AS ( SELECT COUNT(1) AS ct ,SUM(length(t::text)) AS TextLength ,'public.t3'::regclass AS TableName FROM public.t3 AS t ) ,cteRowSize AS ( SELECT ARRAY [pg_relation_size(TableName) , pg_relation_size(TableName, 'vm') , pg_relation_size(TableName, 'fsm') , pg_table_size(TableName) , pg_indexes_size(TableName) , pg_total_relation_size(TableName) , TextLength ] AS val , ARRAY ['Relation Size' , 'Visibility Map' , 'Free Space Map' , 'Table Included Toast Size' , 'Indexes Size' , 'Total Relation Size' , 'Live Row Byte Size' ] AS Name FROM cteTableInfo ) SELECT unnest(name) AS Description ,unnest(val) AS Bytes ,pg_size_pretty(unnest(val)) AS BytesPretty ,unnest(val) / ct AS bytes_per_row FROM cteTableInfo, cteRowSize UNION ALL SELECT '------------------------------', NULL, NULL, NULL UNION ALL SELECT 'TotalRows', ct, NULL, NULL FROM cteTableInfo UNION ALL SELECT 'LiveTuples', pg_stat_get_live_tuples(TableName), NULL, NULL FROM cteTableInfo UNION ALL SELECT 'DeadTuples', pg_stat_get_dead_tuples(TableName), NULL, NULL FROM cteTableInfo; |
结果如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | description | bytes | bytespretty | bytes_per_row --------------------------------+---------+-------------+--------------- Relation Size | 3629056 | 3544 kB | 3629056 Visibility Map | 0 | 0 bytes | 0 Free Space Map | 24576 | 24 kB | 24576 Table Included Toast Size | 3653632 | 3568 kB | 3653632 Indexes Size | 0 | 0 bytes | 0 Total Relation Size | 3653632 | 3568 kB | 3653632 Live Row Byte Size | 8 | 8 bytes | 8 ------------------------------ | | | TotalRows | 1 | | LiveTuples | 1 | | DeadTuples | 100000 | | (11 rows) mydb=# |
发现总共只有一条记录,但是死元组居然是100000条。这些都是要占有空间的。总的表大小是3658kB,它可是只有一个整型字段啊。
其主要原因就是它一个事务里头,调用了太多的update,并且是不断的DELETE, INSERT, 在后一次的UPDATE,它要全部扫描过往的元组。
执行一次vacuum,看看:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | mydb=# vacuum verbose t123; INFO: vacuuming "public.t123" INFO: table "t123": found 0 removable, 12 nonremovable row versions in 1 out of 55 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 744 Skipped 0 pages due to buffer pins, 0 frozen pages. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. VACUUM description | bytes | bytespretty | bytes_per_row --------------------------------+---------+-------------+--------------- Relation Size | 3629056 | 3544 kB | 3629056 Visibility Map | 8192 | 8192 bytes | 8192 Free Space Map | 24576 | 24 kB | 24576 Table Included Toast Size | 3661824 | 3576 kB | 3661824 Indexes Size | 0 | 0 bytes | 0 Total Relation Size | 3661824 | 3576 kB | 3661824 Live Row Byte Size | 8 | 8 bytes | 8 ------------------------------ | | | TotalRows | 1 | | LiveTuples | 1 | | DeadTuples | 0 | | (11 rows) |
也只是将死元组清除标记,空间上并没有立即回收。使用vacuum full,会立即回收整个空间。
如果想让上边的update速度能快一些,可以改为在循环内显式的提交事务。但似乎也改善的不太多。但是空间占用会少很多。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | create table t4 (id int); insert into t4 values(1); \timing on \set VERBOSE verbose do language plpgsql $$ declare tcount int = 100000; begin for i in 1..tcount loop update t4 set id = id+1; commit; end loop; end; $$; 结果: Time: 32059.662 ms (00:32.060) mydb=# |
空间大小:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | description | bytes | bytespretty | bytes_per_row --------------------------------+-------+-------------+--------------- Relation Size | 8192 | 8192 bytes | 8192 Visibility Map | 8192 | 8192 bytes | 8192 Free Space Map | 24576 | 24 kB | 24576 Table Included Toast Size | 40960 | 40 kB | 40960 Indexes Size | 0 | 0 bytes | 0 Total Relation Size | 40960 | 40 kB | 40960 Live Row Byte Size | 8 | 8 bytes | 8 ------------------------------ | | | TotalRows | 1 | | LiveTuples | 1 | | DeadTuples | 0 | | (11 rows) Time: 11.155 ms |
可以看到,它的大小也是有所不同的。证明在中间update操作是可以复用以前的无用空间。
再补充一个小的验证性测试,如果我们将fillfactor调整为20, 预留80%的空间给update操作,看看第1种情况下,花的时间是啥样的?(1分40秒)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | create table t3 (id int) with (fillfactor=20); insert into t3 values(1); \timing on \set VERBOSE verbose begin; do language plpgsql $$ declare tcount int = 100000; begin for i in 1..tcount loop update t3 set id = id+1; end loop; end; $$; commit; DO Time: 100961.301 ms (01:40.961) |
表相关空间大小:3576kB
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | description | bytes | bytespretty | bytes_per_row --------------------------------+---------+-------------+--------------- Relation Size | 3629056 | 3544 kB | 3629056 Visibility Map | 8192 | 8192 bytes | 8192 Free Space Map | 24576 | 24 kB | 24576 Table Included Toast Size | 3661824 | 3576 kB | 3661824 Indexes Size | 0 | 0 bytes | 0 Total Relation Size | 3661824 | 3576 kB | 3661824 Live Row Byte Size | 8 | 8 bytes | 8 ------------------------------ | | | TotalRows | 1 | | LiveTuples | 1 | | DeadTuples | 0 | | (11 rows) |
循环内commit,情况如何:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | create table t3 (id int) with (fillfactor=20); insert into t3 values(1); \timing on \set VERBOSE verbose do language plpgsql $$ declare tcount int = 100000; begin for i in 1..tcount loop update t3 set id = id+1; commit; end loop; end; $$; 结果: DO Time: 32512.754 ms (00:32.513) |
空间大小:(8kB)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | description | bytes | bytespretty | bytes_per_row --------------------------------+-------+-------------+--------------- Relation Size | 8192 | 8192 bytes | 8192 Visibility Map | 0 | 0 bytes | 0 Free Space Map | 0 | 0 bytes | 0 Table Included Toast Size | 8192 | 8192 bytes | 8192 Indexes Size | 0 | 0 bytes | 0 Total Relation Size | 8192 | 8192 bytes | 8192 Live Row Byte Size | 8 | 8 bytes | 8 ------------------------------ | | | TotalRows | 1 | | LiveTuples | 1 | | DeadTuples | 34 | | (11 rows) |
在实际使用当中,对于高频的更新操作的表,为减轻vacuum的负担,可以建议性的将fillfactor调整为80或75,给Update操作预留适当的空间。