TiDB数据库OOM问题诊断及处理(查询内存消耗较大的SQL语句)
TiDB数据库OOM问题诊断及处理(查询内存消耗较大的SQL语句)
在 TiDB 数据库中运行耗内存较大的 SQL 语句,观察其对 TiDB Server 内存使用情况的影响,对其进行监控和限制。
初始化环境
使用 TiUP bench 组件构造包括 customer、lineitem、nation 等在内的基础表结构以及数据:
1 | tiup bench tpch -H 172.17.0.4 -P 4000 -D tpch prepare |
过程:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | [root@lhrtidb test]# tiup bench tpch -H 172.17.0.4 -P 4000 -D tpch prepare Starting component `bench`: /root/.tiup/components/bench/v1.6.1/tiup-bench tpch -H 172.17.0.4 -P 4000 -D tpch prepare creating nation creating region creating part creating supplier creating partsupp creating customer creating orders creating lineitem generating nation table generate nation table done generating region table generate region table done generating customers table generate customers table done generating suppliers table generate suppliers table done generating part/partsupplier tables generate part/partsupplier tables done generating orders/lineitem tables generate orders/lineitem tables done Finished Found bench newer version: The latest version: v1.8.1 Local installed version: v1.6.1 Update current component: tiup update bench Update all components: tiup update --all -- 生成测试表 create table customer_test( `C_CUSTKEY` bigint(20) PRIMARY KEY AUTO_INCREMENT, `C_NAME` varchar(25) NOT NULL, `C_ADDRESS` varchar(40) NOT NULL, `C_NATIONKEY` bigint(20) NOT NULL, `C_PHONE` char(15) NOT NULL, `C_ACCTBAL` decimal(15,2) NOT NULL, `C_MKTSEGMENT` char(10) NOT NULL, `C_COMMENT` varchar(117) NOT NULL, KEY `idx_cname` (`C_NAME`) ); -- 插入数据,大概40分钟 for i in `seq 60`; do mysql -uroot -P4000 -h172.17.0.4 -e "insert into tpch.customer_test (C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) select C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT from tpch.customer"; done; |
时间较长,估计得1小时。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | MySQL [tpch]> SELECT -> CONCAT( table_schema, '.', table_name ) AS table_name, -> table_rows AS table_rows, -> CONCAT( ROUND( data_length / ( 1024 * 1024 * 1024 ), 4 ), 'G' ) AS 'Data Size', -> CONCAT( ROUND( index_length / ( 1024 * 1024 * 1024 ), 4 ), 'G' ) AS 'Index Size', -> CONCAT( ROUND( ( data_length + index_length ) / ( 1024 * 1024 * 1024 ), 4 ), 'G' ) AS 'Total' -> FROM information_schema.TABLES -> WHERE table_schema = 'tpch'; +--------------------+------------+-----------+------------+---------+ | table_name | table_rows | Data Size | Index Size | Total | +--------------------+------------+-----------+------------+---------+ | tpch.nation | 25 | 0.0000G | 0.0000G | 0.0000G | | tpch.region | 5 | 0.0000G | 0.0000G | 0.0000G | | tpch.part | 200000 | 0.0245G | 0.0000G | 0.0245G | | tpch.supplier | 10000 | 0.0014G | 0.0000G | 0.0014G | | tpch.partsupp | 800000 | 0.1174G | 0.0119G | 0.1293G | | tpch.customer | 150000 | 0.0242G | 0.0000G | 0.0242G | | tpch.orders | 1501024 | 0.1661G | 0.0000G | 0.1661G | | tpch.lineitem | 6009407 | 0.7679G | 0.0895G | 0.8575G | | tpch.customer_test | 9000000 | 1.4514G | 0.1676G | 1.6191G | +--------------------+------------+-----------+------------+---------+ 9 rows in set (0.07 sec) MySQL [tpch]> select count(*) from customer_test; +----------+ | count(*) | +----------+ | 9000000 | +----------+ 1 row in set (1.86 sec) |
这是一个商业订购系统的数据库。其中,tpch.nation 表是国家信息、tpch.region 表是地区信息、tpch.part 表是零件信息、tpch.supplier 表是供货商信息、tpch.partsupp 表是供货商的零件信息、tpch.customer 表是消费者信息、tpch.orders 表是订单信息、tpch.lineitem 表是在线商品的信息。
执行测试 SQL 语句, 观察内存消耗情况(memory列)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | MySQL [tpch]> explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY -> from customer_test t,customer s where -> t.C_NAME=s.C_NAME order by t.C_ADDRESS; +------------------------------------+-----------+---------+-----------+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+----------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +------------------------------------+-----------+---------+-----------+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+----------+ | Sort_9 | 187500.00 | 9000000 | root | | time:3m8.1s, loops:8791 | tpch.customer_test.c_address | 437.1 MB | 434.0 MB | | └─Projection_12 | 187500.00 | 9000000 | root | | time:14.2s, loops:8791, Concurrency:5 | tpch.customer_test.c_name, tpch.customer_test.c_address, tpch.customer.c_custkey | 385.8 KB | N/A | | └─IndexHashJoin_18 | 187500.00 | 9000000 | root | | time:14.3s, loops:8791, inner:{total:5m34.9s, concurrency:5, task:14, construct:1.32s, fetch:23.2s, build:320.3ms, join:5m10.4s} | inner join, inner:IndexLookUp_15, outer key:tpch.customer.c_name, inner key:tpch.customer_test.c_name, equal cond:eq(tpch.customer.c_name, tpch.customer_test.c_name) | 578.1 MB | N/A | | ├─TableReader_28(Build) | 150000.00 | 150000 | root | | time:236.9ms, loops:153, cop_task: {num: 1, max: 229.7ms, proc_keys: 150000, tot_proc: 214ms, tot_wait: 1ms, rpc_num: 1, rpc_time: 229.7ms, copr_cache_hit_ratio: 0.00} | data:TableFullScan_27 | 4.87 MB | N/A | | │ └─TableFullScan_27 | 150000.00 | 150000 | cop[tikv] | table:s | tikv_task:{time:186ms, loops:151}, scan_detail: {total_process_keys: 150000, total_process_keys_size: 30533765, total_keys: 150001, rocksdb: {delete_skipped_count: 0, key_skipped_count: 150000, block: {cache_hit_count: 503, read_count: 1, read_byte: 16.0 KB}}} | keep order:false, stats:pseudo | N/A | N/A | | └─IndexLookUp_15(Probe) | 1.25 | 9000000 | root | | time:22.5s, loops:8804, index_task: {total_time: 20.9s, fetch_handle: 12.4s, build: 1.93ms, wait: 8.4s}, table_task: {total_time: 1m1.8s, num: 491, concurrency: 70} | | 547.8 KB | N/A | | ├─IndexRangeScan_13(Build) | 1.25 | 9000000 | cop[tikv] | table:t, index:idx_cname(C_NAME) | time:8.98s, loops:8832, cop_task: {num: 23, max: 1.83s, min: 16.8ms, avg: 797.1ms, p95: 1.8s, max_proc_keys: 915840, p95_proc_keys: 775503, tot_proc: 17s, tot_wait: 69ms, rpc_num: 23, rpc_time: 18.3s, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:1.6s, min:10ms, p80:1.23s, p95:1.53s, iters:8896, tasks:23}, scan_detail: {total_process_keys: 9000000, total_process_keys_size: 657000000, total_keys: 9150009, rocksdb: {delete_skipped_count: 0, key_skipped_count: 9000000, block: {cache_hit_count: 300016, read_count: 6779, read_byte: 24.4 MB}}} | range: decided by [eq(tpch.customer_test.c_name, tpch.customer.c_name)], keep order:false, stats:pseudo | N/A | N/A | | └─TableRowIDScan_14(Probe) | 1.25 | 9000000 | cop[tikv] | table:t | time:34.8s, loops:9292, cop_task: {num: 10243, max: 411.8ms, min: 984.5?s, avg: 29.9ms, p95: 90.2ms, max_proc_keys: 1047, p95_proc_keys: 1026, tot_proc: 55.9s, tot_wait: 41.5s, rpc_num: 10243, rpc_time: 5m6.3s, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:283ms, min:0s, p80:3ms, p95:12ms, iters:56497, tasks:10243}, scan_detail: {total_process_keys: 9000000, total_process_keys_size: 1832025900, total_keys: 9029720, rocksdb: {delete_skipped_count: 0, key_skipped_count: 9000000, block: {cache_hit_count: 81793, read_count: 29769, read_byte: 744.3 MB}}} | keep order:false, stats:pseudo | N/A | N/A | +------------------------------------+-----------+---------+-----------+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+----------+ 8 rows in set (3 min 8.38 sec) |
反复执行测试用 SQL 语句,进行监控
1 | for i in `seq 10`; do mysql -uroot -P4000 -h172.17.0.4 -e "explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY from tpch.customer_test t,tpch.customer s where t.C_NAME=s.C_NAME order by t.C_ADDRESS"; done; |
通过监控 ( http://192.168.66.35:43000/ ):Grafana监控 --> TiDB --> Server --> Memory Usage 监控内存使用呈上升趋势,如下图:
找到内存占用较大的 SQL 语句
http://192.168.66.35:42382/dashboard 空密码
TiDB Dashboard SQL 语句分析
选择目标时间,以及数据库名称,可以看到 SQL list,其中可以通过『平均内存』看到集群中,执行过的 SQL 消耗的平均内存:
TiDB Dashboard 慢查询
选择目标时间,以及数据库名称,可以看到 SQL list,按照『总执行时间』排序:
点击排序结果 SQL,并查看详情,可以看到该 SQL 使用的最大内存:
通过参数限制单条 SQL 语句对于内存的使用
配置参数
修改参数 tidb_expensive_query_time_threshold ,使测试 SQL 可以被记录到 tidb.log 中, 如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | MySQL [tpch]> show variables like '%tidb_expensive_query_time_threshold%'; +-------------------------------------+-------+ | Variable_name | Value | +-------------------------------------+-------+ | tidb_expensive_query_time_threshold | 60 | +-------------------------------------+-------+ 1 row in set (0.06 sec) MySQL [tpch]> set tidb_expensive_query_time_threshold=10; Query OK, 0 rows affected (0.05 sec) MySQL [tpch]> show variables like '%tidb_expensive_query_time_threshold%'; +-------------------------------------+-------+ | Variable_name | Value | +-------------------------------------+-------+ | tidb_expensive_query_time_threshold | 10 | +-------------------------------------+-------+ 1 row in set (0.15 sec) |
修改参数 tidb_mem_quota_query ,限制测试 SQL 的内存使用为 107 M, 如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | MySQL [tpch]> show variables like '%tidb_mem_quota_query%'; +----------------------+------------+ | Variable_name | Value | +----------------------+------------+ | tidb_mem_quota_query | 1073741824 | +----------------------+------------+ 1 row in set (0.07 sec) MySQL [tpch]> set tidb_mem_quota_query=107374100; Query OK, 0 rows affected (0.05 sec) MySQL [tpch]> show variables like '%tidb_mem_quota_query%'; +----------------------+-----------+ | Variable_name | Value | +----------------------+-----------+ | tidb_mem_quota_query | 107374100 | +----------------------+-----------+ 1 row in set (0.06 sec) |
观察内存
再次执行测试 SQL 语句, 观察内存和磁盘的消耗情况:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | MySQL [tpch]> explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY -> from customer_test t,customer s where -> t.C_NAME=s.C_NAME order by t.C_ADDRESS; ERROR 1105 (HY000): Out Of Memory Quota![conn_id=80133] MySQL [tpcc]> show config where name like '%oom-action%'; +------+-----------------+------------+--------+ | Type | Instance | Name | Value | +------+-----------------+------------+--------+ | tidb | 172.17.0.4:4000 | oom-action | cancel | | tidb | 172.17.0.4:4001 | oom-action | cancel | +------+-----------------+------------+--------+ 2 rows in set (0.07 sec) MySQL [tpcc]> select * from INFORMATION_SCHEMA.CLUSTER_CONFIG d where d.key='oom-action'; +------+-----------------+------------+--------+ | TYPE | INSTANCE | KEY | VALUE | +------+-----------------+------------+--------+ | tidb | 172.17.0.4:4001 | oom-action | cancel | | tidb | 172.17.0.4:4000 | oom-action | cancel | +------+-----------------+------------+--------+ 2 rows in set (0.11 sec) |
参考:https://docs.pingcap.com/zh/tidb/stable/configure-memory-usage/
https://docs.pingcap.com/zh/tidb/stable/tidb-configuration-file/#oom-action
解决:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | -- 如果是tiup playgroud部署, [root@lhrtidb ~]# cat /tmp/tidb_config.toml oom-action = "log" tiup playground --db.config /tmp/tidb_config.toml tiup playground v5.3.0 --db 2 --pd 3 --kv 3 --host=0.0.0.0 --db.config /tmp/tidb_config.toml & -- 并且单独重启每个组件 -- 如果是tiup cluster模式 tiup cluster edit-config lhrtidb server_configs: tidb: log.slow-threshold: 300 oom-action: log -- 重载 tiup cluster reload lhrtidb |
重新测试:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | MySQL [tpch]> explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY -> from customer_test t,customer s where -> t.C_NAME=s.C_NAME order by t.C_ADDRESS; +-------------------------------+------------+---------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+----------+----------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +-------------------------------+------------+---------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+----------+----------+ | Sort_9 | 9107836.79 | 9000000 | root | | time:3m38.9s, loops:8791 | tpch.customer_test.c_address | 429.0 MB | 612.9 MB | | └─Projection_12 | 9107836.79 | 9000000 | root | | time:17.4s, loops:8792, Concurrency:5 | tpch.customer_test.c_name, tpch.customer_test.c_address, tpch.customer.c_custkey | 385.8 KB | N/A | | └─HashJoin_26 | 9107836.79 | 9000000 | root | | time:17.5s, loops:8792, build_hash_table:{total:1.69s, fetch:826.2ms, build:863ms}, probe:{concurrency:5, total:7m36.6s, max:1m31.3s, probe:7m4s, fetch:32.6s} | inner join, equal:[eq(tpch.customer.c_name, tpch.customer_test.c_name)] | 4.94 MB | 6.01 MB | | ├─TableReader_28(Build) | 150000.00 | 150000 | root | | time:644.6ms, loops:148, cop_task: {num: 1, max: 624.9ms, proc_keys: 150000, tot_proc: 609ms, rpc_num: 2, rpc_time: 626.3ms, copr_cache_hit_ratio: 0.00}, backoff{regionMiss: 2ms} | data:TableFullScan_27 | 4.87 MB | N/A | | │ └─TableFullScan_27 | 150000.00 | 150000 | cop[tikv] | table:s | tikv_task:{time:585ms, loops:151}, scan_detail: {total_process_keys: 150000, total_process_keys_size: 30533765, total_keys: 150001, rocksdb: {delete_skipped_count: 0, key_skipped_count: 150000, block: {cache_hit_count: 3, read_count: 499, read_byte: 11.2 MB}}} | keep order:false | N/A | N/A | | └─TableReader_30(Probe) | 9000000.00 | 9000000 | root | | time:5.35s, loops:8795, cop_task: {num: 21, max: 2.09s, min: 479.2ms, avg: 1.45s, p95: 1.92s, max_proc_keys: 448283, p95_proc_keys: 448282, tot_proc: 24.7s, tot_wait: 11ms, rpc_num: 23, rpc_time: 30.4s, copr_cache_hit_ratio: 0.00}, backoff{regionMiss: 4ms} | data:TableFullScan_29 | 328.4 MB | N/A | | └─TableFullScan_29 | 9000000.00 | 9000000 | cop[tikv] | table:t | tikv_task:{proc max:1.76s, min:305ms, p80:1.22s, p95:1.51s, iters:8878, tasks:21}, scan_detail: {total_process_keys: 9000000, total_process_keys_size: 1832025900, total_keys: 9000021, rocksdb: {delete_skipped_count: 0, key_skipped_count: 9000000, block: {cache_hit_count: 28049, read_count: 1915, read_byte: 42.8 MB}}} | keep order:false | N/A | N/A | +-------------------------------+------------+---------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+----------+----------+ 7 rows in set (3 min 39.54 sec) |
观察 TiDB Server 日志
可以通过过滤 tidb.log 中的 expensivequery 关键字,来发现内存占用高的 SQL ,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | [root@lhrtidb logs]# find / -name tidb.log /root/.tiup/data/StFki1x/tidb-0/tidb.log /root/.tiup/data/StFki1x/tidb-1/tidb.log [root@lhrtidb logs]# [root@lhrtidb logs]# more /root/.tiup/data/StFki1x/tidb-0/tidb.log | grep expensivequery [2022/01/03 19:36:42.235 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.067347434s] [conn_id=19] [user=root] [database=test1] [txn_start_ts=430233286638829569] [mem_max="0 Bytes (0 Bytes)"] [sql="analyze table test1.t1"] [2022/01/04 16:22:51.335 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.007100467s] [cop_time=0.530594416s] [process_time=0.326s] [wait_time=0.021s] [request_count=1] [total_keys=150001] [process_keys=150000] [num_cop_tasks=1] [process_avg_time=0.326s] [process_p90_time=0.326s] [process_max_time=0.326s] [process_max_addr=172.17.0.4:20161] [wait_avg_time=0.021s] [wait_p90_time=0.021s] [wait_max_time=0.021s] [wait_max_addr=172.17.0.4:20161] [stats=customer:pseudo] [conn_id=80227] [user=root] [table_ids="[104]"] [txn_start_ts=430252886935994370] [mem_max="72927744 Bytes (69.5 MB)"] [sql="insert into tpch.customer_test (C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) select C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT from tpch.customer"] [2022/01/04 16:24:33.235 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.04302791s] [cop_time=0.529589494s] [process_time=0.443s] [wait_time=0.001s] [request_count=1] [total_keys=150001] [process_keys=150000] [num_cop_tasks=1] [process_avg_time=0.443s] [process_p90_time=0.443s] [process_max_time=0.443s] [process_max_addr=172.17.0.4:20161] [wait_avg_time=0.001s] [wait_p90_time=0.001s] [wait_max_time=0.001s] [wait_max_addr=172.17.0.4:20161] [stats=customer:pseudo] [conn_id=80229] [user=root] [table_ids="[104]"] [txn_start_ts=430252913635360770] [mem_max="72927745 Bytes (69.5 MB)"] [sql="insert into tpch.customer_test (C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) select C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT from tpch.customer"] [2022/01/04 16:26:43.335 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.07968566s] [cop_time=0.509808334s] [process_time=0.431s] [request_count=1] [total_keys=150001] [process_keys=150000] [num_cop_tasks=1] [process_avg_time=0.431s] [process_p90_time=0.431s] [process_max_time=0.431s] [process_max_addr=172.17.0.4:20161] [wait_avg_time=0s] [wait_p90_time=0s] [wait_max_time=0s] [wait_max_addr=172.17.0.4:20161] [stats=customer:pseudo] [conn_id=80233] [user=root] [table_ids="[104]"] [txn_start_ts=430252947727187971] [mem_max="72927745 Bytes (69.5 MB)"] [sql="insert into tpch.customer_test (C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) select C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT from tpch.customer"] [2022/01/04 16:27:50.335 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.079355314s] [cop_time=0.730926102s] [process_time=0.63s] [wait_time=0.003s] [request_count=1] [total_keys=150001] [process_keys=150000] [num_cop_tasks=1] [process_avg_time=0.63s] [process_p90_time=0.63s] [process_max_time=0.63s] [process_max_addr=172.17.0.4:20161] [wait_avg_time=0.003s] [wait_p90_time=0.003s] [wait_max_time=0.003s] [wait_max_addr=172.17.0.4:20161] [stats=customer:pseudo] [conn_id=80235] [user=root] [table_ids="[104]"] [txn_start_ts=430252965293195268] [mem_max="72927745 Bytes (69.5 MB)"] [sql="insert into tpch.customer_test (C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) select C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT from tpch.customer"] [2022/01/04 16:28:59.534 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.033467054s] [cop_time=0.525532259s] [process_time=0.439s] [wait_time=0.001s] [request_count=1] [total_keys=150001] [process_keys=150000] [num_cop_tasks=1] [process_avg_time=0.439s] [process_p90_time=0.439s] [process_max_time=0.439s] [process_max_addr=172.17.0.4:20161] [wait_avg_time=0.001s] [wait_p90_time=0.001s] [wait_max_time=0.001s] [wait_max_addr=172.17.0.4:20161] [stats=customer:pseudo] [conn_id=80237] [user=root] [table_ids="[104]"] [txn_start_ts=430252983444570114] [mem_max="72927745 Bytes (69.5 MB)"] [sql="insert into tpch.customer_test (C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) select C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT from tpch.customer"] [2022/01/04 16:30:13.435 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.029348576s] [cop_time=0.530950074s] [process_time=0.442s] [request_count=1] [total_keys=150001] [process_keys=150000] [num_cop_tasks=1] [process_avg_time=0.442s] [process_p90_time=0.442s] [process_max_time=0.442s] [process_max_addr=172.17.0.4:20161] [wait_avg_time=0s] [wait_p90_time=0s] [wait_max_time=0s] [wait_max_addr=172.17.0.4:20161] [stats=customer:pseudo] [conn_id=80239] [user=root] [table_ids="[104]"] [txn_start_ts=430253002816749570] [mem_max="72927745 Bytes (69.5 MB)"] [sql="insert into tpch.customer_test (C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) select C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT from tpch.customer"] [2022/01/04 16:31:19.635 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.017770701s] [cop_time=0.505416523s] [process_time=0.418s] [request_count=1] [total_keys=150001] [process_keys=150000] [num_cop_tasks=1] [process_avg_time=0.418s] [process_p90_time=0.418s] [process_max_time=0.418s] [process_max_addr=172.17.0.4:20161] [wait_avg_time=0s] [wait_p90_time=0s] [wait_max_time=0s] [wait_max_addr=172.17.0.4:20161] [stats=customer:pseudo] [conn_id=80241] [user=root] [table_ids="[104]"] [txn_start_ts=430253020170682370] [mem_max="72927745 Bytes (69.5 MB)"] [sql="insert into tpch.customer_test (C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) select C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT from tpch.customer"] [2022/01/04 16:32:53.735 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.061057548s] [cop_time=0.648490114s] [process_time=0.556s] [wait_time=0.001s] [request_count=1] [total_keys=150001] [process_keys=150000] [num_cop_tasks=1] [process_avg_time=0.556s] [process_p90_time=0.556s] [process_max_time=0.556s] [process_max_addr=172.17.0.4:20161] [wait_avg_time=0.001s] [wait_p90_time=0.001s] [wait_max_time=0.001s] [wait_max_addr=172.17.0.4:20161] [stats=customer:pseudo] [conn_id=80243] [user=root] [table_ids="[104]"] [txn_start_ts=430253044838694913] [mem_max="72927745 Bytes (69.5 MB)"] [sql="insert into tpch.customer_test (C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) select C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT from tpch.customer"] [2022/01/04 16:35:42.035 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.074126405s] [cop_time=0.531395197s] [process_time=0.441s] [request_count=1] [total_keys=150001] [process_keys=150000] [num_cop_tasks=1] [process_avg_time=0.441s] [process_p90_time=0.441s] [process_max_time=0.441s] [process_max_addr=172.17.0.4:20161] [wait_avg_time=0s] [wait_p90_time=0s] [wait_max_time=0s] [wait_max_addr=172.17.0.4:20161] [stats=customer:pseudo] [conn_id=80251] [user=root] [table_ids="[104]"] [txn_start_ts=430253088944422918] [mem_max="72927745 Bytes (69.5 MB)"] [sql="insert into tpch.customer_test (C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) select C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT from tpch.customer"] [2022/01/04 16:54:35.638 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.020857561s] [cop_time=34.708165865s] [process_time=73.14s] [wait_time=41.547s] [request_count=10267] [total_keys=18329730] [process_keys=18150000] [num_cop_tasks=10267] [process_avg_time=0.007123794s] [process_p90_time=0.012s] [process_max_time=1.638s] [process_max_addr=172.17.0.4:20160] [wait_avg_time=0.004046654s] [wait_p90_time=0.011s] [wait_max_time=0.207s] [wait_max_addr=172.17.0.4:20161] [stats=customer:pseudo,customer_test:pseudo] [conn_id=80133] [user=root] [database=tpch] [table_ids="[104]"] [txn_start_ts=430253386123706371] [mem_max="1080000818 Bytes (1.01 GB)"] [sql="explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY\nfrom customer_test t,customer s where\nt.C_NAME=s.C_NAME order by t.C_ADDRESS"] [2022/01/04 16:58:25.439 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.011205032s] [cop_time=24.6284611s] [process_time=40.836s] [wait_time=23.238s] [request_count=10268] [total_keys=16483496] [process_keys=16309659] [num_cop_tasks=10268] [process_avg_time=0.003977015s] [process_p90_time=0.005s] [process_max_time=1.51s] [process_max_addr=172.17.0.4:20160] [wait_avg_time=0.002263147s] [wait_p90_time=0.006s] [wait_max_time=0.074s] [wait_max_addr=172.17.0.4:20161] [stats=customer:pseudo,customer_test:pseudo] [conn_id=80327] [user=root] [table_ids="[104]"] [txn_start_ts=430253446377504770] [mem_max="1074141986 Bytes (1.00 GB)"] [sql="explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY from tpch.customer_test t,tpch.customer s where t.C_NAME=s.C_NAME order by t.C_ADDRESS"] [2022/01/04 17:01:21.740 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.074068409s] [cop_time=59.959508759s] [process_time=76.953s] [wait_time=20.216s] [request_count=10267] [total_keys=14982900] [process_keys=14813171] [num_cop_tasks=10267] [process_avg_time=0.007495178s] [process_p90_time=0.002s] [process_max_time=6.329s] [process_max_addr=172.17.0.4:20160] [wait_avg_time=0.001969026s] [wait_p90_time=0.003s] [wait_max_time=0.283s] [wait_max_addr=172.17.0.4:20161] [stats=customer_test:pseudo,customer:pseudo] [conn_id=80329] [user=root] [table_ids="[104]"] [txn_start_ts=430253492576976899] [mem_max="1074140930 Bytes (1.00 GB)"] [sql="explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY from tpch.customer_test t,tpch.customer s where t.C_NAME=s.C_NAME order by t.C_ADDRESS"] [2022/01/04 17:04:26.091 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.108780273s] [cop_time=33.117686798s] [process_time=36.509s] [wait_time=14.215s] [request_count=10267] [total_keys=14937227] [process_keys=14767583] [num_cop_tasks=10267] [process_avg_time=0.003555955s] [process_p90_time=0.002s] [process_max_time=2.135s] [process_max_addr=172.17.0.4:20160] [wait_avg_time=0.001384532s] [wait_p90_time=0.003s] [wait_max_time=0.207s] [wait_max_addr=172.17.0.4:20160] [stats=customer:pseudo,customer_test:pseudo] [conn_id=80361] [user=root] [table_ids="[104]"] [txn_start_ts=430253540893523971] [mem_max="1098481698 Bytes (1.02 GB)"] [sql="explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY from tpch.customer_test t,tpch.customer s where t.C_NAME=s.C_NAME order by t.C_ADDRESS"] [2022/01/04 17:07:34.139 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.041437453s] [cop_time=24.579118418s] [process_time=29.597s] [wait_time=8.294s] [request_count=10268] [total_keys=14388299] [process_keys=14220317] [num_cop_tasks=10268] [process_avg_time=0.00288245s] [process_p90_time=0.002s] [process_max_time=2.295s] [process_max_addr=172.17.0.4:20160] [wait_avg_time=0.000807752s] [wait_p90_time=0.002s] [wait_max_time=0.051s] [wait_max_addr=172.17.0.4:20162] [stats=customer_test:pseudo,customer:pseudo] [conn_id=80371] [user=root] [table_ids="[104]"] [txn_start_ts=430253590202810371] [mem_max="1074123666 Bytes (1.00 GB)"] [sql="explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY from tpch.customer_test t,tpch.customer s where t.C_NAME=s.C_NAME order by t.C_ADDRESS"] [2022/01/04 17:10:34.339 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.009265494s] [cop_time=42.360651378s] [process_time=44.893s] [wait_time=23.147s] [backoff_time=0.032s] [request_count=10268] [total_keys=15225395] [process_keys=15054811] [num_cop_tasks=10268] [process_avg_time=0.004372126s] [process_p90_time=0.003s] [process_max_time=2.329s] [process_max_addr=172.17.0.4:20162] [wait_avg_time=0.002254285s] [wait_p90_time=0.004s] [wait_max_time=0.148s] [wait_max_addr=172.17.0.4:20160] [stats=customer:pseudo,customer_test:pseudo] [conn_id=80377] [user=root] [table_ids="[104]"] [txn_start_ts=430253637454528514] [mem_max="1074125138 Bytes (1.00 GB)"] [sql="explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY from tpch.customer_test t,tpch.customer s where t.C_NAME=s.C_NAME order by t.C_ADDRESS"] [2022/01/04 17:13:38.039 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=60.071924804s] [cop_time=22.907768048s] [process_time=27.502s] [wait_time=7.992s] [request_count=10267] [total_keys=14075962] [process_keys=13908970] [num_cop_tasks=10267] [process_avg_time=0.002678679s] [process_p90_time=0.002s] [process_max_time=1.749s] [process_max_addr=172.17.0.4:20160] [wait_avg_time=0.000778416s] [wait_p90_time=0.002s] [wait_max_time=0.259s] [wait_max_addr=172.17.0.4:20160] [stats=customer:pseudo,customer_test:pseudo] [conn_id=80379] [user=root] [table_ids="[104]"] [txn_start_ts=430253685583904772] [mem_max="1074109866 Bytes (1.00 GB)"] [sql="explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY from tpch.customer_test t,tpch.customer s where t.C_NAME=s.C_NAME order by t.C_ADDRESS"] [2022/01/04 17:13:42.592 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=26.171815194s] [cop_time=69.240729669s] [process_time=71.565s] [wait_time=19.813s] [request_count=7685] [total_keys=10049516] [process_keys=9927992] [num_cop_tasks=7685] [process_avg_time=0.009312296s] [process_p90_time=0.002s] [process_max_time=7.05s] [process_max_addr=172.17.0.4:20160] [wait_avg_time=0.002578139s] [wait_p90_time=0.002s] [wait_max_time=0.289s] [wait_max_addr=172.17.0.4:20161] [stats=customer_test:pseudo,customer:pseudo] [conn_id=80133] [user=root] [database=tpch] [table_ids="[104]"] [txn_start_ts=430253695663341569] [mem_max="562430508 Bytes (536.4 MB)"] [sql="explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY from\ncustomer_test t,customer s where\nt.C_NAME=s.C_NAME order by t.C_ADDRESS"] [2022/01/04 17:14:54.340 +08:00] [WARN] [expensivequery.go:179] [expensive_query] [cost_time=15.954373002s] [cop_time=29.565295375s] [process_time=29.562s] [wait_time=72.565s] [backoff_time=0.04s] [request_count=7685] [total_keys=9836868] [process_keys=9715992] [num_cop_tasks=7685] [process_avg_time=0.003846714s] [process_p90_time=0.002s] [process_max_time=3.221s] [process_max_addr=172.17.0.4:20162] [wait_avg_time=0.00944242s] [wait_p90_time=0.002s] [wait_max_time=0.306s] [wait_max_addr=172.17.0.4:20160] [stats=customer:pseudo,customer_test:pseudo] [conn_id=80133] [user=root] [database=tpch] [table_ids="[104]"] [txn_start_ts=430253717159149573] [mem_max="549897856 Bytes (524.4 MB)"] [sql="explain analyze select t.C_NAME,t.C_ADDRESS,s.C_CUSTKEY from\ncustomer_test t,customer s where\nt.C_NAME=s.C_NAME order by t.C_ADDRESS"] |
可以通过过滤 tidb.log 中的 memory exceeds quota, spill to disk now 关键字,来发现 SQL 执行过程中的一部分中间结果使用了磁盘作为临时存储,如下:
1 2 3 4 5 6 7 8 9 10 | [root@lhrtidb logs]# more /root/.tiup/data/StFki1x/tidb-0/tidb.log | grep 'memory exceeds quota, spill to disk now' [2022/01/04 16:53:47.732 +08:00] [INFO] [row_container.go:561] ["memory exceeds quota, spill to disk now."] [consumed=1078857434] [quota=1073741824] [2022/01/04 16:57:32.712 +08:00] [INFO] [row_container.go:561] ["memory exceeds quota, spill to disk now."] [consumed=1073761618] [quota=1073741824] [2022/01/04 17:00:39.409 +08:00] [INFO] [row_container.go:561] ["memory exceeds quota, spill to disk now."] [consumed=1073760562] [quota=1073741824] [2022/01/04 17:03:37.169 +08:00] [INFO] [row_container.go:561] ["memory exceeds quota, spill to disk now."] [consumed=1097936994] [quota=1073741824] [2022/01/04 17:06:42.815 +08:00] [INFO] [row_container.go:561] ["memory exceeds quota, spill to disk now."] [consumed=1073743298] [quota=1073741824] [2022/01/04 17:09:46.713 +08:00] [INFO] [row_container.go:561] ["memory exceeds quota, spill to disk now."] [consumed=1073744770] [quota=1073741824] [2022/01/04 17:12:46.808 +08:00] [INFO] [row_container.go:561] ["memory exceeds quota, spill to disk now."] [consumed=1073806874] [quota=1073741824] [2022/01/04 17:13:32.326 +08:00] [INFO] [row_container.go:561] ["memory exceeds quota, spill to disk now."] [consumed=561740256] [quota=107374100] [2022/01/04 17:14:45.189 +08:00] [INFO] [row_container.go:561] ["memory exceeds quota, spill to disk now."] [consumed=502721437] [quota=107374100] |
最后,我们观察下 Grafana 监控 --> TiDB --> Server --> Memory Usage 内存使用情况: