Oracle Data Pump expdp and impdp: the PARALLEL Parameter
Tags: expdp, impdp, Oracle, parallel, speed-up, parallelism, Data Pump, logical export/import
Introduction
Using the PARALLEL and FILESIZE parameters of the EXPDP command together can, in certain situations, improve export efficiency. This approach produces a parallel logical backup.
expdp
How PARALLEL works: it controls the maximum number of worker processes handling the job. A value of 1 means at most one process handles the data; a value of 3 means up to three processes work on the data in parallel. Because a given dump file can only be handled by one process at a time, if only one output file is produced, then even with PARALLEL=10 only one process actually writes data while the other nine sit idle. PARALLEL therefore needs to be used together with the FILESIZE parameter, which specifies the size of each dump file.
In expdp, the parallel value should be less than or equal to the number of dump files. If the parallel value is set too high, the job may fail with ORA-39095: Dump file space has been exhausted: Unable to allocate 8192 bytes.
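If a job does hit ORA-39095, one common way out (a sketch; the job name SYS_EXPORT_SCHEMA_01 and the extra file name are assumptions) is to attach to the job, add dump files interactively with ADD_FILE, and restart it:

expdp user/pwd attach=SYS_EXPORT_SCHEMA_01
Export> ADD_FILE=expdp_20100820_extra_%U.dmp
Export> START_JOB
Export> CONTINUE_CLIENT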
For export, each dump file can only be operated on by one process at a time (including its I/O), so if only a single dump file is produced, the job effectively runs with one worker no matter how much parallelism you specify, and it may also trigger the ORA-39095 error. It is therefore recommended to set the parameter to a value less than or equal to the number of dump files generated. So how do you control the number of dump files generated?
The EXPDP command provides the FILESIZE parameter to specify the maximum size of each individual dump file; to make effective use of the parallel parameter, the filesize parameter is indispensable.
Example: a user's objects occupy about 4 GB of space and the exported dump is about 3 GB. We export that user with a parallelism of 4 and cap each file at 500 MB, which gives the following syntax:
$ expdp user/pwd directory=dump_file dumpfile=expdp_20100820_%U.dmp logfile=expdp_20100820.log filesize=500M parallel=4
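With %U in the dump file name, Data Pump generates a series of numbered files as needed. Under the assumptions above, the output would look roughly like this (the directory path is hypothetical):

$ ls /u01/dump_file
expdp_20100820_01.dmp  expdp_20100820_02.dmp  expdp_20100820_03.dmp  ...
expdp_20100820.log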
impdp
IMPDP behaves somewhat differently. It first starts one WORKER process to import the METADATA and then starts multiple WORKER processes to import the data, so in the early phase you will only see one WORKER importing METADATA. Also, if IMPDP runs with PARALLEL=4 it likewise needs >= 4 dump files, and %U can be used for the import as well. In addition, for impdp, specifying parallel against a single file can still produce a parallel import.
The parallel parameter behaves somewhat differently in impdp than in expdp.
A sentence quoted from the official documentation: If the source of the import is a dump file set consisting of files, then multiple processes can read from the same file, but performance may be limited by I/O contention.
In other words: in impdp, the parallel parameter can be used against a single file or against multiple files, but a larger value is not always better, because the job is still limited by I/O.
For import, using the parallel parameter is much simpler, and import is where parallel shows its real advantage: roughly speaking, the value you set is the number of tables being loaded into the database at the same time.
Example: a dump file contains 200 tables. We import that dump file with a parallelism of 10, which gives the following syntax:
impdp user/pwd directory=dump_file dumpfile=expdp_20100820.dmp logfile=impdp_20100820.log parallel=10
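If the dump set was exported in parallel with %U, the import can reference the whole set the same way (a sketch; the file names follow the earlier export example):

impdp user/pwd directory=dump_file dumpfile=expdp_20100820_%U.dmp logfile=impdp_20100820.log parallel=4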
Query the number of CPUs: cat /proc/cpuinfo | grep "processor" | wc -l
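A common starting point (a rule of thumb, not a hard rule; the dump file name, filesize and schema below are placeholders) is to derive the degree of parallelism from the CPU count:

# count CPUs, then use that as the starting value for PARALLEL
CPUS=$(cat /proc/cpuinfo | grep "processor" | wc -l)
expdp user/pwd directory=dump_file dumpfile=expdp_%U.dmp filesize=10G parallel=${CPUS} schemas=SCOTT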
rac
Starting with 11gR2, in a RAC environment the EXPDP and IMPDP WORKER processes are started on multiple instances, so the DIRECTORY must reside on shared storage. If no shared storage is configured, the CLUSTER=N parameter must be specified to avoid errors such as:
ORA-31617: unable to open dump file "/u01/dump_dir/wms20180919_03.dmp" for write
ORA-19505: failed to identify file "/u01/dump_dir/wms20180919_03.dmp"
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Additional information: 3
Additional information: 128

ORA-31693: Table data object "LHR1"."FORM" failed to load/unload and is being skipped due to error:
ORA-31617: unable to open dump file "/home/oracle/bk/orcl_12.dmp" for write
ORA-19505: failed to identify file "/home/oracle/bk/orcl_12.dmp"
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Both the export and the import must include the CLUSTER=N parameter.
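The alternative to CLUSTER=N is to put the directory on storage visible to every RAC node (ACFS, NFS, and so on), so that workers started on other instances can write to it. A minimal sketch, assuming a shared mount point /acfs/dump_dir and a schema LHR1:

CREATE OR REPLACE DIRECTORY D1 AS '/acfs/dump_dir';
GRANT READ, WRITE ON DIRECTORY D1 TO LHR1;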
Export/Import Example
expdp \'/ AS SYSDBA\' directory=D1 dumpfile=orcl_%U.dmp EXCLUDE=STATISTICS LOGFILE=expdp_orcl_20220804.log \
  SCHEMAS=LHR1,LHR2 CLUSTER=N COMPRESSION=ALL parallel=12 FILESIZE=20g

expdp \'/ AS SYSDBA\' attach=SYS_EXPORT_FULL_01
expdp \'/ AS SYSDBA\' attach=SYS_EXPORT_SCHEMA_01
expdp \'/ AS SYSDBA\' attach=SYS_EXPORT_TABLE_01

impdp \'/ AS SYSDBA\' directory=D1 dumpfile=orcl_%U.dmp EXCLUDE=STATISTICS LOGFILE=impdp_orcl_20220804.log \
  FULL=Y CLUSTER=N parallel=12 table_exists_action=REPLACE
Important Notes
For Data Pump Export, the value that is specified for the parallel parameter should be less than or equal to the number of files in the dump file set. Each worker or Parallel Execution Process requires exclusive access to the dump file, so having fewer dump files than the degree of parallelism will mean that some workers or PX processes will be unable to write the information they are exporting. If this occurs, the worker processes go into an idle state and will not be doing any work until more files are added to the job. See the explanation of the DUMPFILE parameter in the Database Utilities guide for details on how to specify multiple dump files for a Data Pump export job.
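Two common ways to make sure a parallel=4 export job has enough dump files to write to (a sketch; the directory, file names and schema are placeholders) are the %U template with FILESIZE, or an explicit comma-separated file list:

expdp user/pwd directory=D1 dumpfile=exp_%U.dmp filesize=10G parallel=4 schemas=LHR1
expdp user/pwd directory=D1 dumpfile=exp01.dmp,exp02.dmp,exp03.dmp,exp04.dmp parallel=4 schemas=LHR1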
For Data Pump Import, the workers and PX processes can all read from the same files. However, if there are not enough dump files, the performance may not be optimal because multiple threads of execution will be trying to access the same dump file. The performance impact of multiple processes sharing the dump files depends on the I/O subsystem containing the dump files. For this reason, Data Pump Import should not have a value for the PARALLEL parameter that is significantly larger than the number of files in the dump file set.
In a typical export that includes both data and metadata, the first worker process will unload the metadata: tablespaces, schemas, grants, roles, tables, indexes, and so on. This single worker unloads the metadata, and all the rest unload the data, all at the same time. If the metadata worker finishes and there are still data objects to unload, it will start unloading the data too. The examples in this document assume that there is always one worker busy unloading metadata while the rest of the workers are busy unloading table data objects.
If the external tables method is chosen, Data Pump will determine the maximum number of PX processes that can work on a table data object. It does this by dividing the estimated size of the table data object by 250 MB and rounding the result down. If the result is zero or one, then PX processes are not used to unload the table.
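A quick worked illustration of that rule (the table sizes are hypothetical):

# estimated table size 2 GB   -> 2048 MB / 250 MB = 8.19 -> rounded down = 8 PX processes
# estimated table size 400 MB ->  400 MB / 250 MB = 1.6  -> rounded down = 1 -> no PX processes for this table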
The PARALLEL parameter works a bit differently in Import than Export. Because there are various dependencies that exist when creating objects during import, everything must be done in order. For Import, no data loading can occur until the tables are created because data cannot be loaded into tables that do not yet exist.
Summary
1. For Data Pump expdp, the PARALLEL parameter must be used together with FILESIZE and a dump file template such as dumpfile=orcl_%U.dmp to get true parallelism. For impdp, adding parallel against a single file can also achieve a parallel import, regardless of whether that file was itself produced by a parallel export.
2. EXPDP uses a single WORKER process to export the METADATA.
3. In a RAC environment, if the export or import files are stored locally (not on shared storage), the CLUSTER=N parameter must be added. Note that both the export and the import need this parameter.
4. If there are fewer dump files than the degree of parallelism, the surplus workers will have nothing to do.
5. To check whether parallelism is actually being used, attach to the job with one of the commands below, enter the status command, and look at the State of each worker process:
expdp \'/ AS SYSDBA\' attach=SYS_EXPORT_FULL_01
expdp \'/ AS SYSDBA\' attach=SYS_EXPORT_SCHEMA_01
expdp \'/ AS SYSDBA\' attach=SYS_EXPORT_TABLE_01
impdp \'/ AS SYSDBA\' attach=SYS_EXPORT_FULL_01
impdp \'/ AS SYSDBA\' attach=SYS_EXPORT_SCHEMA_01
impdp \'/ AS SYSDBA\' attach=SYS_EXPORT_TABLE_01

[oracle@cwrac1 bk]$ impdp lhr/lhr attach=SYS_IMPORT_SCHEMA_01

Import: Release 11.2.0.4.0 - Production on Sun Aug 7 15:04:19 2022

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

Job: SYS_IMPORT_SCHEMA_01
  Owner: LHR
  Operation: IMPORT
  Creator Privs: TRUE
  GUID: E5A2263B24F440E7E0530100007F17B6
  Start Time: Sunday, 07 August, 2022 14:52:58
  Mode: SCHEMA
  Remote link: dbl_lhr
  Instance: cworcl1
  Max Parallelism: 12
  EXPORT Job Parameters:
  IMPORT Job Parameters:
  Parameter Name          Parameter Value:
    CLIENT_COMMAND        lhr/******** DIRECTORY=DATA_PUMP_DIR schemas=XYFHCS network_link=dbl_lhr EXCLUDE=STATISTICS PARALLEL=12 table_exists_action=REPLACE REMAP_SCHEMA=XYFHCS:LHR
    TABLE_EXISTS_ACTION   REPLACE
  State: EXECUTING
  Bytes Processed: 0
  Current Parallelism: 12
  Job Error Count: 0

Worker 1 Status:
  Process Name: DW00
  State: WORK WAITING
Worker 2 Status:
  Process Name: DW00
  State: WORK WAITING
Worker 3 Status:
  Process Name: DW02
  State: WORK WAITING
Worker 4 Status:
  Process Name: DW01
  State: WORK WAITING
Worker 5 Status:
  Process Name: DW04
  State: WORK WAITING
Worker 6 Status:
  Process Name: DW02
  State: WORK WAITING
Worker 7 Status:
  Process Name: DW06
  State: WORK WAITING
Worker 8 Status:
  Process Name: DW03
  State: WORK WAITING
Worker 9 Status:
  Process Name: DW08
  State: EXECUTING
  Object Schema: LHR
  Object Name: TB_CONFIGDATA_SLAVERY
  Object Type: SCHEMA_EXPORT/TABLE/TABLE_DATA
  Completed Objects: 1
  Total Objects: 1,695
  Worker Parallelism: 1
Worker 10 Status:
  Process Name: DW04
  State: WORK WAITING
Worker 11 Status:
  Process Name: DW0A
  State: WORK WAITING
Worker 12 Status:
  Process Name: DW05
  State: WORK WAITING
This import actually has only one worker process doing real work.
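The same thing can be cross-checked from inside the database (a sketch, joining DBA_DATAPUMP_SESSIONS to V$SESSION via SADDR to see which Data Pump sessions are active):

SELECT s.owner_name, s.job_name, s.session_type, v.sid, v.serial#, v.status
  FROM dba_datapump_sessions s, v$session v
 WHERE s.saddr = v.saddr;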
6. impdp combined with network_link and PARALLEL can pull the data over the network and import it directly in parallel; however, in my tests a single schema did not seem to be processed in parallel.
-- Create the database link on the target side
create public database link dbl_lhr connect to system identified by oracle
  using '(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.23.94)(PORT = 1521)))(CONNECT_DATA = (SERVICE_NAME = orcl)))';

-- Import directly on the target side
nohup impdp \'/ AS SYSDBA\' DIRECTORY=DATA_PUMP_DIR FULL=Y \
  LOGFILE=impdp_netlink_lhr_20210416.log network_link=dbl_lhr EXCLUDE=STATISTICS PARALLEL=12 table_exists_action=REPLACE &
For more on network_link, see: https://www.xmmup.com/dbbao59-kongjianbuzuzenmebanbiepashujubengzhinetwork_linklaibang.html