Oracle datapump Architecture & Internals

Oracle Datapump was introduced with 10gR1 as a replacement to the classic export/import utilities. In this post we’ll go through the internals of Datapump to see how it works.

Oracle added two new packages DBMS_DATAPUMP & DBMS_METADATA which forms the base for the datapump export import operations. So either you use the command line functionality or via OEM you basically work with DBMS_DATAPUMP & DBMS_METADATA packages.

Datapump architecture:-

Master Table

The most important part of the architecture is the master table, it is a directory that records and maintains the current state & location of every object being exported or imported. Master table also maintains the details about status of all the sub-processes (worker process) forked from the master control process.

Master table is created in the schema of the user running the datapump job during the start of the execution. In an export job the information in this master table is written to the dumpfile as a last step whereas in an import job this information is loaded to the master table as a first step so that master table can be used to sequence the creation of all the objects. In both the case once the job is successfully completed this master table is dropped from the schema.

Following is the master table structure created during an schema export job:-


SQL&amp;amp;gt; desc "ADMIN"."SYS_EXPORT_SCHEMA_01"
 Name                                        Null?             Type
 ----------------------------------------- -------- ----------------------------
 PROCESS_ORDER                                                NUMBER
 DUPLICATE                                                    NUMBER
 DUMP_FILEID                                                  NUMBER
 DUMP_POSITION                                                NUMBER
 DUMP_LENGTH                                                  NUMBER
 DUMP_ORIG_LENGTH                                             NUMBER
 DUMP_ALLOCATION                                              NUMBER
 COMPLETED_ROWS                                               NUMBER
 ERROR_COUNT                                                  NUMBER
 ELAPSED_TIME                                                 NUMBER
 OBJECT_TYPE_PATH                                          VARCHAR2(200)
 OBJECT_PATH_SEQNO                                            NUMBER
 OBJECT_TYPE                                               VARCHAR2(30)
 IN_PROGRESS                                                  CHAR(1)
 OBJECT_NAME                                               VARCHAR2(500)
 OBJECT_LONG_NAME                                          VARCHAR2(4000)
 OBJECT_SCHEMA                                             VARCHAR2(30)
 ORIGINAL_OBJECT_SCHEMA                                    VARCHAR2(30)
 ORIGINAL_OBJECT_NAME                                      VARCHAR2(4000)
 PARTITION_NAME                                            VARCHAR2(30)
 SUBPARTITION_NAME                                         VARCHAR2(30)
 DATAOBJ_NUM                                                  NUMBER
 FLAGS                                                        NUMBER
 PROPERTY                                                     NUMBER
 TRIGFLAG                                                     NUMBER
 CREATION_LEVEL                                               NUMBER
 COMPLETION_TIME                                               DATE
 OBJECT_TABLESPACE                                         VARCHAR2(30)
 SIZE_ESTIMATE                                                NUMBER
 OBJECT_ROW                                                   NUMBER
 PROCESSING_STATE                                             CHAR(1)
 PROCESSING_STATUS                                            CHAR(1)
 BASE_PROCESS_ORDER                                           NUMBER
 BASE_OBJECT_TYPE                                          VARCHAR2(30)
 BASE_OBJECT_NAME                                          VARCHAR2(30)
 BASE_OBJECT_SCHEMA                                        VARCHAR2(30)
 ANCESTOR_PROCESS_ORDER                                       NUMBER
 DOMAIN_PROCESS_ORDER                                         NUMBER
 PARALLELIZATION                                              NUMBER
 UNLOAD_METHOD                                                NUMBER
 LOAD_METHOD                                                  NUMBER
 GRANULES                                                     NUMBER
 SCN                                                          NUMBER
 GRANTOR                                                   VARCHAR2(30)
 XML_CLOB                                                      CLOB
 PARENT_PROCESS_ORDER                                         NUMBER
 NAME                                                      VARCHAR2(30)
 VALUE_T                                                   VARCHAR2(4000)
 VALUE_N                                                      NUMBER
 IS_DEFAULT                                                   NUMBER
 FILE_TYPE                                                    NUMBER
 USER_DIRECTORY                                            VARCHAR2(4000)
 USER_FILE_NAME                                            VARCHAR2(4000)
 FILE_NAME                                                 VARCHAR2(4000)
 EXTEND_SIZE                                                  NUMBER
 FILE_MAX_SIZE                                                NUMBER
 PROCESS_NAME                                              VARCHAR2(30)
 LAST_UPDATE                                                   DATE
 WORK_ITEM                                                 VARCHAR2(30)
 OBJECT_NUMBER                                                NUMBER
 COMPLETED_BYTES                                              NUMBER
 TOTAL_BYTES                                                  NUMBER
 METADATA_IO                                                  NUMBER
 DATA_IO                                                      NUMBER
 CUMULATIVE_TIME                                              NUMBER
 PACKET_NUMBER                                                NUMBER
 INSTANCE_ID                                                  NUMBER
 OLD_VALUE                                                 VARCHAR2(4000)
 SEED                                                         NUMBER
 LAST_FILE                                                    NUMBER
 USER_NAME                                                 VARCHAR2(30)
 OPERATION                                                 VARCHAR2(30)
 JOB_MODE                                                  VARCHAR2(30)
 QUEUE_TABNUM                                                 NUMBER
 CONTROL_QUEUE                                             VARCHAR2(30)
 STATUS_QUEUE                                              VARCHAR2(30)
 REMOTE_LINK                                               VARCHAR2(4000)
 VERSION                                                      NUMBER
 JOB_VERSION                                               VARCHAR2(30)
 DB_VERSION                                                VARCHAR2(30)
 TIMEZONE                                                  VARCHAR2(64)
 STATE                                                     VARCHAR2(30)
 PHASE                                                        NUMBER
 GUID                                                         RAW(16)
 START_TIME                                                    DATE
 BLOCK_SIZE                                                   NUMBER
 METADATA_BUFFER_SIZE                                         NUMBER
 DATA_BUFFER_SIZE                                             NUMBER
 DEGREE                                                       NUMBER
 PLATFORM                                                  VARCHAR2(101)
 ABORT_STEP                                                   NUMBER
 INSTANCE                                                  VARCHAR2(60)
 CLUSTER_OK                                                   NUMBER
 SERVICE_NAME                                              VARCHAR2(100)
 OBJECT_INT_OID                                            VARCHAR2(32)

SQL&amp;amp;gt;

Client Process

The expdp, impdp, OEM Interface and Custom Interface are the client process provided by oracle by which we can use the datapump APIs.

Shadow Process

This process is created when Client logs in to database and processes the the datapump APIs requests (DBMS_DATAPUMP.OPEN). It is responsible for creating jobs to further create the master table and master control process.

Master Control Process

Master control process (MCP) which is created by shadow process upon START_JOB request from Datapump API is the master process that controls the execution and sequencing for the datapump job with the information fetched from master table. MCP further forks several worker process according to the value of PARALLEL parameter and dispatch work regarding unloading and loading of data. MCP has a process name as <Instance Name>_DMnn_<pid>.

Worker process

The job of worker process is to unload/load the data and metadata from/to the database and also to update the master table. So as the objects are unloaded/loaded the status (PENDING, COMPLETED, FAILED etc.) along with other information (No. of rows, size, object type etc.) is updated in the master table. Worker process has a process name as <Instance Name>_DWnn_<pid>.

Parallel Query process

For further parallel processing specially in the case where table partitioning is involved, even the worker process creates some parallel query (PQ) processes and further assigns the unloading/loading assignment to these PQ processes and act as a query coordinator.

Command and control queue

All the processes related to Datapump jobs described above have subscription of this queue. Command and control queue is used for interprocess communication between all the datapump processes to coordinate all the API commands, loading/unloading requests & responses, file requests, logging etc. This queue is named as KUPC$C_<job-unique>_<timestamp>.

Status queue

The shadow process subscribes to read from this queue and is used to receive error messages from MCP which is the only process that can write to this queue. This queue is named as KUPC$S_<job-unique>_<timestamp>.

This was all about Datapump architecture that you need to know, now its time to see the methods used by Oracle for Data movement.

Data Movement:-

Data Pump has 3 options for moving the data out/in from/to the database, and based on several performance related factors & user specified arguments, MCP chooses the fastest option for that particular operation.

The available options are:-

Direct Path Loads and Unloads.
External Tables.
Insert as Select over a network

Unless explicitly specified by the user MCP chooses the best option based on the factors like the type used in table definition, whether or not REMAP or NETWORK_LINK clauses are used, total size of data to be exported/imported, PARALLEL clause etc.

For example if there is only 1 large table that is to be exported, MCP will choose external tables with max. parallelization so that worker process could use parallel query slaves to move the data. On the contrary if the table size would have been small then using external tables would not be beneficial compared to direct path unload.

That is enough for today (at least for me), in the next post on datapump internals, I’ll show how these processes use the master table information to attach/detach, stop, restart, kill, resume datapump jobs.

Until then enjoy the internals and try some brainstorming with the strace of a expdp which you can find here.

Oracle Database Internal Mechanism

Oracle datapump Architecture & Internals

3 thoughts on “Oracle datapump Architecture & Internals”

Leave a comment Cancel reply

Share this:

Related posts

3 thoughts on “Oracle datapump Architecture & Internals”

Leave a comment Cancel reply