Oracle Process Architecture Internals


To understand the Oracle architecture in detail, we need a deep understanding of both the memory and the process architecture. I have already covered the Oracle Memory Architecture here. In this post I will cover the basic Oracle internal processes and how they communicate with each other.

NOTE:- This post assumes that you are familiar with the Oracle Memory Architecture.


Broadly, the Oracle process structure is classified as:-

User Process, Server Process and Background Process. We will cover each process type individually.

  1. User Process:- A database user who needs to request information from the database must first make a connection with the Oracle Server. The connection is requested using a database interface tool, such as SQL*Plus, which begins the user process. The user process does not interact directly with the Oracle Server. Rather, it generates calls through the user program interface (UPI), which creates a session and starts a server process.
  2. Server Process:- Once a user has established a connection, a server process is started to handle the user process's requests. A server process can be either a dedicated server process or a shared server process. In a dedicated server environment, the server process handles the requests of a single user process; once that user process disconnects, the server process is terminated. In a shared server environment, the server process handles the requests of several user processes. The server process communicates with the Oracle server using the Oracle Program Interface (OPI).
    [Figure: Dedicated Server Connection]

    [Figure: Shared Server Connection]

  3. Background Process:- There are two classes of background processes: those that have a focused job to do and those that do a variety of other jobs (i.e., utility processes).

    Focussed Background Processes

    Let’s now take a look at the functions performed by each major process of interest, starting with the primary Oracle background processes.

  • PMON: The Process Monitor

This process is responsible for cleaning up after abnormally terminated connections. For example, if your dedicated server “fails” or is killed for some reason, PMON is the process responsible for fixing (recovering or undoing work) and releasing your resources. PMON will initiate the rollback of uncommitted work, release locks, and free SGA resources allocated to the failed process.

In addition to cleaning up after aborted connections, PMON is responsible for monitoring the other Oracle background processes and restarting them if necessary (and if possible). If a shared server or a dispatcher fails (crashes), PMON will step in and restart another one (after cleaning up for the failed process). PMON will watch all of the Oracle processes and either restart them or terminate the instance as appropriate. For example, it is appropriate to fail the instance in the event the database log writer process, LGWR, fails. This is a serious error, and the safest path of action is to terminate the instance immediately and let normal recovery fix the data.
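Here is a toy Python model of the supervision rule above, showing what PMON decides when a process dies. It is a sketch, not Oracle code. The `Action` enum and the `pmon_action` function are hypothetical. The critical set contains only LGWR, because that is the one example the text gives. Shared servers and dispatchers are recognised by their usual Snnn/Dnnn names (S000, D000, and so on).

```python
import re
from enum import Enum


class Action(Enum):
    CLEAN_UP_ONLY = "clean up only"            # e.g. a dead dedicated server
    RESTART = "clean up, then restart"         # shared servers and dispatchers
    TERMINATE_INSTANCE = "terminate instance"  # let crash recovery fix the data


# Hypothetical and deliberately minimal: the text names only LGWR as critical.
CRITICAL = {"LGWR"}
# Shared server (S000, S001, ...) and dispatcher (D000, D001, ...) names.
RESTARTABLE = re.compile(r"[SD]\d{3}")


def pmon_action(process_name: str) -> Action:
    """Toy model of how PMON reacts when `process_name` fails."""
    if process_name in CRITICAL:
        return Action.TERMINATE_INSTANCE
    if RESTARTABLE.fullmatch(process_name):
        return Action.RESTART
    # Roll back uncommitted work, release locks, free SGA; nothing to restart.
    return Action.CLEAN_UP_ONLY


for name in ("LGWR", "S000", "D001", "dedicated"):
    print(name, "->", pmon_action(name).value)
```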

The other thing PMON does for the instance is to register it with the Oracle TNS listener. When an instance starts up, the PMON process polls the well-known port address, unless directed otherwise, to see whether or not a listener is up and running. The well-known/default port used by Oracle is 1521. Now, what happens if the listener is started on some different port? In this case, the mechanism is the same, except that the listener address needs to be explicitly specified by the LOCAL_LISTENER parameter setting. If the listener is running when the database instance is started, PMON communicates with the listener and passes to it relevant parameters, such as the service name and load metrics of the instance. If the listener was not started, PMON will periodically attempt to contact it to register itself.
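The registration behaviour described above is essentially a retry loop. The sketch below is a hypothetical Python model:

- `register_with_listener` and its parameters are illustrative names.
- The 60-second retry interval and the `max_attempts` cap are assumptions; the text only says PMON retries periodically.
- `contact` stands in for the real network exchange, in which the instance passes its service name and load metrics.
- The `address` argument plays the role of a LOCAL_LISTENER override.

Also note that in Oracle Database 12c and later this job is done by a dedicated listener registration process (LREG) rather than PMON.

```python
import time
from typing import Callable, Tuple

Address = Tuple[str, int]
DEFAULT_LISTENER: Address = ("localhost", 1521)  # well-known default port


def register_with_listener(
    contact: Callable[[Address], bool],
    address: Address = DEFAULT_LISTENER,  # stands in for LOCAL_LISTENER
    retry_interval_s: float = 60.0,       # hypothetical interval
    max_attempts: int = 5,                # cap for this sketch only
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the attempt number on which registration succeeded, or 0."""
    for attempt in range(1, max_attempts + 1):
        if contact(address):
            # Listener is up: service name and load metrics have been passed.
            return attempt
        if attempt < max_attempts:
            sleep(retry_interval_s)  # listener not up yet; try again later
    return 0
```

Injecting `sleep` keeps the function easy to test. Going by the text, the real process keeps retrying rather than giving up after a fixed number of attempts.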

  • SMON: The System Monitor

SMON is the process that gets to do all of the system-level jobs. Whereas PMON was interested in individual processes, SMON takes a system-level perspective of things and is a sort of garbage collector for the database. Some of the jobs it does include the following:

  • Cleans up temporary space: With the advent of true temporary tablespaces, the chore of cleaning up temporary space has lessened, but it has not gone away. For example, when building an index, the extents allocated for the index during the creation are marked as TEMPORARY. If the CREATE INDEX session is aborted for some reason, SMON is responsible for cleaning them up. Other operations create temporary extents that SMON would be responsible for as well.
  • Coalesces free space: If you are using dictionary-managed tablespaces, SMON is responsible for taking extents that are free in a tablespace and contiguous with respect to each other and coalescing them into one larger free extent. This occurs only on dictionary-managed tablespaces with a default storage clause that has pctincrease set to a nonzero value.
  • Recovers transactions active against unavailable files: This is similar to its role during database startup. Here, SMON recovers failed transactions that were skipped during instance/crash recovery because one or more files were not available to recover. For example, a file may have been on a disk that was unavailable or not mounted. When the file does become available, SMON will recover it.
  • Performs instance recovery of a failed node in RAC: In an Oracle RAC configuration, when a database instance in the cluster fails (e.g., the machine the instance was executing on fails), some other node in the cluster will open that failed instance’s redo log files and perform a recovery of all data for that failed instance.
  • Cleans up OBJ$: OBJ$ is a low-level data dictionary table that contains an entry for almost every object (table, index, trigger, view, and so on) in the database. Often it contains entries that represent deleted objects, or “not there” objects used in Oracle’s dependency mechanism. SMON is the process that removes these rows once they are no longer needed.
  • Shrinks undo segments: SMON will automatically shrink a rollback segment back to its optimal size, if an OPTIMAL size is set.
  • Offlines rollback segments: The DBA can offline, or make unavailable, a rollback segment that still has active transactions using it. In this case, the rollback segment is not really taken offline; it is marked as “pending offline.” In the background, SMON periodically tries to truly take it offline, until it succeeds.

SMON does many other things, such as flushing the monitoring statistics that show up in the DBA_TAB_MONITORING view and flushing the SCN-to-timestamp mapping information found in the SMON_SCN_TIME table. The SMON process can accumulate quite a lot of CPU time over time, and this should be considered normal. SMON periodically wakes up (or is woken up by the other background processes) to perform these housekeeping chores.
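The free-space coalescing job listed above is essentially an interval merge. Here is a minimal Python sketch. It assumes free extents are represented as hypothetical (first_block, block_count) pairs within a single data file. It ignores how a dictionary-managed tablespace actually records free space in the data dictionary.

```python
from typing import List, Tuple

# Hypothetical representation: (first_block, block_count) within one data file.
Extent = Tuple[int, int]


def coalesce(free_extents: List[Extent]) -> List[Extent]:
    """Merge physically adjacent free extents into larger free extents."""
    merged: List[Extent] = []
    for start, length in sorted(free_extents):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # This extent begins right where the previous one ends: combine them.
            prev_start, prev_length = merged[-1]
            merged[-1] = (prev_start, prev_length + length)
        else:
            merged.append((start, length))
    return merged


# Three adjacent 5-block extents become one 15-block extent; (30, 2) stays separate.
print(coalesce([(10, 5), (0, 5), (5, 5), (30, 2)]))
```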

  • CKPT: Checkpoint Process

Contrary to what its name implies, the checkpoint process doesn’t actually perform a checkpoint; that’s mostly the job of DBWn (Database Block Writer). It simply assists with the checkpointing process by updating the file headers of the data files. The job of updating the data files’ headers with checkpoint information used to belong to LGWR. However, as the number of files grew along with the size of databases over time, this additional task became too much of a burden for LGWR. If LGWR had to update dozens, hundreds, or even thousands of files, there would be a good chance that sessions waiting to commit their transactions would have to wait far too long. CKPT removes this responsibility from LGWR.

  • DBWn: Database Block Writer

The database block writer (DBWn) is the background process responsible for writing dirty blocks to disk. DBWn will write dirty blocks from the buffer cache, usually to make more room in the cache (to free buffers for reads of other data) or to advance a checkpoint (to move forward the position in an online redo log file from which Oracle would have to start reading, to recover the instance in the event of failure). Oracle needs to advance the checkpoint so that it no longer needs the online redo log file it just filled up. If it hasn’t been able to do that by the time we need to reuse that redo log file, we get the “checkpoint not complete” message and we must wait.
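The "checkpoint not complete" situation can be modelled as a ring of online redo log groups. The status strings below mimic the STATUS column of the V$LOG view:

- CURRENT: the group LGWR is writing now.
- ACTIVE: still needed for instance recovery.
- INACTIVE: safe to reuse.

The functions are hypothetical, and archiving (ARCHIVELOG mode) is deliberately ignored. The sketch only shows the ordering constraint between DBWn and LGWR.

```python
from typing import List


def try_log_switch(statuses: List[str], current: int) -> str:
    """Toy model of LGWR moving from group `current` to the next group."""
    nxt = (current + 1) % len(statuses)
    if statuses[nxt] == "ACTIVE":
        # DBWn has not yet written the dirty blocks this log protects,
        # so the log cannot be overwritten and sessions must wait.
        return "checkpoint not complete"
    statuses[current] = "ACTIVE"  # still needed until the checkpoint passes it
    statuses[nxt] = "CURRENT"
    return f"switched to group {nxt + 1}"


def checkpoint_completed(statuses: List[str], group_index: int) -> None:
    """DBWn has written every dirty block protected by this group."""
    if statuses[group_index] == "ACTIVE":
        statuses[group_index] = "INACTIVE"
```

With three groups starting as ["CURRENT", "INACTIVE", "INACTIVE"], two switches succeed. The third finds group 1 still ACTIVE and reports "checkpoint not complete" until `checkpoint_completed(statuses, 0)` is called.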

As you can see, the performance of DBWn can be crucial. If it does not write out blocks fast enough to free buffers (buffers that can be reused to cache some other blocks) for us, we will see both the number and duration of waits on Free Buffer Waits and Write Complete Waits start to grow. We can configure more than one DBWn; in fact, we can configure up to 36 (DBW0 . . . DBW9, DBWa . . . DBWz). Most systems run with one database block writer, but larger, multi-CPU systems can make use of more than one. This is generally done to distribute the workload of keeping a large block buffer cache in the SGA clean, flushing the dirtied (modified) blocks to disk.
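The 36 writer names follow a simple pattern: the ten digits, then the 26 lowercase letters. The sketch below just generates them. The function name is hypothetical, and in practice the count comes from the DB_WRITER_PROCESSES initialization parameter. The limit of 36 is the figure quoted above; later releases may allow more writers, so check the documentation for your version.

```python
import string
from typing import List


def dbwn_names(writer_count: int) -> List[str]:
    """Names of the first `writer_count` database writers: DBW0-DBW9, then DBWa-DBWz."""
    suffixes = string.digits + string.ascii_lowercase  # 10 digits + 26 letters = 36
    if not 1 <= writer_count <= len(suffixes):
        raise ValueError("this naming scheme covers 1 to 36 writers")
    return ["DBW" + suffix for suffix in suffixes[:writer_count]]
```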

Optimally, the DBWn uses asynchronous I/O to write blocks to disk. With asynchronous I/O, DBWn gathers up a batch of blocks to be written and gives them to the operating system. DBWn does not wait for the operating system to actually write the blocks out; rather, it goes back and collects the next batch to be written. As the operating system completes the writes, it asynchronously notifies DBWn that it completed the writes. This allows DBWn to work much faster than if it had to do everything serially.
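The batching pattern can be illustrated with Python's `concurrent.futures`, but only as an analogy:

- A thread pool stands in for the operating system's asynchronous I/O facility.
- `write_block` is a hypothetical placeholder for the physical write.
- `add_done_callback` plays the role of the completion notification.

The point to notice is that the submitting loop never waits on an individual write before gathering the next batch.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Iterable, List


def write_block(block_id: int) -> int:
    """Hypothetical placeholder for the OS physically writing one block."""
    return block_id


def dbwn_async_flush(batches: Iterable[List[int]], io_threads: int = 4) -> List[int]:
    """Hand each batch of dirty blocks off without waiting for the writes."""
    completed: List[int] = []
    lock = threading.Lock()

    def on_write_complete(fut: "Future[int]") -> None:
        # Completion notification, delivered asynchronously for each write.
        with lock:
            completed.append(fut.result())

    with ThreadPoolExecutor(max_workers=io_threads) as io:
        for batch in batches:
            for block_id in batch:
                io.submit(write_block, block_id).add_done_callback(on_write_complete)
            # No waiting here: go straight back and gather the next batch.
    # Exiting the `with` block waits for every outstanding write to finish.
    return sorted(completed)
```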

  • LGWR: Log Writer

The LGWR process is responsible for flushing to disk the contents of the redo log buffer located in the SGA. It does this when one of the following is true:

• Every three seconds
• Whenever a commit is issued by any transaction
• When the redo log buffer is one-third full or contains 1MB of buffered data
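The three triggers can be expressed as a single predicate. This is a simplified sketch with hypothetical names. The thresholds are taken from the list above, and the real log writer may also be posted by other events that this list does not cover.

```python
ONE_MB = 1024 * 1024


def lgwr_should_flush(seconds_since_last_flush: float,
                      commit_requested: bool,
                      buffered_bytes: int,
                      log_buffer_size: int) -> bool:
    """True if any of the three flush triggers listed above is met (simplified)."""
    return (
        seconds_since_last_flush >= 3.0              # every three seconds
        or commit_requested                          # any transaction commits
        or buffered_bytes * 3 >= log_buffer_size     # buffer one-third full
        or buffered_bytes >= ONE_MB                  # 1MB of buffered redo
    )
```

Take a 300MB log buffer: `lgwr_should_flush(0.5, False, ONE_MB, 300 * ONE_MB)` is already True, long before the buffer is one-third full.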

For these reasons, having an enormous (hundreds/thousands of megabytes) redo log buffer is not practical; Oracle will never be able to use it all since it pretty much continuously flushes it. The logs are written to with sequential writes as compared to the scattered I/O DBWn must perform. Doing large batch writes like this is much more efficient than doing many scattered writes to various parts of a file. This is one of the main reasons for having a LGWR and redo logs in the first place. The efficiency in just writing out the changed bytes using sequential I/O outweighs the additional I/O incurred. Oracle could just write database blocks directly to disk when you commit, but that would entail a lot of scattered I/O of full blocks, and this would be significantly slower than letting LGWR write the changes out sequentially.

  • ARCn: Archive Process

The job of the ARCn process is to copy an online redo log file to another location when LGWR fills it up. These archived redo log files can then be used to perform media recovery. Whereas online redo logs are used to fix the data files in the event of a power failure (when the instance is terminated), archived redo logs are used to fix data files in the event of a hard disk failure. Suppose we lose the disk drive containing the data file /d01/oradata/ora11g/system.dbf. We can go to our backups from last week, restore that old copy of the file, and ask the database to apply all of the archived and online redo logs generated since that backup took place. This will catch that file up with the rest of the data files in our database, and we can continue processing with no loss of data.
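The bookkeeping side of that recovery is easy to sketch. Which log sequences must be applied, and is there a gap that would make complete recovery impossible? In practice RMAN or the RECOVER command works this out for you. The function and its inputs below are hypothetical, and the sketch assumes every log older than the current one has already been archived.

```python
from typing import Iterable, List


def redo_needed(backup_start_seq: int, current_seq: int,
                archived: Iterable[int]) -> List[int]:
    """Log sequence numbers to apply, in order, to roll a restored file forward.

    backup_start_seq: first log sequence needed by the restored backup (hypothetical input)
    current_seq: sequence of the CURRENT online log, applied last
    archived: sequences still available as archived logs
    """
    available = set(archived)
    missing = [seq for seq in range(backup_start_seq, current_seq)
               if seq not in available]
    if missing:
        # A gap means the file cannot be brought fully up to date.
        raise LookupError(f"gap in archived redo: missing sequences {missing}")
    return list(range(backup_start_seq, current_seq + 1))
```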

There are many other background processes, such as RECO, DIAG, FBDA, DBRM, GEN0, RBAL, and ASMB. Please comment below if you need details of any other background process.