What are Joins in Oracle


Joins

A Join is a predicate that attempts to combine 2 row sources. We only ever join two row sources together. At a given time only one Join Step is performed even though underlying row sources may have been accessed in parallel. Once two row sources have been combined the resultant row source can start to be joined to other row sources. Note that some join methods (such as nested loops) allow a batch of fetched rows to be returned to the higher levels before fetching the next batch. The join order for a query is the order in which joins are performed and makes a significant difference to the way in which the query is executed. By accessing particular row sources first, certain predicates may be filled earlier than they would be if other join orders were taken and this may prevent certain access paths from being taken.

For example, suppose there is a concatenated index on table A (a.col1,a.col2) with a.col1 as the leading column.

Consider the following query:

select A.col4
from   A,B,C
where  B.col3 = 10
and    A.col1 = B.col1
and    A.col2 = C.col2
and    C.col3 = 5

We could represent the joins present in the query using the following schematic representation: B    —   A  —    C   Predicates: col3=10                col3=5 There are really only 2 ways we can drive the query: via B.col3 or C.col3. We would have to do a Full scan of A to be able to drive off A since there are no limiting predicates which is unlikely to be efficient with large tables. If we drive off table B, using predicate B.col3=10 (as a filter or lookup key), then we will retrieve the value for B.col1 and join to A.col1. Because we have now filled the leading column of the concatenated index on table A we can use this index to give us values for A.col2 and join to A. However, if we drive off table C, then we only get a value for a.col2 and since this is a trailing column of a concatenated index and the leading column has not been supplied at this point, we cannot use the index on a to lookup the data in A. This means that it is likely that the best join order will be B A C. The CBO will obviously use costs to establish whether the individual access paths are a good idea or not. If the CBO does not choose the join order we want, then one option is to hint the desired order by changing the from clause to read: from B,A,C and using the /*+ ordered */ hint. The resultant query would be:

select /*+ ordered */ A.col4
from   B,A,C
where  B.col3 = 10
and    A.col1 = B.col1
and    A.col2 = C.col2
and    C.col3 = 5

There are 3 main join types: Nested Loops (NL), Hash Join and Sort Merge Join (SMJ).

Nested Loops: Fetches the first batch of rows from row source 1, then probes row source 2 once for each row returned from row source 1

Row source 1
~~~~~~~~~~~~
Row 1 ————–       — Probe ->       Row source 2
Row 2 ————–       — Probe ->       Row source 2
Row 3 ————–       — Probe ->       Row source 2

Row source 1 is known as the outer table
Row source 2 is known as the inner table

Accessing row source 2 is known as probing the inner table. For nested loops to be efficient it is important that the first row source returns as few rows as possible as this directly controls the number of probes of the second row source. Also it helps if the access method for row source 2 is efficient as this operation is being repeated once for every row returned by row source 1.


SQL> select /*+ ordered USE_NL(d) */ ename,dname

from emp e, dept d

where e.deptno=d.deptno

and d.dname in ('ACCOUNTING','RESEARCH','SALES','OPERATIONS');

Execution Plan

----------------------------------------------------------

Plan hash value: 196120631

----------------------------------------------------------------------------------------

| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT             |         |    14 |   308 |    17   (0)| 00:00:01 |

|   1 |  NESTED LOOPS                |         |       |       |            |          |

|   2 |   NESTED LOOPS               |         |    14 |   308 |    17   (0)| 00:00:01 |

|   3 |    TABLE ACCESS FULL         | EMP     |    14 |   126 |     3   (0)| 00:00:01 |

|*  4 |    INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |

|*  5 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    13 |     1   (0)| 00:00:01 |

----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

4 - access("E"."DEPTNO"="D"."DEPTNO")

5 - filter("D"."DNAME"='ACCOUNTING' OR "D"."DNAME"='OPERATIONS' OR

"D"."DNAME"='RESEARCH' OR "D"."DNAME"='SALES')

Hash Join Smallest row source is chosen and used to build a hash table and a bitmap. The second row source is hashed and checked against the hash table looking for joins. The bitmap is used as a quick lookup to check if rows are in the hash table and are especially useful when the hash table is too large to fit in memory.


SQL> select /*+ ordered USE_HASH(d) */ ename,dname

from emp e, dept d

where e.deptno=d.deptno

and d.dname in ('ACCOUNTING','RESEARCH','SALES','OPERATIONS');

Execution Plan

----------------------------------------------------------

Plan hash value: 2285423260

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      |    14 |   308 |     6   (0)| 00:00:01 |

|*  1 |  HASH JOIN         |      |    14 |   308 |     6   (0)| 00:00:01 |

|   2 |   TABLE ACCESS FULL| EMP  |    14 |   126 |     3   (0)| 00:00:01 |

|*  3 |   TABLE ACCESS FULL| DEPT |     4 |    52 |     3   (0)| 00:00:01 |

---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("E"."DEPTNO"="D"."DEPTNO")

3 - filter("D"."DNAME"='ACCOUNTING' OR "D"."DNAME"='OPERATIONS' OR

"D"."DNAME"='RESEARCH' OR "D"."DNAME"='SALES')

Sort Merge Join: Rows are produced by Row Source 1 and are then sorted. Rows from Row Source 2 are then produced and sorted by the same sort key as Row Source 1. Row Source 1 and 2 are NOT accessed concurrently Sorted rows from both sides are then merged together (joined).

If the row sources are already (known to be) sorted then the sort operation is unnecessary as long as both ‘sides’ are sorted using the same key. Presorted row sources include indexed columns and row sources that have already been sorted in earlier steps. Although the merge of the 2 row sources is handled serially, the row sources could be accessed in parallel.


SQL> SELECT d.dname,e.ename

FROM   dept d,emp e

WHERE  e.deptno = d.deptno

and d.dname in ('ACCOUNTING','RESEARCH','SALES','OPERATIONS');

Execution Plan

----------------------------------------------------------

Plan hash value: 2865896559

----------------------------------------------------------------------------------------

| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT             |         |    14 |   308 |     5   (0)| 00:00:01 |

|   1 |  MERGE JOIN                  |         |    14 |   308 |     5   (0)| 00:00:01 |

|*  2 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     4 |    52 |     2   (0)| 00:00:01 |

|   3 |    INDEX FULL SCAN           | PK_DEPT |     4 |       |     1   (0)| 00:00:01 |

|*  4 |   SORT JOIN                  |         |    14 |   126 |     3   (0)| 00:00:01 |

|   5 |    TABLE ACCESS FULL         | EMP     |    14 |   126 |     3   (0)| 00:00:01 |

----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - filter("D"."DNAME"='ACCOUNTING' OR "D"."DNAME"='OPERATIONS' OR

"D"."DNAME"='RESEARCH' OR "D"."DNAME"='SALES')

4 - access("E"."DEPTNO"="D"."DEPTNO")

filter("E"."DEPTNO"="D"."DEPTNO")

Sorting is an expensive operation, especially with large tables. Because of this, SMJ is often not a particularly efficient join method.

Cartesian Product: A Cartesian Product is done where they are no join conditions between 2 row sources and there is no alternative method of accessing the data. It is not really a join as such as there is no join! Often a common coding mistake is to leave a join out of a query which will result in a cartesian product, but it can be useful in some circumstances, especially where the table on the left hand side has a very small number of rows (typically 1) meaning that the right hand side only needs to be accessed once. For example, notice that there is no join between the 2 tables in the following query:


SQL> select emp.deptno,dept.deptno

from emp,dept;

Execution Plan

----------------------------------------------------------

Plan hash value: 3359512422

--------------------------------------------------------------------------------

| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------

|   0 | SELECT STATEMENT     |         |    56 |   336 |     8   (0)| 00:00:01 |

|   1 |  MERGE JOIN CARTESIAN|         |    56 |   336 |     8   (0)| 00:00:01 |

|   2 |   INDEX FULL SCAN    | PK_DEPT |     4 |    12 |     1   (0)| 00:00:01 |

|   3 |   BUFFER SORT        |         |    14 |    42 |     7   (0)| 00:00:01 |

|   4 |    TABLE ACCESS FULL | EMP     |    14 |    42 |     2   (0)| 00:00:01 |

-------------------------------------------------------------------------------

The CARTESIAN keyword indicate that we are doing a cartesian product.