Friday, December 7, 2012

Which step completes first? How to read an explain plan.

When reading an explain plan (or execution plan) in Oracle, one of the first things you should figure out is which step completes first. As a general rule it's best if this step is something like a primary key index scan. The goal with a SQL plan is to "Start Small and Stay Small". With a PK-type lookup you will tend to have a plan that starts with the fewest rows. So let's take a look at a plan and figure out which step is first to complete. Here is a query, and its explain plan:


select * from emp, dept
where emp.deptno = dept.deptno and sal > 1000
order by empno

SQL> @hxplan
Enter .sql file name (without extension): emp_dept2
Enter the display level (TYPICAL, ALL, BASIC, SERIAL) [TYPICAL]  :
Plan hash value: 77426881

-------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |    12 |   720 |     6  (34)| 00:00:01 |
|   1 |  SORT ORDER BY                 |                |    12 |   720 |     6  (34)| 00:00:01 |
|   2 |   MERGE JOIN                   |                |    12 |   720 |     5  (20)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID | DEPT           |     4 |    80 |     2   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN            | DEPT_DEPTNO_PK |     4 |       |     1   (0)| 00:00:01 |
|*  5 |    SORT JOIN                   |                |    12 |   480 |     3  (34)| 00:00:01 |
|*  6 |     TABLE ACCESS BY INDEX ROWID| EMP            |    12 |   480 |     2   (0)| 00:00:01 |
|*  7 |      INDEX FULL SCAN           | EMP_DEPT_IDX   |    13 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
       filter("EMP"."DEPTNO"="DEPT"."DEPTNO")
   6 - filter("SAL">1000)
   7 - filter("EMP"."DEPTNO" IS NOT NULL)
SQL>



Some folks have been taught that a plan completes "bottom up", so they would think that the first step to complete is step 7. But it isn't. The order of execution is top-down, but the order of completion is a sort of "inside out, bottom up".

From an execution point of view, the first thing this query wants to do is the SORT ORDER BY (step 1), but it can't do that yet. It needs rows to sort. So then it wants to do the MERGE JOIN (2), but to do that it needs two sets of rows to merge together. Then it wants to do the TABLE ACCESS BY INDEX ROWID (3), but to do that it needs ROWIDs from an index. Next is the INDEX FULL SCAN (4). This means that step 4 is the first step to complete. This index scan returns ROWIDs to the table access step so the rows can be retrieved.

Now this is important to understand. The index scan and the table access steps work in tandem: the index scan gets one ROWID and passes it to the table access, which reads the row; then the index scan gets the next ROWID, and so on. This means the INDEX FULL SCAN (4) is the first to complete, but the TABLE ACCESS BY INDEX ROWID (3) finishes very soon after it.

After step 3 completes, the SORT JOIN (5) step would want to go; this is where the second set of rows would be sorted. In this case the second set of rows doesn't need to be sorted. Why? Because the INDEX FULL SCAN (7) returns the ROWIDs in DEPTNO order to the TABLE ACCESS BY INDEX ROWID (6). Since the join is on DEPTNO, the sort would have been done on DEPTNO, but the index is on DEPTNO, so the rows are already in the order needed for the MERGE JOIN (2).

BTW - the SORT JOIN step is always there for the second row source in a sort merge join, even when the data doesn't need sorting, as here. This is because its temp segment is where the second row source is merged with the first. If the first row source needed to be sorted, you would see a SORT JOIN step as the last step for that set of rows as well.

Since in this plan both sets of rows come in via an index full scan, no data has to be sorted for the join. The query was able to leverage the sorted data in both indexes.

Once steps 7, 6 and 5 complete, there are two sets of sorted data, so the MERGE JOIN (2) can complete and finally the SORT ORDER BY (1) can finish. This sort is by EMPNO (the ORDER BY in the query). Now the sorted data is passed back to the user (step 0).

So the order of completion is: 4-3-7-6-5-2-1-0.
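If you want to check the completion order yourself, one way (a sketch, assuming you can re-run the query) is to execute it with the gather_plan_statistics hint and then pull the actual execution statistics with DBMS_XPLAN. The Starts and A-Rows columns make the interplay between the steps much easier to see:

select /*+ gather_plan_statistics */ * from emp, dept
where emp.deptno = dept.deptno and sal > 1000 order by empno;

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));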

This plan doesn't do a PK lookup as its first step, but in this case the INDEX FULL SCAN is quite good. The query is joining all the rows from each table, so this type of index scan is appropriate for this join.

As an additional question, where did this predicate, applied at step 7, come from?

filter("EMP"."DEPTNO" IS NOT NULL)

Wednesday, November 28, 2012

Why a double Nested Loop?

An interesting optimization has been done with Nested Loops in Oracle recently. In 11g you may see plans like this:

SQL> set autotrace traceonly explain
SQL> select /*+ use_nl (d e) */ e.empno, e.sal, d.deptno, d.dname
  2  from emp e, dept d
  3  where e.deptno = d.deptno ;

Execution Plan
----------------------------------------------------------
Plan hash value: 1688427947

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |    13 |   312 |     6   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |              |       |       |            |          |
|   2 |   NESTED LOOPS               |              |    13 |   312 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | DEPT         |     4 |    52 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | EMP_DEPT_IDX |     4 |       |     0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| EMP          |     3 |    33 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("E"."DEPTNO"="D"."DEPTNO")
       filter("E"."DEPTNO" IS NOT NULL)


Why the double Nested Loop?  We certainly expect the number of joins to be one less than the total number of tables, so with only 2 tables we'd expect to see just 1 join, not 2.

What's happening here is that Oracle is "carrying" less in the inner loop.  It's joining the DEPT table just to the index on EMP.  The result of this inner join is a row from the DEPT table and a ROWID from the EMP table; much less stuff to work with in the inner loop.  Joining the row of DEPT with just the index on EMP is certainly less work than working with the entire row from both.

The outer loop then really isn't a nested loop at all, but rather a "lookup" into the EMP table to find the row using the ROWID retrieved from the inner loop. Now it has all the data: the row from DEPT and the row from EMP joined together.
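Here is a rough PL/SQL sketch of the idea. This is only an illustration of the logic, not what the row sources actually execute internally:

declare
  v_empno emp.empno%type;
  v_sal   emp.sal%type;
begin
  for d in (select deptno, dname from dept) loop
    -- inner nested loop: probe just the index on EMP,
    -- carrying only the DEPT row and the EMP ROWID
    for r in (select e.rowid rid from emp e
              where e.deptno = d.deptno) loop
      -- outer "loop": look up the full EMP row by the ROWID we carried
      select empno, sal into v_empno, v_sal
      from emp where rowid = r.rid;
      dbms_output.put_line(d.deptno||' '||d.dname||' '||v_empno||' '||v_sal);
    end loop;
  end loop;
end;
/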

Kinda cool eh?

Monday, November 19, 2012

Which trace file is mine?

A common issue when tracing is finding your trace file after the tracing is done.  Back in the bad old days I would do something like "select 'Ric Van Dyke' from dual;" in the session I was tracing.  Once I was finished I'd then do a search for a trace file with "Ric Van Dyke" in the file to find mine.  It worked, but today we have a much more elegant way to find a trace file:


SQL> alter session set tracefile_identifier='MY_TRACE';

SQL> exec dbms_monitor.session_trace_enable(null,null,true,true,'ALL_EXECUTIONS');

SQL> select 'hello' from dual;
'HELL
-----
hello

SQL> exit

Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

C:\app\hotsos\diag\rdbms\hotsos\hotsos\trace>dir *MY_TRACE*
11/15/2012  04:30 PM            10,220 hotsos_ora_5540_MY_TRACE.trc
11/15/2012  04:30 PM                94 hotsos_ora_5540_MY_TRACE.trm
               2 File(s)         10,314 bytes

You can make this more elaborate. For example, this is an anonymous PL/SQL block I have used in the past to turn on tracing, which dynamically sets TRACEFILE_IDENTIFIER to the current time and user:

declare
  t_trc_file varchar2(256):= 
    'alter session set tracefile_identifier='||chr(39)||
    to_char(sysdate, 'hh24miss')||'_'||user||chr(39);
  t_trc_stat varchar2(256):=
     'alter session set timed_statistics=true';
  t_trc_size varchar2(256):=
    'alter session set max_dump_file_size=unlimited';
  t_trc_sql  varchar2(256):=
    'alter session set events '||chr(39)||
    '10046 trace name context forever, level 12'||chr(39);
begin
    execute immediate t_trc_file;  -- Set tracefile identifier
    execute immediate t_trc_stat;  -- Turn on timed statistics
    execute immediate t_trc_size;  -- Set max Dump file size
    execute immediate t_trc_sql;   -- Turn on Trace
end;
/
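In 11g you can also simply ask the database where your session's trace file will be written, assuming you can read V$DIAG_INFO:

select value from v$diag_info where name = 'Default Trace File';

And when you're done, remember to turn the trace off:

exec dbms_monitor.session_trace_disable(null, null);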
 

Friday, November 9, 2012

Is this index being used?



When teaching the Optimizing Oracle SQL, Intensive class I’m often asked how to tell whether an index is being used or not.  Oracle does have a feature for this called (oddly enough) index monitoring.  However, it’s less than ideal.  First off, it’s just a YES or NO switch: either the index has been used since you turned monitoring on or it hasn’t.  Also, it’s not exactly 100% accurate.  Sometimes an index will be used and not counted, and other times the way it’s used isn’t really what you want; for example, collecting stats will typically be counted as “using” the index.  However, this is not the type of use most folks are interested in.  We’d like to know when it’s used as part of a query.
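For reference, classic index monitoring looks something like this (SALES_IDX is a made-up index name here). The USED column in V$OBJECT_USAGE simply flips to YES once the index is touched:

alter index sales_idx monitoring usage;
-- ... let the application run for a while ...
select index_name, monitoring, used, start_monitoring
from v$object_usage
where index_name = 'SALES_IDX';
alter index sales_idx nomonitoring usage;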

Instead, with a rather simple query we can easily see which indexes are in use, how many times, and even how each index is being used.  Something like this does the trick; of course you’d want to change the owner name.

The "#SQL" column is the number of different SQL statements currently in the library cache that have used the index and the "#EXE" column shows the total number of executions.  

select count(*) "#SQL", sum(executions) "#EXE",
object_name, options
from v$sql_plan_statistics_all
where operation = 'INDEX' and
object_owner = 'OP'
group by object_name, options
order by object_name, options
/

An example run:

SQL> @index_usage

  #SQL   #EXE OBJECT_NAME              OPTIONS
------ ------ ------------------------ ------------------------
     4      4 BIG_OBJTYPE_IDX          RANGE SCAN
     2      4 EMP_EMPNO_PK             FULL SCAN
     1      3 EMP_UPPER_IDX            RANGE SCAN
     1      2 LIC_OWN_CTY_LICNO_IDX    SKIP SCAN
     1      3 SALESORDER_PK            FAST FULL SCAN
     1      2 SALESORDER_PK            FULL SCAN (MIN/MAX)
     1      1 SALESORDER_PK            RANGE SCAN DESCENDING
     1      3 SALESORDER_PK            UNIQUE SCAN
     2      2 USERNAME_PK              FULL SCAN
     2      2 USERNAME_PK              UNIQUE SCAN
SQL>




From this simple output we can see that the BIG_OBJTYPE_IDX is used in four different statements that each ran just once, and that the SALESORDER_PK is used in a variety of ways, each from a single statement, but some run more times than others.

A couple of other things to keep in mind: this will only show what’s happened “recently”.  Being a V$ view, the data isn’t kept here for any particular length of time.  In a production environment you might want to run this a few times in the day to capture what’s going on.  Also, this view is fairly expensive to select from, so doing this once in a while is fine; just don’t set it up to poll the view every second all day.  It would likely show up as a “top SQL” statement in every monitoring tool.

Friday, July 20, 2012

Does the block size really matter?


A classic debate over the years in Oracle Land is that of block size.  There are two concepts in this debate: a smaller block size for transactional systems and a larger one for data warehouse systems.


The idea that smaller is good for a transactional system is that with fewer rows per block there is a better ratio of interested transaction list (ITL) entries in the block header to rows.  Since an ITL is needed to lock one or more rows in the block, and a transactional system would generally be doing many single-row (or very-few-row) DML actions concurrently, this ratio of few rows to ITLs is good.  It means that with just a few ITLs in any given block there should be enough to lock any number of rows in that block.  If there were a lot of rows per block and many folks were trying to lock different rows in the same block at the same time, then it’s possible the block might run out of ITLs, and hence folks would have to wait more.


For a data warehouse, things are generally the opposite.  Folks tend to do lots of full table scans and there is little (if any) concurrent DML, so having many rows per ITL isn’t as big a deal.  Some systems have only one process that does some sort of scheduled DML operations.  Since there is just the one process running, having just one ITL per block is fine.


This argument is valid and is worth considering in your design, but using it as the sole reason for picking a block size is rather limited.  It will be difficult to see whether this ratio of rows to ITLs is causing real issues.  The real issue that is foremost on most folks’ minds is performance, as in: does the block size make a difference in how fast a query will run?
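If you do want hard evidence of ITL trouble before worrying about block size, one place to look is segment statistics. A sketch (filter on owner as appropriate for your system):

select owner, object_name, value "ITL_WAITS"
from v$segment_statistics
where statistic_name = 'ITL waits'
and value > 0
order by value desc;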


The basic argument here is twofold.  First, for index scans, the argument is that the larger the block size, the shorter the index (its height, BLEVEL) and hence the faster the index.  Second, for full table scans, the larger the block the faster the scan, since there are more rows per block and hence fewer blocks to read.

Let’s take a look at some numbers to see how these arguments stack up.  Granted, this is a simple example, but if we can see a performance gain (or not) in a simple example, that’s a pretty good indicator of what will happen in a complex one.  First, the index question.


Our test data is a table in an 8K block size tablespace, 9,254,528 rows, 131,840 blocks and about 1G in size.  We will do an index range scan on this table in three tests.  The block size of the tablespace for the indexes will be 4K, 8K and 16K.  The index stats for each are:
4K – Levels 4, Leaf blocks 197,987
8K – Levels 3, Leaf blocks 95,873 (reduction of 52%)
16K – Levels 2, Leaf blocks 47,177 (reduction of 51%, and 76% from the 4K)
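
For reference, a mixed-block-size setup like the one used in these tests is built roughly like this (the file name, sizes and index name are made up).  A non-default block size needs its own buffer cache before the tablespace can be created:

alter system set db_16k_cache_size = 64m;

create tablespace ts_16k
  datafile '/u01/oradata/ts_16k_01.dbf' size 2g
  blocksize 16k;

alter index big_tab_idx rebuild tablespace ts_16k;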


The run time statistics of each of the index scans:
                     4K Block    8K Block   16K Block
Consistent Gets        14,017      13,904      13,846
Buffer Pinned Ct       12,065      12,065      12,065
Time*                  22,172      22,297      22,165
(* The time is a representative time from multiple runs, in microseconds)


Notice that the performance was hardly affected by the block size change.  Almost no matter which way you look at it, performance wasn’t significantly changed by the block size.  The Buffer Pinned Count stayed the same, which makes perfect sense: this is the number of rows retrieved from the table, and the same ROWIDs would be retrieved from the index regardless of the block size.


So, what can one say about a full table scan?  Here I recreated the table in tablespaces with 4K, 8K and 16K block sizes, with no indexes.


The table had 9,254,784 rows in all three incarnations.  The number of blocks in each table was:
4k - 269,824
8K - 131,840  (51% reduction from 4K)
16K - 65,216  (50% reduction from 8K, 75% reduction from 4K)


The size of the table in megabytes didn’t change a lot: 1,054M for 4K, 1,030M for 8K and 1,019M for 16K.  That’s about a 2% drop each time, with just over a 3% drop from 4K to 16K.


The run time statistics of a full table scan for each block size:
                     4K Block    8K Block   16K Block
Consistent Gets       534,811     261,354     121,494
Time*              11,590,478  15,411,618  14,280,009
(* The time is a representative time from multiple runs, in microseconds)


Not surprisingly, the LIOs (consistent gets) dropped off right in line with the smaller number of blocks in the table.  This is rather nice, and likely good for contention in particular.  However, it is interesting to note that the time went up as the block size got larger.  The reasonable explanation seems to be that although fewer blocks are being read, each block is larger and hence overall takes longer to read.


So what’s the bottom line?  It seems rather doubtful that a larger block size will have much (if any) impact on performance.  It does seem that it could reduce the overall size of objects (tables and indexes), which in itself is a good thing.  However, don’t expect the performance of your application to change significantly just because you have a larger block size.