Wednesday, November 28, 2012

Why a double Nested Loop?

An interesting optimization has been done with Nested Loops in Oracle recently. In 11g you may see plans like this:

SQL> set autotrace traceonly explain
SQL> select /*+ use_nl (d e) */ e.empno, e.sal, d.deptno, d.dname

  2  from emp e, dept d
  3  where e.deptno = d.deptno ;

Execution Plan
Plan hash value: 1688427947

| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT             |              |    13 |   312 |     6   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |              |       |       |            |          |
|   2 |   NESTED LOOPS               |              |    13 |   312 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | DEPT         |     4 |    52 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | EMP_DEPT_IDX |     4 |       |     0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| EMP          |     3 |    33 |     1   (0)| 00:00:01 |

Predicate Information (identified by operation id):

   4 - access("E"."DEPTNO"="D"."DEPTNO")
       filter("E"."DEPTNO" IS NOT NULL)

Why the double Nested Loop?  We certainly expect the number of joins to be one less than the total number of tables, so with only 2 tables we'd expect to see just 1 join, not 2.

What's happening here is that Oracle is "carrying" less in the inner loop.  It's joining the DEPT table just to the index on EMP.  The result of this inner join is a row from the DEPT table and the ROWID from the EMP table, much less stuff to work with in the inner loop.  Joining the row of DEPT with just the index on EMP is certainly less work than joining the entire row from both.

The outer loop then really isn't a nested loop at all, but rather a "lookup" into the EMP table to find the row using the ROWID that it retrieved from the inner loop. Now it has all the data: the row from DEPT and the row from EMP joined together.
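This shape comes from 11g's nested loop join batching. If you'd like to compare it against the classic single-loop plan, the NO_NLJ_BATCHING hint should turn the optimization off (a sketch against the same tables; exact hint behavior can vary by version and patch level):

```sql
-- Disable nested loop join batching, so the plan should collapse back to
-- a single NESTED LOOPS with TABLE ACCESS BY INDEX ROWID as its inner step.
select /*+ use_nl (d e) no_nlj_batching(e) */
       e.empno, e.sal, d.deptno, d.dname
from   emp e, dept d
where  e.deptno = d.deptno;
```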

Kinda cool eh?

Monday, November 19, 2012

Which trace file is mine?

A common issue when tracing is finding your trace file after the tracing is done.  Back in the bad old days I would do something like "select 'Ric Van Dyke' from dual;" in the session I was tracing.  Once I was finished I'd then do a search for a trace file with "Ric Van Dyke" in the file to find mine.  It worked, but today we have a much more elegant way to find a trace file:

SQL> alter session set tracefile_identifier='MY_TRACE';

SQL> exec dbms_monitor.session_trace_enable(null,null,true,true,'ALL_EXECUTIONS');

SQL> select 'hello' from dual;

SQL> exit

Disconnected from Oracle Database 11g Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

C:\app\hotsos\diag\rdbms\hotsos\hotsos\trace>dir *MY_TRACE*
11/15/2012  04:30 PM            10,220 hotsos_ora_5540_MY_TRACE.trc
11/15/2012  04:30 PM                94 hotsos_ora_5540_MY_TRACE.trm
               2 File(s)         10,314 bytes
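Even better, from 11g on you can simply ask the database where your session's trace file is, before you ever leave the session:

```sql
-- V$DIAG_INFO (11g and later) reports the current session's trace file
-- with its full path, no directory searching required.
select value
from   v$diag_info
where  name = 'Default Trace File';
```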

You can make this more elaborate.  For example, here is an anonymous PL/SQL block I have used in the past to turn on tracing, which dynamically sets TRACEFILE_IDENTIFIER to the current time and user:

declare
  t_trc_file varchar2(256):=
    'alter session set tracefile_identifier='||chr(39)||
    to_char(sysdate, 'hh24miss')||'_'||user||chr(39);
  t_trc_stat varchar2(256):=
    'alter session set timed_statistics=true';
  t_trc_size varchar2(256):=
    'alter session set max_dump_file_size=unlimited';
  t_trc_sql  varchar2(256):=
    'alter session set events '||chr(39)||
    '10046 trace name context forever, level 12'||chr(39);
begin
  execute immediate t_trc_file;  -- Set tracefile identifier
  execute immediate t_trc_stat;  -- Turn on timed statistics
  execute immediate t_trc_size;  -- Set max dump file size
  execute immediate t_trc_sql;   -- Turn on trace
end;
/
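When you're done, remember to turn tracing back off; either of these works, depending on how you turned it on:

```sql
-- Turn off 10046 tracing for the current session.
alter session set events '10046 trace name context off';

-- Or, if you enabled it with DBMS_MONITOR:
exec dbms_monitor.session_trace_disable;
```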

Friday, November 9, 2012

Is this index being used?

When teaching the Optimizing Oracle SQL, Intensive class I’m often asked how to tell if an index is being used or not.  Oracle does have a feature for this called (oddly enough) index monitoring.  However, it’s less than ideal.  First off, it’s just a YES or NO switch: either the index has been used since you turned it on or it hasn’t.  Also, it’s not exactly 100% accurate.  Sometimes an index will be used and not counted, and other times the way it’s used isn’t really what you want; for example, collecting stats will typically be counted as “using” the index.  However, this is not the type of use most folks are interested in.  We’d like to know when it’s used as part of a query.

We can see this.  With a rather simple query you can easily see which indexes are in use, how many times, and even how each index is being used.  Something like this does the trick; of course you’d want to change the owner name.

The "#SQL" column is the number of different SQL statements currently in the library cache that have used the index and the "#EXE" column shows the total number of executions.  

select count(*) "#SQL", sum(executions) "#EXE",
       object_name, options
from   v$sql_plan_statistics_all
where  operation = 'INDEX'
and    object_owner = 'OP'
group by object_name, options
order by object_name, options;

An example run:

SQL> @index_usage

  #SQL   #EXE OBJECT_NAME              OPTIONS
------ ------ ------------------------ ------------------------
     4      4 BIG_OBJTYPE_IDX          RANGE SCAN
     2      4 EMP_EMPNO_PK             FULL SCAN
     1      3 EMP_UPPER_IDX            RANGE SCAN
     1      3 SALESORDER_PK            FAST FULL SCAN
     1      2 SALESORDER_PK            FULL SCAN (MIN/MAX)
     1      3 SALESORDER_PK            UNIQUE SCAN
     2      2 USERNAME_PK              FULL SCAN
     2      2 USERNAME_PK              UNIQUE SCAN

From this simple output we can see that the BIG_OBJTYPE_IDX is used in four different statements that each ran just once.  And the SALESORDER_PK is used in a variety of ways, each from the same statement but some run more times than others.

A couple of other things to keep in mind: this will only show what’s happened “recently”.  Being a V$ view, the data isn’t kept here for any particular length of time.  In a production environment you might want to run this a few times a day to capture what’s going on.  Also, this view is fairly expensive to select from, so doing this once in a while is fine; just don’t set it up to poll the view every second all day.  It will likely show up as a “top SQL” statement in every monitoring tool.
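A handy variant is to pull SQL_ID into the query to see exactly which cached statements are using a particular index (EMP_DEPT_IDX and the owner OP below are just placeholders; substitute your own):

```sql
-- Which statements in the library cache used this index, and how often.
select sql_id, options, sum(executions) "#EXE"
from   v$sql_plan_statistics_all
where  operation = 'INDEX'
and    object_owner = 'OP'
and    object_name = 'EMP_DEPT_IDX'
group by sql_id, options
order by sql_id, options;
```

With the SQL_ID in hand you can pull the full statement and plan from V$SQL to see the index use in context.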