Once you get
a plan for a SQL statement, then what? How do you know if one might be better than
the other? Is this or that step better to use?
It's the
classic Oracle answer "It Depends!" No step is always bad nor
is any step always good. Sure some steps are more likely to be
"bad" like a Cartesian join is normally a bad thing but on occasion
is good. Also Skip Scans are not always
the best choice. But that’s about it. Nearly every other type of step you see can be
good or bad.
The classic
suggestion is to look for Full Table Scans and try to eliminate them. However Full Table Scans can be the right
thing to do more often than not. Indexes
are great and the right index can sure make the difference in see your results
now or days from now. But if you’re
getting data from most of the table blocks, a full scan can be the right way to
go.
So at a
slightly higher level what should you do to figure out which plan is
better? There are two basic approaches. In both these approaches you have to run the
SQL and capture stats about the run. The
10046 Trace is of course one of the best ways to do this. Also you can use DBMS_XPLAN.DISPLAY_CURSOR to see most of the stats.
One very
important point is that you must do one following, set the parameter STATISTICS_LEVEL = ALL or use the hint GATHER_PLAN_STATISTICS.
If you don't the timing statistic in particular will be incorrect. Also know that many of the run time stats
included direct children. For example
looking at the LIOs a parent includes its children's LIOs, so to know what the
parent did you much subtract the direct children.
When
comparing two plans that return the same results, you start by comparing LIOs. Generally,
the one with the fewest is the best. A
plan that does less LIOs will generally be better than one that does more. This is because of the latches that are taken
out. Each latch is a possible point of
contention. Fewer LIOs means few latches
which in turn means fewer possible points of contention. Hence even if a plan appears to run fast with
more LIOs, it may not scale well in production when hundreds or thousands of
users are on the system.
But there
are no absolutes! So you must take all
things into consideration. If two plans
do nearly the same LIOs and the one that does the small amount more LIOs runs
significantly faster, then it very well could be the best plan. Yes I’m
being vague on what a small amount more is and significantly faster is as
well. If I put in any numbers here, we’d
all find examples where those numbers don’t work. You’ll have to gauge this in context of your
system.
When
analyzing a single plan, you find the step(s) that take up the most time. And your first attempt will be to try to
eliminate it. If you can’t eliminate that
step than try make that part of the plan go faster. Now
you focus on just one part of the plan, the plan may have 100 or more steps,
yet you only look at 4-6 of them and are able to fix the query. It’s not necessary to know how the whole plan
works most times, just the part that is slow.
Take a look
at these two simple query plans. The
query is the same for each, just one hinted to do a nested loop and the other a
hash join. The table BIG_TAB has 2,898,240 rows and SMALL_TAB has 500 rows. They have the same results, each of these
plans were retrieved with, and was formatted a bit to show the relevant columns:
select *
from table(dbms_xplan.display_cursor (format=>'iostats last'));
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time
| Buffers |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 16000 |00:00:00.07 | 18463 |
| 1 |
NESTED LOOPS | |
1 | |
16000 |00:00:00.07 | 18463 |
| 2 |
NESTED LOOPS | | 1 |
15769 | 16000 |00:00:00.02 | 2463 |
| 3 |
INDEX FAST FULL SCAN |
SMALL_IDX | 1 | 500 |
500 |00:00:00.01 | 504 |
|* 4 |
INDEX RANGE SCAN |
BIG_IDX | 500 |
32 | 16000 |00:00:00.01 | 1959 |
| 5 |
TABLE ACCESS BY INDEX ROWID| BIG_TAB
| 16000 | 32 |
16000 |00:00:00.04 | 16000 |
----------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time
| Buffers |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 16000 |00:00:00.91 | 49818 |
|* 1 |
HASH JOIN | | 1 |
15769 | 16000 |00:00:00.91 | 49818
|
| 2 |
INDEX FAST FULL SCAN| SMALL_IDX |
1 | 500 | 500 |00:00:00.01 | 6 |
| 3 |
TABLE ACCESS FULL | BIG_TAB |
1 | 2898K| 2898K|00:00:00.61 | 49812 |
---------------------------------------------------------------------------------------------
Can you see
that the first one is actually better, not just faster? The first thing you should compare is the
LIOs (the Buffers column). 18,463 to 49,818
then notice that the rows worked with thru the plan are significantly lower in
the first as well (the A-Rows column).
The first plan never works with more than 16,000 but the second works
with 2,898,000 from BIG_TAB.
And lastly notice that the time is .07 for the first and .91 for the
second.
I encourage you
to look at the stats in this order, LIOs, rows then time. I know most of you will likely start with
time, which of course is very important.
But in testing, LIOs and row counts are likely a better indicator of scalability
of a SQL statement.