Thursday, April 28, 2016

Analyzing Execution Plans

Once you get a plan for a SQL statement, then what? How do you know if one might be better than the other?  Is this or that step better to use? 

It's the classic Oracle answer "It Depends!"  No step is always bad nor is any step always good.  Sure some steps are more likely to be "bad" like a Cartesian join is normally a bad thing but on occasion is good.  Also Skip Scans are not always the best choice.  But that’s about it.  Nearly every other type of step you see can be good or bad. 

The classic suggestion is to look for Full Table Scans and try to eliminate them.  However Full Table Scans can be the right thing to do more often than not.  Indexes are great and the right index can sure make the difference in see your results now or days from now.  But if you’re getting data from most of the table blocks, a full scan can be the right way to go. 

So at a slightly higher level what should you do to figure out which plan is better?  There are two basic approaches.  In both these approaches you have to run the SQL and capture stats about the run.  The 10046 Trace is of course one of the best ways to do this.  Also you can use DBMS_XPLAN.DISPLAY_CURSOR to see most of the stats. 

One very important point is that you must do one following, set the parameter STATISTICS_LEVEL = ALL or use the hint GATHER_PLAN_STATISTICS.  If you don't the timing statistic in particular will be incorrect.  Also know that many of the run time stats included direct children.  For example looking at the LIOs a parent includes its children's LIOs, so to know what the parent did you much subtract the direct children.

When comparing two plans that return the same results, you start by comparing LIOs. Generally, the one with the fewest is the best.  A plan that does less LIOs will generally be better than one that does more.  This is because of the latches that are taken out.  Each latch is a possible point of contention.  Fewer LIOs means few latches which in turn means fewer possible points of contention.  Hence even if a plan appears to run fast with more LIOs, it may not scale well in production when hundreds or thousands of users are on the system. 

But there are no absolutes!  So you must take all things into consideration.  If two plans do nearly the same LIOs and the one that does the small amount more LIOs runs significantly faster, then it very well could be the best plan.   Yes I’m being vague on what a small amount more is and significantly faster is as well.  If I put in any numbers here, we’d all find examples where those numbers don’t work.  You’ll have to gauge this in context of your system.

When analyzing a single plan, you find the step(s) that take up the most time.  And your first attempt will be to try to eliminate it.  If you can’t eliminate that step than try make that part of the plan go faster.   Now you focus on just one part of the plan, the plan may have 100 or more steps, yet you only look at 4-6 of them and are able to fix the query.  It’s not necessary to know how the whole plan works most times, just the part that is slow.

Take a look at these two simple query plans.  The query is the same for each, just one hinted to do a nested loop and the other a hash join.  The table BIG_TAB has 2,898,240 rows and SMALL_TAB has 500 rows.  They have the same results, each of these plans were retrieved with, and was formatted a bit to show the relevant columns:

select * from table(dbms_xplan.display_cursor (format=>'iostats last'));

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |  16000 |00:00:00.07 |   18463 |
|   1 |  NESTED LOOPS                |           |      1 |        |  16000 |00:00:00.07 |   18463 |
|   2 |   NESTED LOOPS               |           |      1 |  15769 |  16000 |00:00:00.02 |    2463 |
|   3 |    INDEX FAST FULL SCAN      | SMALL_IDX |      1 |    500 |    500 |00:00:00.01 |     504 |
|*  4 |    INDEX RANGE SCAN          | BIG_IDX   |    500 |     32 |  16000 |00:00:00.01 |    1959 |
|   5 |   TABLE ACCESS BY INDEX ROWID| BIG_TAB   |  16000 |     32 |  16000 |00:00:00.04 |   16000 |
----------------------------------------------------------------------------------------------------


---------------------------------------------------------------------------------------------
| Id  | Operation             | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |      1 |        |  16000 |00:00:00.91 |   49818 |
|*  1 |  HASH JOIN            |           |      1 |  15769 |  16000 |00:00:00.91 |   49818 |
|   2 |   INDEX FAST FULL SCAN| SMALL_IDX |      1 |    500 |    500 |00:00:00.01 |       6 |
|   3 |   TABLE ACCESS FULL   | BIG_TAB   |      1 |   2898K|   2898K|00:00:00.61 |   49812 |
---------------------------------------------------------------------------------------------


Can you see that the first one is actually better, not just faster?  The first thing you should compare is the LIOs (the Buffers column).  18,463 to 49,818 then notice that the rows worked with thru the plan are significantly lower in the first as well (the A-Rows column).  The first plan never works with more than 16,000 but the second works with 2,898,000 from BIG_TAB.  And lastly notice that the time is .07 for the first and .91 for the second. 

I encourage you to look at the stats in this order, LIOs, rows then time.  I know most of you will likely start with time, which of course is very important.  But in testing, LIOs and row counts are likely a better indicator of scalability of a SQL statement.


No comments:

Post a Comment