Thursday, April 28, 2016

Analyzing Execution Plans

Once you get a plan for a SQL statement, then what? How do you know if one might be better than the other?  Is this or that step better to use? 

It's the classic Oracle answer "It Depends!"  No step is always bad nor is any step always good.  Sure some steps are more likely to be "bad" like a Cartesian join is normally a bad thing but on occasion is good.  Also Skip Scans are not always the best choice.  But that’s about it.  Nearly every other type of step you see can be good or bad. 

The classic suggestion is to look for Full Table Scans and try to eliminate them.  However Full Table Scans can be the right thing to do more often than not.  Indexes are great and the right index can sure make the difference in see your results now or days from now.  But if you’re getting data from most of the table blocks, a full scan can be the right way to go. 

So at a slightly higher level what should you do to figure out which plan is better?  There are two basic approaches.  In both these approaches you have to run the SQL and capture stats about the run.  The 10046 Trace is of course one of the best ways to do this.  Also you can use DBMS_XPLAN.DISPLAY_CURSOR to see most of the stats. 

One very important point is that you must do one following, set the parameter STATISTICS_LEVEL = ALL or use the hint GATHER_PLAN_STATISTICS.  If you don't the timing statistic in particular will be incorrect.  Also know that many of the run time stats included direct children.  For example looking at the LIOs a parent includes its children's LIOs, so to know what the parent did you much subtract the direct children.

When comparing two plans that return the same results, you start by comparing LIOs. Generally, the one with the fewest is the best.  A plan that does less LIOs will generally be better than one that does more.  This is because of the latches that are taken out.  Each latch is a possible point of contention.  Fewer LIOs means few latches which in turn means fewer possible points of contention.  Hence even if a plan appears to run fast with more LIOs, it may not scale well in production when hundreds or thousands of users are on the system. 

But there are no absolutes!  So you must take all things into consideration.  If two plans do nearly the same LIOs and the one that does the small amount more LIOs runs significantly faster, then it very well could be the best plan.   Yes I’m being vague on what a small amount more is and significantly faster is as well.  If I put in any numbers here, we’d all find examples where those numbers don’t work.  You’ll have to gauge this in context of your system.

When analyzing a single plan, you find the step(s) that take up the most time.  And your first attempt will be to try to eliminate it.  If you can’t eliminate that step than try make that part of the plan go faster.   Now you focus on just one part of the plan, the plan may have 100 or more steps, yet you only look at 4-6 of them and are able to fix the query.  It’s not necessary to know how the whole plan works most times, just the part that is slow.

Take a look at these two simple query plans.  The query is the same for each, just one hinted to do a nested loop and the other a hash join.  The table BIG_TAB has 2,898,240 rows and SMALL_TAB has 500 rows.  They have the same results, each of these plans were retrieved with, and was formatted a bit to show the relevant columns:

select * from table(dbms_xplan.display_cursor (format=>'iostats last'));

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |  16000 |00:00:00.07 |   18463 |
|   1 |  NESTED LOOPS                |           |      1 |        |  16000 |00:00:00.07 |   18463 |
|   2 |   NESTED LOOPS               |           |      1 |  15769 |  16000 |00:00:00.02 |    2463 |
|   3 |    INDEX FAST FULL SCAN      | SMALL_IDX |      1 |    500 |    500 |00:00:00.01 |     504 |
|*  4 |    INDEX RANGE SCAN          | BIG_IDX   |    500 |     32 |  16000 |00:00:00.01 |    1959 |
|   5 |   TABLE ACCESS BY INDEX ROWID| BIG_TAB   |  16000 |     32 |  16000 |00:00:00.04 |   16000 |
----------------------------------------------------------------------------------------------------


---------------------------------------------------------------------------------------------
| Id  | Operation             | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |      1 |        |  16000 |00:00:00.91 |   49818 |
|*  1 |  HASH JOIN            |           |      1 |  15769 |  16000 |00:00:00.91 |   49818 |
|   2 |   INDEX FAST FULL SCAN| SMALL_IDX |      1 |    500 |    500 |00:00:00.01 |       6 |
|   3 |   TABLE ACCESS FULL   | BIG_TAB   |      1 |   2898K|   2898K|00:00:00.61 |   49812 |
---------------------------------------------------------------------------------------------


Can you see that the first one is actually better, not just faster?  The first thing you should compare is the LIOs (the Buffers column).  18,463 to 49,818 then notice that the rows worked with thru the plan are significantly lower in the first as well (the A-Rows column).  The first plan never works with more than 16,000 but the second works with 2,898,000 from BIG_TAB.  And lastly notice that the time is .07 for the first and .91 for the second. 

I encourage you to look at the stats in this order, LIOs, rows then time.  I know most of you will likely start with time, which of course is very important.  But in testing, LIOs and row counts are likely a better indicator of scalability of a SQL statement.


Friday, April 15, 2016

CURSOR_SHARING



Back a thousand years ago (in Oracle time) there weren’t bind variables in SQL.  Everything was a literal.  We had substitution parameters in SQLPlus, but to the optimizer it was still a literal because the substitution happened in SQLPlus before the SQL was sent to the Optimizer.  Along comes V7 and binds come into the mix, and this is very good.  Now we can have one SQL with one cursor in the shared pool (which also was new in V7) and run it with multiple values.  (By the way, substitution parameters still exist, and are cool for scripts, these are the “&” things you might see some times.)

But there was a major problem.  There were likely millions of SQL statements out in the world that should be using binds that weren’t.  Without rewriting everything how could we take advantage of this great new feature?

To combat this problem, Oracle introduced CURSOR_SHARING.  It was (and still is) the lazy programmer’s way to get binds into the SQL.  In the beginning it was the only practical way to take advantage of binds.  But it was always just a “Band-Aid”, not really the best way to run your system.

It had three values: EXACT, FORCE and SIMILAR.  The default is EXACT and that is what you should be using.  This means only shared cursors that match the SQL text “exactly”.  Different bind values are fine, but even the names of the binds have to be exactly the same.   FORCE does what it sounds like it “forces” binds into a SQL statement for every literal it finds.  This will make statements that only differ in the literals used be sharable.  The setting of SIMILAR was supposed to make intelligent choices about binds.  It worked like FORCE, but would come up with different named binds if it thought that there might be different plans.  This option never worked well for the most part. Too often it created lots of different version of the SQL which were not helpful and tending to make things worse not better for sharing of statements.

Starting in 11.2.0.3 SIMILAR is now desupported.  You can still set CURSOR_SHARING to SIMILAR, but it now works the same as FORCE.  This came about because of Adaptive Cursor Sharing which is much better at working with different bind values for the same statement.  Not perfect either but certainly much better and more robust.

Here is a little demo.  The demo just creates a simple table, with no data:
CREATE TABLE rvd99 (x NUMBER, y NUMBER);

Then I have a set of ten queries that look like this:
select count(y) from rvd99 where x = 1;

But each one has a different value for the predicate.  The values are 1, 10, 20, 30, 40, 50, 60, 70, 80 and 99.

I’ll use a select from v$sql_bind_capture and v$sql to see the SQL statements generated for setting CURSOR_SHARING to the values of FORCE and SIMILAR.
Prior to 11.2.0.3 when running this set of queries with FORCE you would see this:

SQL> SELECT s.sql_text, b.value_string, s.plan_hash_value, s.child_number
  2  FROM v$sql_bind_capture b, v$sql s
  3  WHERE s.hash_value = b.hash_value
  4  AND s.address = b.address
  5  AND s.child_number = b.child_number
  6  AND s.sql_text LIKE 'select count(y) from rvd99 %';

SQL_TEXT                                           VALUE PLAN_HASH_VALUE CHILD_NUMBER
-------------------------------------------------- ----- --------------- ------------
select count(y) from rvd99 where x = :"SYS_B_0"    1          4162104232            0

This seems reasonable, just put in the value at run time.  Only one cursor was created for all 10 runs.  But for SIMILAR the output looks like this:

SQL_TEXT                                           VALUE PLAN_HASH_VALUE CHILD_NUMBER
-------------------------------------------------- ----- --------------- ------------
select count(y) from rvd99 where x = :"SYS_B_0"    1          4162104232            0
select count(y) from rvd99 where x = :"SYS_B_0"    1          4162104232            1
select count(y) from rvd99 where x = :"SYS_B_0"    10         4162104232            2
select count(y) from rvd99 where x = :"SYS_B_0"    20         4162104232            3
select count(y) from rvd99 where x = :"SYS_B_0"    30         4162104232            4
select count(y) from rvd99 where x = :"SYS_B_0"    40         4162104232            5
select count(y) from rvd99 where x = :"SYS_B_0"    50         4162104232            6
select count(y) from rvd99 where x = :"SYS_B_0"    60         4162104232            7
select count(y) from rvd99 where x = :"SYS_B_0"    70         4162104232            8
select count(y) from rvd99 where x = :"SYS_B_0"    80         4162104232            9
select count(y) from rvd99 where x = :"SYS_B_0"    99         4162104232           10

Notice how similar ended up creating a new child for each value, which is pretty much what we were hoping to avoid with CURSOR_SHARING.  After 11.2.0.3 you get the same thing for FORCE and SIMILAR which is just the one plan:

SQL_TEXT                                           VALUE PLAN_HASH_VALUE CHILD_NUMBER
-------------------------------------------------- ----- --------------- ------------
select count(y) from rvd99 where x = :"SYS_B_0"    1          4162104232            0

So what does all this mean?  Two things that I can think of, use EXACT unless you have code you can’t change, then use FORCE as a temporary fix until you can get binds in the code where they should be.  And two, let Adaptive Cursor Sharing do what SIMILAR was supposed to do and never did very well, that is have different plans for different sets of binds.