Tuesday, October 21, 2014

2014 East Coast Oracle Users Conference



Hey everyone!  I'll be speaking at this conference with a host of other great speakers.  Come on down and join the fun!



2014 East Coast Oracle Users Conference (ECO) 
November 4 &5 
Sheraton Imperial Hotel & Convention Center 
 Raleigh/Durham, NC

Friday, October 3, 2014

CAGE MATCH: IN vs OR


Many of us know that an IN list is turned into a set of OR’ed predicates by the optimizer.  Up until 11 an IN list and an OR list were pretty much the same.  But in 11 there was a subtle shift that changed the game.

This example is from the Hotsos “Optimizing Oracle SQL, Intensive” course.  When originally written this was to demonstrate how ANDs and ORs work with regard to “short circuit logic”.    That is, when a FALSE is found in an AND list it stops and when a TRUE is found in an OR list is stops.  But something odd started to happen in 11:

First we have a little function.  It writes a line to the screen and returns the word “match”.  The line it writes tells us that it ran for a particular row. The data we use for this is equally complex, a table with 2 columns and 2 rows; a string and a number, the number to represent each row.


create or replace function test_po(a number) return varchar2
is
begin
 dbms_output.put_line('function executed for row '||a);
 return 'match';
end;
/
SQL> desc pred_order
 Name          Null?    Type
 ------------- -------- ------------
 COL1                   VARCHAR2(10)
 COL2                   NUMBER
SQL> select * from pred_order;
 COL1             COL2
 ---------- ----------
 aaaaa               1
 match               2

Here’s a query that selects from this table:

select *
from pred_order
where test_po(col2) in ('xxxxx','yyyyy')
and col1 = 'zzzzz'

Now when we execute this plan in 10 or before (and using IO costing), the output would look like below.  Notice that the function was executed twice for each row, and you can see that in the plan the function is shown twice in the OR list that the IN list was converted to.  So that makes sense:

SQL> alter session set "_optimizer_cost_model"=io;
SQL>
SQL> @pred1
function executed for row 1
function executed for row 1
function executed for row 2
function executed for row 2
SQL>
SQL> @hxplan
Enter .sql file name (without extension): pred1
Enter the display level (TYPICAL, ALL, BASIC, SERIAL) [TYPICAL]  :
Plan hash value: 1987733821

----------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost  |
----------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     1 |     9 |     2 |
|*  1 |  TABLE ACCESS FULL| PRED_ORDER |     1 |     9 |     2 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("TEST_PO"("COL2")='xxxxx' OR "TEST_PO"("COL2")='yyyyy')
              AND "COL1"='zzzzz')

Note
-----
   - cpu costing is off (consider enabling it)
SQL>

However do the same thing in 11 or 12 and the output different.  The plan stays the same but the function appears to be only run once:

SQL> @pred1
function executed for row 1
function executed for row 2
SQL>
SQL> @hxplan
Enter .sql file name (without extension): pred1
Enter the display level (TYPICAL, ALL, BASIC, SERIAL) [TYPICAL]  :
Plan hash value: 1987733821

----------------------------------------------------------------
| Id  | Operation      | Name       | Rows  | Bytes | Cost  |
----------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     1 |     9 |     2 |
|*  1 |  TABLE ACCESS FULL| PRED_ORDER |     1 |     9 |     2 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("TEST_PO"("COL2")='xxxxx' OR "TEST_PO"("COL2")='yyyyy')
           AND "COL1"='zzzzz')

Note
-----
   - cpu costing is off (consider enabling it)
SQL>

What I’m pretty sure is happening is some sort of “result cache” is being done.  It sees that it’s the same call with the same value being passed in so it either automatically created a result cache or behind the scenes is putting in a bind like value for the second call.  It does make the comparison, I put the word “match” way down the list and is did find it, so it’s does really do the comparison it just doesn’t make the function call over and over again.

That’s cool.  But watch what happens if I manually change the query to ORs instead of the IN list:

SQL> get pred1_new
  1  select *
  2    from pred_order
  3   where (test_po(col2)='xxxxx' or test_po(col2)='yyyyy')
  4*    and col1 = 'zzzzz'
SQL>
SQL>
SQL> @pred1_new
function executed for row 1
function executed for row 1
function executed for row 2
function executed for row 2
SQL>
SQL> @hxplan
Enter .sql file name (without extension): pred1_new
Enter the display level (TYPICAL, ALL, BASIC, SERIAL) [TYPICAL]  :
Plan hash value: 1987733821

----------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost  |
----------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     1 |     9 |     2 |
|*  1 |  TABLE ACCESS FULL| PRED_ORDER |     1 |     9 |     2 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("TEST_PO"("COL2")='xxxxx' OR "TEST_PO"("COL2")='yyyyy')
              AND "COL1"='zzzzz')

Note
-----
   - cpu costing is off (consider enabling it)
SQL>

We are right back to it running the function twice!  HA!  So with the IN list it recognizes that it’s the same call over and over, but when I manually separate it out, the optimizer can’t see that is the same call over and over.

*** BTW – If you’re use CPU costing (which you should be), the optimizer moves the AND predicate (COL1= ‘zzzzz’) first and you see nothing.  This is another part of the exercise to see how predicates move around with CPU costing.

So the bottom line?  INs do have an edge over ORs when it comes to a function call in particular.   There is another advantage which this demo doesn’t show which is the IN list will sort the list, this is very useful for index scans with the IN-List Iterator step, maybe another post about that someday.