Tuesday, April 1, 2014

Hash THIS! A look at HASH_VALUE and PLAN_HASH_VALUE.



Way back in Oracle time, before SQL_ID there was just HASH_VALUE to identify statements in the shared pool.  Oracle used a technique that basically takes some number of charters of the statement and using a magical hashing algorithm computes a hexadecimal number to represent the statement in the shared pool.  What the optimizer would do at parse time is use this value to see if the statement was already in the shared pool.  But, since no hash algorithm is perfect, there would be collisions where two different statements would have the same hash value.  So the optimizer would then compare character by character to verify they were the same. 

This worked fine but as databases got bigger and SQL statements got longer, the collision rate went up and the optimizer was spending too much time looking for queries, so SQL_ID came along in version 10. This really is much like HASH_VALUE, but uses more characters of the statement and is also in base 32 numbering.  The SQL_ID looks a lot more like a string then a number, but it’s a number. 

Then in 10.2 we started to see PLAN_HASH_VALUE.  What is this?  

One of the goals of the folk writing the optimizer is to make it smarter and faster.  This cascades into the run time of the SQL.  If we can either parse or find the already parsed plan faster than we can run faster.  Also why have multiple plans that really are the same BUT have slightly different statements?  Wouldn't it be nice if two (or more) statements that “look different” but generate the same plan, just have ONE plan to use?   

Yes it would be, and that is where PLAN_HASH_VALUE comes in.  

Take a look at this very simple example:

SQL> @empno_test
SQL> alter system flush shared_pool;
System altered.
SQL> set echo on feed on
SQL> exec dbms_application_info.set_module('LAB1','EMPNO1');

PL/SQL procedure successfully completed.

SQL> select ename, job from emp where empno = 7902;
ENAME      JOB
---------- ---------
FORD       ANALYST

1 row selected.

SQL> exec dbms_application_info.set_module('LAB1','EMPNO2');

PL/SQL procedure successfully completed.

SQL> select ename, job from emp where 7902 = empno;
ENAME      JOB
---------- ---------
FORD       ANALYST

1 row selected.

SQL>
SQL> exec dbms_application_info.set_module(null,null);

PL/SQL procedure successfully completed.

SQL> set lines 10000
SQL> column action format a10
SQL> column executions heading EXEC format 9999
SQL> column version_count heading VERSIONS format 9999
SQL> column hash_value format 99999999999
SQL> column sql_text format a50
SQL>
SQL> select action, executions, version_count, hash_value, sql_id, plan_hash_value, sql_text
  2    from v$sqlarea
  3   where module = 'LAB1' and command_type != 47
  4   order by hash_value;
ACTION      EXEC VERSIONS   HASH_VALUE SQL_ID        PLAN_HASH_VALUE SQL_TEXT
---------- ----- -------- ------------ ------------- --------------- --------------------------------------------------
EMPNO2         1        1     77702346 ay1h1g42a396a      4120447789 select ename, job from emp where 7902 = empno
EMPNO1         1        1    116025303 1cvmsb83fntyr      4120447789 select ename, job from emp where empno = 7902

2 rows selected.

Notice each statement has its own unique HASH_VALUE and SQL_ID.  And both have the same PLAN_HASH_VALUE.   This means that although they are truly two different statements in the view of the optimizer, but they share the same plan.  And that plan is:

SQL> select /*+ gather_plan_statistics*/ ename, job from emp where 7902 = empno;
ENAME      JOB
---------- ---------
FORD       ANALYST

1 row selected.

SQL> select * from table(dbms_xplan.display_cursor());
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
SQL_ID  8nuj6ncwusvdt, child number 0
-------------------------------------
select /*+ gather_plan_statistics*/ ename, job from emp where 7902 =
empno

Plan hash value: 4120447789

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |     1 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP          |     1 |    18 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | EMP_EMPNO_PK |     1 |       |     0   (0)|          |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("EMPNO"=7902)

20 rows selected.

A key point is that the two statements are DIFFERENT.  They will each take up some space of their own in the shared pool in the SGA, but the good news is they do share the same plan. 

The HASH_VALUE and the PLAN_HASH_VALUE are different numbers used differently.  The explain plan (see above) shows the PLAN_HASH_VALUE not the HASH_VALUE, be careful not to confuse the two.

Pretty cool, eh? 

Sunday, March 30, 2014

To BIND or not to BIND.... CURSOR_SHARING

CURSOR_SHARING is a parameter that controls what does the Oracle Optimizer do with literal values in a SQL statement. The default setting is EXACT which I personal think is the best.  This doesn't change literals and its up to the coder to put in either literals or binds as needed in the code.  A common setting is FORCE which forces the replacement of binds with literals.

In this simple example you can see that when CURSOR_SHARING is EXACT we have 3 distinct statements for the 3 runs.  However when set to FORCE the optimizer substitutes in a BIND variable for the literal value and we have just one statement.  This is good right?  Well sort of, but I'll talk about how this can be problematic in another blog entry soon.

Here is the simple example (FYI - I removed the actual output from the 3 queries for readability):

SQL> set lines 200
SQL> column SQL_TEXT format a85
SQL> column EXE format 999
SQL> column PAR format 999
SQL> column LDS format 999
SQL> column OWNER format a40
SQL>
SQL> set feedback on
SQL> set echo on
SQL> set termout on
SQL>
SQL> alter system flush shared_pool;

System altered.

SQL> alter session set cursor_sharing = exact;

Session altered.

SQL>
SQL> show parameter cursor_sharing;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cursor_sharing                       string      EXACT
SQL>
SQL> select /* BINDS_TEST */ distinct owner from big_tab where object_type='TABLE';


SQL> select /* BINDS_TEST */ distinct owner from big_tab where object_type='SYNONYM';


SQL> select /* BINDS_TEST */ distinct owner from big_tab where object_type='INDEX';

SQL>
SQL> select SQL_ID,EXECUTIONS EXE,LOADS LDS,PARSE_CALLS PAR,SQL_TEXT from v$sqlarea where sql_text like 'select /* BINDS_TEST */ distinct %';

SQL_ID         EXE  LDS  PAR SQL_TEXT
------------- ---- ---- ---- -------------------------------------------------------------------------------------
ajarrkkxts4hk    1    1    1 select /* BINDS_TEST */ distinct owner from big_tab where object_type='SYNONYM'
75zrs6yswsdka    1    1    1 select /* BINDS_TEST */ distinct owner from big_tab where object_type='INDEX'
23g9hv9w9tfmk    1    1    1 select /* BINDS_TEST */ distinct owner from big_tab where object_type='TABLE'

3 rows selected.

SQL>
SQL> alter system flush shared_pool;

System altered.

*********************************************************************
*****             Changing CURSOR_SHARING to FORCE              *****
*********************************************************************
SQL> alter session set cursor_sharing = force;

Session altered.

SQL>
SQL> show parameter cursor_sharing;

NAME                                 TYPE                             VALUE
------------------------------------ -------------------------------- ------------------------------
cursor_sharing                       string                           FORCE
SQL>
SQL> select /* BINDS_TEST */ distinct owner from big_tab where object_type='TABLE';

SQL> select /* BINDS_TEST */ distinct owner from big_tab where object_type='SYNONYM';

SQL> select /* BINDS_TEST */ distinct owner from big_tab where object_type='INDEX';

SQL>
SQL> select SQL_ID,EXECUTIONS EXE,LOADS LDS,PARSE_CALLS PAR,SQL_TEXT from v$sqlarea where sql_text like 'select /* BINDS_TEST */ distinct %';

SQL_ID         EXE  LDS  PAR SQL_TEXT
------------- ---- ---- ---- -------------------------------------------------------------------------------------
bj0a2w1puthua    3    2    3 select /* BINDS_TEST */ distinct owner from big_tab where object_type=:"SYS_B_0"

1 row selected.


SQL>
SQL>
SQL>

Thursday, March 20, 2014

Ace it is!

I've been recognized as an Oracle Ace!  


I'm one of the first 10 folks approved for the new Ace Associate level.  I'm pleased to be recognized for my many years working with Oracle and helping other use it more effectively thru presentations, email lists, user group meetings, casual conversation, twitter and this blog of course.

A big thanks to Paul Jackson and Mark Bobak for submitting my application. 

Monday, February 24, 2014

PL/SQL and formating of SQL text

What does the PL/SQL engine do with the SQL in a block? 

It does some pretty cool things actually.  It coverts the entire statement to upper case, except for quoted strings of course, and it strips out all white space down to a single space.  So carriage returns, line feeds, tabs and such all become a single space.  Also comments are removed, but not HINTS.


Here is s simple example.  I'm using MODULE and ACTION name here to identify these different runs.  I'll run what is really the same query 4 times but each time the text is a little different and the last one has a hint:

SQL> exec dbms_application_info.set_module('PLSQL','LOWER');

PL/SQL procedure successfully completed.

SQL>
SQL> declare
  2   cnt number;
  3  begin
  4    select count(*) into cnt from emp e;
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> exec dbms_application_info.set_module('PLSQL','UPPER');

PL/SQL procedure successfully completed.

SQL>
SQL> DECLARE
  2   CNT NUMBER;
  3  BEGIN
  4    SELECT COUNT(*) INTO CNT FROM EMP E;
  5  END;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> exec dbms_application_info.set_module('PLSQL','TWOLINES');

PL/SQL procedure successfully completed.

SQL>
SQL> declare
  2   cnt number;
  3  begin
  4    select count(*) into cnt
  5    from emp e;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL>
SQL> exec dbms_application_info.set_module('PLSQL','HINT');

PL/SQL procedure successfully completed.

SQL>
SQL> declare
  2   cnt number;
  3  begin
  4    select /*+ full (e) */ count(*) into cnt
  5    from emp e;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> exec dbms_application_info.set_module(null,null);

PL/SQL procedure successfully completed.

SQL>
SQL> select action, executions, version_count, hash_value, sql_text
  2    from v$sqlarea
  3   where module = 'PLSQL' and command_type != 47
  4   order by hash_value;

ACTION      EXEC VERSIONS   HASH_VALUE SQL_TEXT
---------- ----- -------- ------------ --------------------------------------------------
HINT           1        1   2245097906 SELECT /*+ full (e) */ COUNT(*) FROM EMP E
LOWER          3        1   2728343546 SELECT COUNT(*) FROM EMP E

2 rows selected.


Notice that for for 3 of them they all ended up being the same SQL statement.  The hinted statement is different and notice that the hint text is not converted to upper case.  This is pretty cool. 

What this means is that you might want to put even simple code in a PL/SQL block since this will promote the reuse of cursors that only have slight differences in text but are really the same statement. 

The PL/SQL engine has been doing this "forever". 



Friday, February 21, 2014

Check your SQL and SQL*Plus skills #1

What will be the output of this in SQL*Plus as written here (predict it without running it, then run it and see if you're right):


SELECT 1 -
SAL FROM EMP;


How did you do?

Thursday, February 13, 2014

Does the LIMIT clause on a BULK COLLECT select really matter?

Mostly I teach SQL optimization because SQL has the most impact on performance, with performance being from TIME perspective.  But there is more to performance then time.  Scalability is really at the heart of performance.  And your SQL might not scale well for a other things not directly related to time.  Like memory usage.  If your SQL consumes a lot of memory, then it may run fast but could scale horribly and hence not run well under pressure.

In PL/SQL land there is a great feature to read in a set of rows from a table (or tables) called BULK COLLECT.  It's a great way to read in a set of rows at a time, then process then rather than trying to get one row at a time.  And there is a LIMIT clause that can be used to limit the number of rows coming back.  If working with a few rows (less than 100 or so) then you can likely not use this and be fine, but if you're working with a larger set of rows you really should set this to something realistic.

Here is a simple example to show what happens with the setting of LIMIT.

First, I have a routine to get PGA space used:

CREATE OR REPLACE FUNCTION pga_memory_used
RETURN NUMBER
AS
    l_used_memory  NUMBER;
BEGIN
    SELECT ms.value
      INTO l_used_memory
      FROM v$mystat ms,
           v$statname sn
     WHERE ms.statistic# = sn.statistic#
       AND sn.name = 'session pga memory';
    RETURN l_used_memory;
END;
/


Now I'll run this little code block to illustrate the effects of the LIMIT clause. I'll run it twice once with limit set to 5000 and then again will it set to 1000.  The table BIG_TAB has about 2,868,512 rows and the average row length is 115.  (NOTE: The call to bigtab_process after the fetch call is to a stub procedure in this case, it does nothing.)

DECLARE
   l_start_mem  number;
   l_start_time pls_integer;
   l_start_cpu  pls_integer;
   CURSOR bigtab_cur IS
   SELECT * FROM big_tab;
   TYPE big_tab_t IS TABLE OF big_tab%ROWTYPE;
   bigtab big_tab_t;
BEGIN
   l_start_mem  := pga_memory_used;
   l_start_time := DBMS_UTILITY.GET_TIME;
   l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
   OPEN bigtab_cur;
   LOOP
     FETCH bigtab_cur
     BULK COLLECT INTO bigtab LIMIT 5000;    
     EXIT WHEN bigtab.COUNT = 0;
     bigtab_process;
   END LOOP;
   DBMS_OUTPUT.put_line ('----------------------------');
   DBMS_OUTPUT.put_line ('--- BULK COLLECTION DEMO ---');
   DBMS_OUTPUT.put_line ('The limit clause set to 5000');
   DBMS_OUTPUT.put_line ('Times in hundredths of a second');
   DBMS_OUTPUT.put_line ('**** TIME   - '||to_char(DBMS_UTILITY.get_time - l_start_time));
   DBMS_OUTPUT.put_line ('**** CPU    - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
   DBMS_OUTPUT.put_line ('**** Memory '||to_char(((pga_memory_used - l_start_mem))/1024,'9,999,999')||' KB');
END;
/


And now the run results:


----------------------------
--- BULK COLLECTION DEMO ---
The limit clause set to 5000
Times in hundredths of a second
**** TIME   - 548
**** CPU    - 520
**** Memory      7,552 KB

----------------------------
--- BULK COLLECTION DEMO ---
The limit clause set to 1000
Times in hundredths of a second
**** TIME - 537
**** CPU  - 487
**** Memory      2,112 KB



Notice that the time (wall clock time) doesn't really change much, the CPU is less in the second one but look at the memory usage.  A rather significant drop!  And yes this is consistent over multiple runs, the actually numbers vary but the second one always uses something like half or less memory and less CPU. 

You might say, hey 7M vs 2M what's the big deal?  I have gigabytes of PGA space!  Sure on a one run bases this isn't a big deal. But multiple this by a few 1000 users.  Now it might make a difference.

If you're using BULK COLLECT (which you should be!) then check the LIMIT clause, could save some memory and could scale your application much better.

Monday, January 6, 2014

Adaptive Query Optimization – Will the real plan please stand up!



A new and powerful feature of 12c is Adaptive Query Optimization.  Now the optimizer really is taking a “hands on” approach to optimization.  In the past the optimizer came up with a plan and “tossed it over the fence” to the run time engine and dusted its hands of that plan and move on to the next.  (Not exactly true, even in 11 there was some interaction but nothing like what we see in 12.) 

In 12 there is a much more active role by the optimizer as to what is going to happen and does happen to the plan.  There are two main parts to this technique Adaptive Plans and Adaptive Statistics.  I’m going to show with a pretty simple query we use in our classes to show a little bit of each.

The query I used is this:

select /*+ gather_plan_statistics*/ b.object_name, b.object_type, a.username
  from allusers_tab a, big_tab b
  where a.username = b.owner 
  and b.object_type = 'PROCEDURE'
   and a.username not in ('SYS','SYSTEM');

BIG_TAB is a copy of ALL_OBJECTS and has about 2.9 million rows.  ALLUSERS_TAB is a copy of (you guessed it) ALL_USERS, and has 43 rows in it.  There are current stats on each table, no histograms. There is an index on BIG_TAB.OBJECT_TYPE and ALLUSERS_TAB.USERNAME. 

 Now I do an explain plan on tha query then run it twice.  I’m using DBMS_XPLAN to show the plan.  The first plan is with DBMS_XPLAN.DISPLAY() then for the two runs I use DISPLAY_CURSOR(FORMAT=>'+ADAPTIVE').  The new format switch of ‘+ADAPTIVE’ shows adaptive information about the cursor if there is any.  

Here is the plan from the explain plan, using the following select:

select * from table(dbms_xplan.display());
 
--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                 | 71204 |  3476K|  3367   (1)| 00:00:01 |
|*  1 |  HASH JOIN                           |                 | 71204 |  3476K|  3367   (1)| 00:00:01 |
|*  2 |   INDEX FULL SCAN                    | USERNAME_PK     |    35 |   350 |     1   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID BATCHED| BIG_TAB         | 73182 |  2858K|  3365   (1)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN                  | BIG_OBJTYPE_IDX | 73182 |       |   205   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A"."USERNAME"="B"."OWNER")
   2 - filter("A"."USERNAME"<>'SYS' AND "A"."USERNAME"<>'SYSTEM')
   3 - filter("B"."OWNER"<>'SYS' AND "B"."OWNER"<>'SYSTEM')
   4 - access("B"."OBJECT_TYPE"='PROCEDURE')

Note
-----
   - this is an adaptive plan

The explain plan shows the DEFAULT plan, it does tell us that it’s an adaptive plan.  But we don’t know what the other steps are at this point.  Interestingly if you do a 10053 trace you do see all the steps in the plan but it’s less then clear in the 10053 output what is the default plan and what is the alternate plan. 

Now I run the query once and this is the plan I see after it runs, using the following select:

 select * from table(dbms_xplan.display_cursor(format=>'+adaptive'));

----------------------------------------------------------------------------------------------------------
|   Id  | Operation                            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                     |                 |       |       |  3367 (100)|          |
|  *  1 |  HASH JOIN                           |                 | 71204 |  3476K|  3367   (1)| 00:00:01 |
|-    2 |   NESTED LOOPS                       |                 |       |       |            |          |
|-    3 |    NESTED LOOPS                      |                 | 71204 |  3476K|  3367   (1)| 00:00:01 |
|-    4 |     STATISTICS COLLECTOR             |                 |       |       |            |          |
|  *  5 |      INDEX FULL SCAN                 | USERNAME_PK     |    35 |   350 |     1   (0)| 00:00:01 |
|- *  6 |     INDEX RANGE SCAN                 | BIG_OBJTYPE_IDX | 73182 |       |   205   (0)| 00:00:01 |
|- *  7 |    TABLE ACCESS BY INDEX ROWID       | BIG_TAB         |  2033 | 81320 |  3365   (1)| 00:00:01 |
|  *  8 |   TABLE ACCESS BY INDEX ROWID BATCHED| BIG_TAB         | 73182 |  2858K|  3365   (1)| 00:00:01 |
|  *  9 |    INDEX RANGE SCAN                  | BIG_OBJTYPE_IDX | 73182 |       |   205   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A"."USERNAME"="B"."OWNER")
   5 - filter(("A"."USERNAME"<>'SYS' AND "A"."USERNAME"<>'SYSTEM'))
   6 - access("B"."OBJECT_TYPE"='PROCEDURE')
   7 - filter(("A"."USERNAME"="B"."OWNER" AND "B"."OWNER"<>'SYS' AND "B"."OWNER"<>'SYSTEM'))
   8 - filter(("B"."OWNER"<>'SYS' AND "B"."OWNER"<>'SYSTEM'))
   9 - access("B"."OBJECT_TYPE"='PROCEDURE')

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

Now we see the “whole enchilada”.  We can see that there is a set of alternate steps created to do the join that is a double nested loop, but that part of the plan is inactive for this run. It used the default plan to get the results.  Also notice step #4 “STATISTICS COLLECTOR”, this is an obvious point where the optimizer is taking a more active role in how the plan really runs.  Unlike in the past where it just came up with a plan and said to the run time engine “do this and don’t tell me about it”, it’s now in effect saying “hey while this runs, let me know what happened so I can adjust if needed.”  

Now let’s see what happens when we run it one more time.

--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                 |       |       |   480 (100)|          |
|   1 |  NESTED LOOPS                        |                 |  1664 | 83200 |   480   (1)| 00:00:01 |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| BIG_TAB         |  1664 | 66560 |   479   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | BIG_OBJTYPE_IDX |  6336 |       |   205   (0)| 00:00:01 |
|*  4 |   INDEX UNIQUE SCAN                  | USERNAME_PK     |     1 |    10 |     0   (0)|          |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("B"."OWNER"<>'SYS' AND "B"."OWNER"<>'SYSTEM'))
   3 - access("B"."OBJECT_TYPE"='PROCEDURE')
   4 - access("A"."USERNAME"="B"."OWNER")
       filter(("A"."USERNAME"<>'SYS' AND "A"."USERNAME"<>'SYSTEM'))

Note
-----
   - statistics feedback used for this statement

Here we can see that the final plan is really totally different from the two the optimizer came up with during its initial parse!   The plan it goes with after a run and getting the statistics of the run is a good old single nested loop, but with the big table on the outside.  (This isn’t new, the optimizers been doing this for a while, I think 10.2 or so.) 

Of interest on the last run also is that this is no longer an adaptive plan, it is done.  This is it.  It does tell us that this used statistics feedback to come up with this plan which is pretty cool. 

What can we take away from this?  One thing for sure is that an explain plan is even less likely to be the plan that is really used.  And that we should run the query at least twice before we can say with some sort of certainty that we now have the plan that will really is used.  

BTW -  A little note here if you're doing a test like this on your own.  If you're using SQLPlus (as all really DBAs do) then make sure you turn off SERVEROUTPUT, if you don't then the plans you'll see when using the select * from table(dbms_xplan.display_cursor(format=>'+adaptive')) type queries will be for the command used by SERVEROUTPUT and not your query.