Tuesday, April 1, 2014

Hash THIS! A look at HASH_VALUE and PLAN_HASH_VALUE.



Way back in Oracle time, before SQL_ID there was just HASH_VALUE to identify statements in the shared pool.  Oracle used a technique that basically takes some number of charters of the statement and using a magical hashing algorithm computes a hexadecimal number to represent the statement in the shared pool.  What the optimizer would do at parse time is use this value to see if the statement was already in the shared pool.  But, since no hash algorithm is perfect, there would be collisions where two different statements would have the same hash value.  So the optimizer would then compare character by character to verify they were the same. 

This worked fine but as databases got bigger and SQL statements got longer, the collision rate went up and the optimizer was spending too much time looking for queries, so SQL_ID came along in version 10. This really is much like HASH_VALUE, but uses more characters of the statement and is also in base 32 numbering.  The SQL_ID looks a lot more like a string then a number, but it’s a number. 

Then in 10.2 we started to see PLAN_HASH_VALUE.  What is this?  

One of the goals of the folk writing the optimizer is to make it smarter and faster.  This cascades into the run time of the SQL.  If we can either parse or find the already parsed plan faster than we can run faster.  Also why have multiple plans that really are the same BUT have slightly different statements?  Wouldn't it be nice if two (or more) statements that “look different” but generate the same plan, just have ONE plan to use?   

Yes it would be, and that is where PLAN_HASH_VALUE comes in.  

Take a look at this very simple example:

SQL> @empno_test
SQL> alter system flush shared_pool;
System altered.
SQL> set echo on feed on
SQL> exec dbms_application_info.set_module('LAB1','EMPNO1');

PL/SQL procedure successfully completed.

SQL> select ename, job from emp where empno = 7902;
ENAME      JOB
---------- ---------
FORD       ANALYST

1 row selected.

SQL> exec dbms_application_info.set_module('LAB1','EMPNO2');

PL/SQL procedure successfully completed.

SQL> select ename, job from emp where 7902 = empno;
ENAME      JOB
---------- ---------
FORD       ANALYST

1 row selected.

SQL>
SQL> exec dbms_application_info.set_module(null,null);

PL/SQL procedure successfully completed.

SQL> set lines 10000
SQL> column action format a10
SQL> column executions heading EXEC format 9999
SQL> column version_count heading VERSIONS format 9999
SQL> column hash_value format 99999999999
SQL> column sql_text format a50
SQL>
SQL> select action, executions, version_count, hash_value, sql_id, plan_hash_value, sql_text
  2    from v$sqlarea
  3   where module = 'LAB1' and command_type != 47
  4   order by hash_value;
ACTION      EXEC VERSIONS   HASH_VALUE SQL_ID        PLAN_HASH_VALUE SQL_TEXT
---------- ----- -------- ------------ ------------- --------------- --------------------------------------------------
EMPNO2         1        1     77702346 ay1h1g42a396a      4120447789 select ename, job from emp where 7902 = empno
EMPNO1         1        1    116025303 1cvmsb83fntyr      4120447789 select ename, job from emp where empno = 7902

2 rows selected.

Notice each statement has its own unique HASH_VALUE and SQL_ID.  And both have the same PLAN_HASH_VALUE.   This means that although they are truly two different statements in the view of the optimizer, but they share the same plan.  And that plan is:

SQL> select /*+ gather_plan_statistics*/ ename, job from emp where 7902 = empno;
ENAME      JOB
---------- ---------
FORD       ANALYST

1 row selected.

SQL> select * from table(dbms_xplan.display_cursor());
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
SQL_ID  8nuj6ncwusvdt, child number 0
-------------------------------------
select /*+ gather_plan_statistics*/ ename, job from emp where 7902 =
empno

Plan hash value: 4120447789

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |     1 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP          |     1 |    18 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | EMP_EMPNO_PK |     1 |       |     0   (0)|          |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("EMPNO"=7902)

20 rows selected.

A key point is that the two statements are DIFFERENT.  They will each take up some space of their own in the shared pool in the SGA, but the good news is they do share the same plan. 

The HASH_VALUE and the PLAN_HASH_VALUE are different numbers used differently.  The explain plan (see above) shows the PLAN_HASH_VALUE not the HASH_VALUE, be careful not to confuse the two.

Pretty cool, eh?