Friday, April 12, 2024

EXISTS vs EQUALITY - sand in the gears


The best thing about doing SQL Optimization is there is always something that needs attention. 

The worse thing about doing SQL Optimization is there is always something that needs attention.
 
A colleague in my team ran across some code that looked like this:

 

SELECT OWNER, OBJECT_NAME, OBJECT_TYPE
FROM big_tab bigtab
WHERE 1 = (  SELECT    1    FROM   allusers_tab alluserstab
      WHERE   bigtab.owner = alluserstab.username
            and rownum = 1   );
 
This has been changed to protect the innocent to a couple of tables in my test database, but the core of the code is the same.  It’s doing a select to see if there is a match between the two tables and returns a 1 if there is, and (this is important) using ROWNUM=1 to make sure it only gets one row.  (BIG_TAB has about 4 million rows and ALLUSERS_TAB has about 50, so the test is a bit more exciting than the good old EMP and DEPT tables.) 
 
 
I assume many of us SQL coders will say “Hey, just use an EXISTS operator!” And that is exactly what you should do and not this.  But does it really matter?  Is this just a different style? 
 
 
Yes, it does matter, and here is where the ROWNUM=1 becomes important.  
 
 
If you just change this to an EXISTS and leave the ROWNUM=1 in place, the plan really doesn’t change.   The core plan looks like this for the plan with 1 =  (bla) or EXIST (bla). 

 

 

--------------------------------------------------
| Id  | Operation                  | Name        | 
--------------------------------------------------
|   0 | SELECT STATEMENT           |             |  
|*  1 |  FILTER                    |             |
|   2 |   TABLE ACCESS STORAGE FULL| BIG_TAB     |
|*  3 |   COUNT STOPKEY            |             |
|*  4 |    INDEX UNIQUE SCAN       | USERNAME_PK |
--------------------------------------------------

 

 

If you change the code using an EXISTS and remove the ROWNUM = 1 like this (what is returned by the inner query doesn’t matter, that value is not used, just "did this return a row or not") :

 

SELECT OWNER, OBJECT_NAME, OBJECT_TYPE
FROM big_tab bigtab
WHERE EXISTS (  SELECT    1    FROM   allusers_tab alluserstab
      WHERE   bigtab.owner = alluserstab.username  );
 
And the plan looks like this:
 
 
--------------------------------------------------
| Id  | Operation                  | Name        |
--------------------------------------------------
|   0 | SELECT STATEMENT           |             |
|*  1 |  HASH JOIN                 |             |
|   2 |   INDEX FULL SCAN          | USERNAME_PK |
|   3 |   TABLE ACCESS STORAGE FULL| BIG_TAB     |
--------------------------------------------------

 

Which is a better-looking plan, but does it matter?  Turns out it does. 

I ran both of these 1000 times in a PL/SQL block capturing elapsed time and CPU time.  The difference is not huge but it is better to use the EXISTS without the ROWNUM = 1.    This is like sand in the gears.  This construct is unlikely to show up as the high nail that needs to be pounded down, but is a bit more friction in the run.  In the case of the statement my colleague found, this construct was used many times within the query.  Fixing this will help (at this time testing and such is still ongoing), but I don't expect this alone to be the thing that cuts the run time dramatically.   Maybe it will as it is used several times in the query. But it would be because there are several of them not just one. 
 
Some folks might be thinking, don’t I need the ROWNUM=1 to make sure only one row comes back for the EXISTS operator?
 
No.  The EXISTS operator will stop once a row is found, even if many rows could be returned, it stops after the first one. In effect the EXISITS operator is only looking at the return code so to speak, did this return a row?  Yes, then stop.
 
Here is the timing information for the 1000 runs. (The test was done on an always free autonomous database):

 

Doing the Equality check 1000 times: 

Times in hundredths of a second

**** TIME   - 25195

**** CPU    - 24653


Doing the EXISTS (without ROWNUM=1) 1000 times: 

Times in hundredths of a second
**** TIME   - 20090
**** CPU    - 19790
 
To make the test a bit easier I did warp each statement in a SELECT COUNT(*) FROM (bla) construct.  The plan for the statement did stay about the same, just the counter mechanism added on. 

 

= = = = = = = = = = = = = = = = = = = = = = = = = = = = =

 

Here is the code I used to do the timing test: 
 
SET SERVEROUTPUT ON 
 
DECLARE
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    Cursor test_cur01 is 
    select count(*) from (
    SELECT OWNER, OBJECT_NAME, OBJECT_TYPE
         FROM big_tab bigtab
         WHERE 1 = (  SELECT    1    FROM   allusers_tab alluserstab
                        WHERE   bigtab.owner = alluserstab.username
                           and rownum = 1   ) ); 
    Cursor test_cur02 is 
    select count(*) from (SELECT OWNER, OBJECT_NAME, OBJECT_TYPE
       FROM big_tab bigtab
       WHERE Exists ( SELECT  1 FROM allusers_tab alluserstab
      WHERE bigtab.owner = alluserstab.username));
   cur01_rec test_cur01%rowtype;
   cur02_rec test_cur02%rowtype;
BEGIN
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
       for i in 1 .. 1000 loop
       open test_cur01;
       fetch test_cur01 into cur01_rec;
       close test_cur01;
       end loop;
    DBMS_OUTPUT.put_line ('Times in hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME   - '||to_char(DBMS_UTILITY.get_time - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU    - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu)); 
        l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
       for i in 1 .. 1000 loop
       open test_cur02;
       fetch test_cur02 into cur02_rec;
       close test_cur02;
       end loop;
    DBMS_OUTPUT.put_line ('Times in hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME   - '||to_char(DBMS_UTILITY.get_time - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU    - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu)); 
END; 
/

Wednesday, January 17, 2024

A misunderstanding about the Materialize hint


I think I have found out why some folks are resisting using this hint when defining a CTE (Common Table Expression). There apparently is a misunderstanding about what this hint does. 


This doesn’t cause the CTE to be kept in memory (the PGA) after the query finishes.  This only forces the optimizer to not “merge” it back into the main query.  A materialized CTE will store the results set in a GTT (Global Temp Table) like structure during the run of the statement.  This GTT is dropped once the query finishes.  

 

 

I say again, the GTT is dropped and is not kept in the PGA once the query finishes. 

 

 

So, what does the hint do and why do I recommend using it pretty much all the time?

 

 

What it does is force the optimizer to keep the CTE as a separate unit and not merge (refactor) it back into the main query.   Generally merging the CTE back into the query is not what we want to have happen.  We just spent all this time and effort to write the code with a CTE and then the optimizer negates all that work.  

 

 

I can count on one hand the number of times that the materialize hint made a query worse. Sometimes it doesn’t matter, the optimizer will materialize a CTE with or without the hint.  For example, if the CTE is referenced more than one time, the optimizer will automatically materialize the CTE. 

 

 

One more time, with feeling, the GTT used during the run of a query for a materialized CTE is DROPPED after the statement finishes.  

 

 

I have several blog posts raving about all the wonderful things a CTE can do to make queries run better faster and stronger.  Please see these for ideas on how you can use them to speed up your code.