Thursday, January 24, 2013

Not In vs Not Exists

My last post was about IN and EXISTS but what about the NOT versions of these?  Are they the same also?  Nothing bets a test so let's take a look.

This time around I want to find out if there are any departments without any employees assigned.  First let's give it a go with the NOT IN.

select * from dept where deptno not in (select deptno from emp ) ;

This seems like a good idea, basicly we are saying give me all the dapartments that are in the DEPT but not in a list of departmets from the EMP table.  We get this results:

no rows selected

Which looks fine, we think there aren't any.   But what if I change it to a NOT EXISTS.

select * from dept d where not exists (select deptno from emp where deptno = d.deptno);

now I get:

         DEPTNO DNAME          LOC
--------------- -------------- -------------
             40 OPERATIONS     BOSTON

1 row selected.


Well that's not very nice!  It looks like I got wrong results here somewhere.  And I did.  The first one was incorrect, why?  Let's take a look at the plans for each, the NOT IN gives me this plan:

--------------------------------------------------------
| Id  | Operation                     | Name           |
--------------------------------------------------------
|   0 | SELECT STATEMENT              |                |
|   1 |  MERGE JOIN ANTI NA           |                |
|   2 |   SORT JOIN                   |                |
|   3 |    TABLE ACCESS BY INDEX ROWID| DEPT           |
|   4 |     INDEX FULL SCAN           | DEPT_DEPTNO_PK |
|*  5 |   SORT UNIQUE                 |                |
|   6 |    TABLE ACCESS FULL          | EMP            |
--------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("DEPTNO"="DEPTNO")
       filter("DEPTNO"="DEPTNO")


The NOT EXISTS gives me this plan:

-------------------------------------------
| Id  | Operation          | Name         |
-------------------------------------------
|   0 | SELECT STATEMENT   |              |
|   1 |  NESTED LOOPS ANTI |              |
|   2 |   TABLE ACCESS FULL| DEPT         |
|*  3 |   INDEX RANGE SCAN | EMP_DEPT_IDX |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("DEPTNO"="D"."DEPTNO")
       filter("DEPTNO" IS NOT NULL)




What gives?  Why these to different plans and what is that "ANTI NA" thing in the NOT IN plan.  It's a logic thing that is the culprit. With the NOT IN if there is a null in the inner set the entire set is nullified.  In the EMP table, the employee King (the president, what a great name for the company president don't ya think?) is not in a department so his DEPTNO is NULL that ends up nullifying the entire set and we get no results. 

The "NA" in the join step stands for "Null Aware", which kicks in this logical rule that once there is a null in there, it's over.


So what to do?  Well if you want to use NOT IN, then make sure you do something about the nulls. Like this:


select * from dept 
where deptno 
not in (select deptno from emp where deptno is not null) ;


Now you get the same results and teh exact same plan as NOT EXISTS.  Try and see for yourself.

So IN and EXISTS are really the same, and NOT IN and NOT EXISTS can be, so long as you do something about those nulls.


One question that this query is able to answer is a question that has plagued many Oracle professions for many years, why is it that is company has never made any money?  We now know, there is no one in the operations department!



Friday, January 11, 2013

In vs Exists?

Does using In or Exists make a difference in how a query is executed?  Let's take a look.

The query will count on suboject_names there are for a match on object_id between big_tab and small_tab. We'll take a look at these two ways to write the query:

select count(b.subobject_name)
  from big_tab b
 where exists ( select null from small_tab s where s.object_id = b.object_id )


select count(subobject_name)
  from big_tab
 where object_id in ( select object_id from small_tab )


The test is really simple, in SQLPlus we'll use the AUTOTRACE feature to see what happens for each query.  I've run each statement a couple of times to warm up the buffer cache and to get all the hard parsing taken care of. 

Here are the results from AUTOTRACE:

SQL> set autotrace on
SQL> get ainex1.sql
  1  select count(subobject_name)
  2    from big_tab
  3*  where object_id in ( select object_id from small_tab )
SQL> r
  1  select count(subobject_name)
  2    from big_tab
  3*  where object_id in ( select object_id from small_tab )

COUNT(SUBOBJECT_NAME)
---------------------
                    0

Execution Plan
----------------------------------------------------------
Plan hash value: 2588413792

------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |     1 |    11 |  5836   (3)| 00:01:11 |
|   1 |  SORT AGGREGATE        |           |     1 |    11 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI |           | 16000 |   171K|  5836   (3)| 00:01:11 |
|   3 |    INDEX FAST FULL SCAN| SMALL_IDX |   500 |  2000 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL   | BIG_TAB   |  2302K|    15M|  5805   (2)| 00:01:10 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"="OBJECT_ID")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      32777  consistent gets
      32768  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        519  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> get ainex2.sql
  1  select count(b.subobject_name)
  2    from big_tab b
  3*  where exists ( select null from small_tab s where s.object_id = b.object_id )
SQL> r
  1  select count(b.subobject_name)
  2    from big_tab b
  3*  where exists ( select null from small_tab s where s.object_id = b.object_id )

COUNT(B.SUBOBJECT_NAME)
-----------------------
                      0

Execution Plan
----------------------------------------------------------
Plan hash value: 2588413792

------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |     1 |    11 |  5836   (3)| 00:01:11 |
|   1 |  SORT AGGREGATE        |           |     1 |    11 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI |           | 16000 |   171K|  5836   (3)| 00:01:11 |
|   3 |    INDEX FAST FULL SCAN| SMALL_IDX |   500 |  2000 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL   | BIG_TAB   |  2302K|    15M|  5805   (2)| 00:01:10 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("S"."OBJECT_ID"="B"."OBJECT_ID")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      32777  consistent gets
      32768  physical reads
          0  redo size
        540  bytes sent via SQL*Net to client
        519  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> set autotrace off


What do we see?  The two plans are identical and both did the exact same amount of work (same number of consistent gets).  Will it make a difference if you us IN or EXISTS?  Likely not.  There was a time that it did (version 8 and earlier I think), but today it's very likely you will end up with the same plan.  If you think it matters, test it!  You might be surprised that it doesn't.