Wednesday, January 17, 2024

A misunderstanding about the Materialize hint


I think I have found out why some folks are resisting using this hint when defining a CTE (Common Table Expression). There apparently is a misunderstanding about what this hint does. 


This doesn’t cause the CTE to be kept in memory (the PGA) after the query finishes.  This only forces the optimizer to not “merge” it back into the main query.  A materialized CTE will store the results set in a GTT (Global Temp Table) like structure during the run of the statement.  This GTT is dropped once the query finishes.  

 

 

I say again, the GTT is dropped and is not kept in the PGA once the query finishes. 

 

 

So, what does the hint do and why do I recommend using it pretty much all the time?

 

 

What it does is force the optimizer to keep the CTE as a separate unit and not merge (refactor) it back into the main query.   Generally merging the CTE back into the query is not what we want to have happen.  We just spent all this time and effort to write the code with a CTE and then the optimizer negates all that work.  

 

 

I can count on one hand the number of times that the materialize hint made a query worse. Sometimes it doesn’t matter, the optimizer will materialize a CTE with or without the hint.  For example, if the CTE is referenced more than one time, the optimizer will automatically materialize the CTE. 

 

 

One more time, with feeling, the GTT used during the run of a query for a materialized CTE is DROPPED after the statement finishes.  

 

 

I have several blog posts raving about all the wonderful things a CTE can do to make queries run better faster and stronger.  Please see these for ideas on how you can use them to speed up your code. 



Tuesday, December 26, 2023

CTEs and UNIONs, Wrap it up!


CTEs (Common Table Expressions) can solve a lot of performance issues.  Maybe not all of them, but a majority of them.  

 

 

I ran into a situation recently that was quite frustrating.  A customer I was working with had created a rather large and complex CTE that was 3 select statements with the UNION ALL operation pulling them together.  I’m a huge fan of the /*+ MATERIALIZE */ hint for CTEs as this will keep the CTE as a separate entity.   Using this hint more times than not is best for performance. 

 

 

But this CTE wouldn’t get materialized.  It was in effect ignoring the hint.  

 

 

Turns out a CTE using UNION or UNION ALL will do this.  I didn’t find any official documentation on this, given that the MATERIALIZE hint is really undocumented anyway, this is not a surprise. 

 

 

A solution I found was to “wrap it” in another select. 

 

 

Here is an example query to illustrate the technique.  Yes, this is a silly query but it works for showing what I’m talking about. 

 

 

WITH loc_cte AS (

    SELECT /*+ MATERIALIZE */

        ename, sal, dname

    FROM

        emp, dept

    WHERE

            emp.deptno = dept.deptno

        AND dept.loc = 'NEW YORK'

    UNION 

    SELECT

        ename, sal, dname

    FROM

        emp, dept

    WHERE

            emp.deptno = dept.deptno

        AND dept.loc = 'DALLAS'

    UNION 

    SELECT

         ename, sal, dname

    FROM

        emp, dept

    WHERE

            emp.deptno = dept.deptno

        AND dept.loc = 'CHICAGO'

)

SELECT ename, sal, dname FROM loc_cte

ORDER BY sal, ename;

 

 

For this one the explain plan looks like below.  Notice there is not a materialized object for the CTE, the main query and the CTE have been merged together.  There are remnants of the CTE in here, but key is that the CTE is not being materialized. (The focus here is not what the optimizer did when it transformed the query. Maybe a topic for another day.)

 

 

--------------------------------------------------------------------------------------------

| Id  | Operation                          | Name               | Rows  | Bytes | Time     |

--------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                   |                    |    12 |   348 | 00:00:01 |

|   1 |  SORT ORDER BY                     |                    |    12 |   348 | 00:00:01 |

|   2 |   VIEW                             |                    |    12 |   348 | 00:00:01 |

|   3 |    SORT UNIQUE                     |                    |    12 |   414 | 00:00:01 |

|   4 |     UNION-ALL                      |                    |       |       |          |

|   5 |      NESTED LOOPS                  |                    |     3 |    99 | 00:00:01 |

|   6 |       NESTED LOOPS                 |                    |     5 |    99 | 00:00:01 |

|*  7 |        TABLE ACCESS STORAGE FULL   | DEPT               |     1 |    20 | 00:00:01 |

|*  8 |        INDEX RANGE SCAN            | EMP_DEPT_IDX       |     5 |       | 00:00:01 |

|   9 |       TABLE ACCESS BY INDEX ROWID  | EMP                |     3 |    39 | 00:00:01 |

|  10 |      NESTED LOOPS                  |                    |     9 |   315 | 00:00:01 |

|  11 |       NESTED LOOPS                 |                    |    10 |   315 | 00:00:01 |

|  12 |        VIEW                        | VW_JF_SET$5E11AB46 |     2 |    44 | 00:00:01 |

|  13 |         SORT UNIQUE                |                    |     2 |    40 | 00:00:01 |

|  14 |          UNION-ALL                 |                    |       |       |          |

|* 15 |           TABLE ACCESS STORAGE FULL| DEPT               |     1 |    20 | 00:00:01 |

|* 16 |           TABLE ACCESS STORAGE FULL| DEPT               |     1 |    20 | 00:00:01 |

|* 17 |        INDEX RANGE SCAN            | EMP_DEPT_IDX       |     5 |       | 00:00:01 |

|  18 |       TABLE ACCESS BY INDEX ROWID  | EMP                |     4 |    52 | 00:00:01 |

--------------------------------------------------------------------------------------------

 

 

 

Now the change to the plan is to wrap the CTE in a select.  I’d recommend you don’t use SELECT * for the wrapping select.  I’m not a fan of this syntax as it can cause problems.  

 

 

WITH loc_cte AS (

    select /*+ MATERIALIZE */ ename, sal, dname from (

    SELECT 

        ename, sal, dname

    FROM

        emp, dept

    WHERE

            emp.deptno = dept.deptno

        AND dept.loc = 'NEW YORK'

    UNION 

    SELECT

        ename, sal, dname

    FROM

        emp, dept

    WHERE

            emp.deptno = dept.deptno

        AND dept.loc = 'DALLAS'

    UNION 

    SELECT

         ename, sal, dname

    FROM

        emp, dept

    WHERE

            emp.deptno = dept.deptno

        AND dept.loc = 'CHICAGO'

))

SELECT ename, sal, dname FROM loc_cte

ORDER BY sal, ename;

 

 

And now the plan shows the CTE staying as a separate entity (it’s been materialized, Beam me up Scotty!). 

 

 

-----------------------------------------------------------------------------------------------------------

| Id  | Operation                                | Name                        | Rows  | Bytes | Time     |

-----------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                         |                             |    12 |   348 | 00:00:01 |

|   1 |  TEMP TABLE TRANSFORMATION               |                             |       |       |          |

|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_3FD9E6966_7A9C9340 |       |       |          |

|   3 |    VIEW                                  |                             |    12 |   348 | 00:00:01 |

|   4 |     SORT UNIQUE                          |                             |    12 |   828 | 00:00:01 |

|   5 |      UNION-ALL                           |                             |       |       |          |

|   6 |       NESTED LOOPS                       |                             |     3 |    99 | 00:00:01 |

|   7 |        NESTED LOOPS                      |                             |     5 |    99 | 00:00:01 |

|*  8 |         TABLE ACCESS STORAGE FULL        | DEPT                        |     1 |    20 | 00:00:01 |

|*  9 |         INDEX RANGE SCAN                 | EMP_DEPT_IDX                |     5 |       | 00:00:01 |

|  10 |        TABLE ACCESS BY INDEX ROWID       | EMP                         |     3 |    39 | 00:00:01 |

|  11 |       NESTED LOOPS                       |                             |     9 |   315 | 00:00:01 |

|  12 |        NESTED LOOPS                      |                             |    10 |   315 | 00:00:01 |

|  13 |         VIEW                             | VW_JF_SET$B1ADC4B5          |     2 |    44 | 00:00:01 |

|  14 |          SORT UNIQUE                     |                             |     2 |    80 | 00:00:01 |

|  15 |           UNION-ALL                      |                             |       |       |          |

|* 16 |            TABLE ACCESS STORAGE FULL     | DEPT                        |     1 |    20 | 00:00:01 |

|* 17 |            TABLE ACCESS STORAGE FULL     | DEPT                        |     1 |    20 | 00:00:01 |

|* 18 |         INDEX RANGE SCAN                 | EMP_DEPT_IDX                |     5 |       | 00:00:01 |

|  19 |        TABLE ACCESS BY INDEX ROWID       | EMP                         |     4 |    52 | 00:00:01 |

|  20 |   SORT ORDER BY                          |                             |    12 |   348 | 00:00:01 |

|  21 |    VIEW                                  |                             |    12 |   348 | 00:00:01 |

|  22 |     TABLE ACCESS STORAGE FULL            | SYS_TEMP_3FD9E6966_7A9C9340 |    12 |   348 | 00:00:01 |

-----------------------------------------------------------------------------------------------------------

 

 

And there you have it!  Here is the full script if you’d like to run it on your own.  Enjoy! 

 

 

explain plan for

WITH loc_cte AS (

    SELECT /*+ MATERIALIZE */

        ename, sal, dname

    FROM

        emp, dept

    WHERE

            emp.deptno = dept.deptno

        AND dept.loc = 'NEW YORK'

    UNION 

    SELECT

        ename, sal, dname

    FROM

        emp, dept

    WHERE

            emp.deptno = dept.deptno

        AND dept.loc = 'DALLAS'

    UNION 

    SELECT

         ename, sal, dname

    FROM

        emp, dept

    WHERE

            emp.deptno = dept.deptno

        AND dept.loc = 'CHICAGO'

)

SELECT ename, sal, dname FROM loc_cte

ORDER BY sal, ename;

 

set lines 4000

set pages 5000

select * from table(DBMS_XPLAN.display(format=>'advanced -cost -projection'));

 

 

explain plan for

WITH loc_cte AS (

    select /*+ MATERIALIZE */ ename, sal, dname from (

    SELECT 

        ename, sal, dname

    FROM

        emp, dept

    WHERE

            emp.deptno = dept.deptno

        AND dept.loc = 'NEW YORK'

    UNION 

    SELECT

        ename, sal, dname

    FROM

        emp, dept

    WHERE

            emp.deptno = dept.deptno

        AND dept.loc = 'DALLAS'

    UNION 

    SELECT

         ename, sal, dname

    FROM

        emp, dept

    WHERE

            emp.deptno = dept.deptno

        AND dept.loc = 'CHICAGO'

))

SELECT ename, sal, dname FROM loc_cte

ORDER BY sal, ename;

 

set lines 4000

set pages 5000

select * from table(DBMS_XPLAN.display(format=>'advanced -cost -projection'));

Thursday, May 18, 2023

Using the MATERIALIZE hint correctly



 I see queries like this from time to time:

 

SELECT /*+ materialize */ * FROM (SELECT bla-bla-bla…);

 

I’m pretty sure that the  MATERIALIZE hint is doing nothing when used in this way.  The MATERIALIZE hint is for CTEs (common table expressions) crated using the WITH clause.  I am unable to see what is does for any other use.  If someone can send me a repeatable test case that I can run showing it does something in other cases I’m open to being proven wrong. 

 

This is the type of statement where the hint works as expected.  This creates a structure that is similar to a Global Temp Table, this structure is dropped after the statement completes. 

 

WITH emp_dept_cte AS (

    SELECT /*+ materialize */ empno, sal, dname

    FROM

        emp,dept

    WHERE

        emp.deptno = dept.deptno

)

SELECT * FROM emp_dept_cte; 

 

But it doesn’t work when I write something like this:

 

Select /*+ materialize */  * from (SELECT empno, sal, dname

    FROM

        emp, dept

    WHERE

        emp.deptno = dept.deptno )

 

In this query no matter where I put the MATERIALIZE hint, no temp table like object is created and used.  


In the plan for the first query above I can clearly see the temp table being created in the auto trace with the step LOAD AS SELECT and an object name that starts with SYS_TEMP_ followed by an apparent random combination of numbers and letters. (I’m pretty sure it’s a hexadecimal number for the internal object.)



 















Using the MATERLIZE hint when not using the WITH clause, I do not see this step in the plan.  Hence no temp table like structure.

 

 

Have a great day! 

 

 

Sunday, April 30, 2023

Optimizing your SQL starts before you write any code


 Writing good SQL is more about how you think than what you write.  The problem that most of us have is that we think in terms of individual rows when we really need to think more in terms of sets of rows.   

 

 

Of course, this is natural for us to do since the end product is the individual rows, and we don’t necessarily think of them as a set.  But they are.  They are a set that satisfies the conditions that the business defines for the data they want. 

 

 

 We need to think this way always when writing and optimizing our SQL.  Too often we write code that is very much “row-by-row” (or as Tome Kyte would say “row-by-slow”).  The classic example is the correlated subquery in the SELECT list.  This will return a row for each row defined by the calling (outer) query.  The optimizer has a neat trick called subquery caching where it will cache values for “repeating” values coming into it.  That is great and the subquery can run less frequently.  

 

 

However, this isn’t always possible. If the values being passed in are relatively unique the saving by this can be minimal, and if totally unique it doesn’t help at all.  Also, it only can cache so many values (version dependent) and there is a problem of HASH collisions as it uses a hash algorithm to store the values in a hash table.   Bottom line, don’t count on this to save your performance. 

 

 

Better to not have correlated subqueries in the select list.  Of course, ones that run for a very few distinct values can be fine.  What I talking about is the ones that run thousands or millions of times.  Just making these run faster isn’t really solving the problem either, you’re just kicking the can down the road.  As the data volume increases (and it always does) this will eventually not be fast enough. 

 

 

I’m a huge fan of CTEs (Common Table Expressions).  I’ve written several posts about these.  In my experience CTEs can solve something like 80% of performance problems.   And for the correlated subquery they are the best solution. 

 

 

This is also thinking  in terms of sets of rows (values), not one at a time.  For that correlated subquery, how can I get all the possible values I would need for the subquery all at once and not one at a time?  The CTE is the perfect solution.   This isn’t the only way they can be used to help performance, but it is one of the most powerful. 

Sunday, March 12, 2023

Looking back after 21,915 days.


This March I hit the milestone of turning  60.  It’s been quite a ride.  And it seems appropriate to pass on some thoughts on life to others.

 

 

I tell my kids 3 things that they should do in life.  These are not hard or difficult to do, it’s the doing them that is the hard part. 

 

 

Save more Money.  Seems simple enough and we’ve all heard variations on this likely throughout our lives.  The issue is you don’t really understand how important this is until you are in your older years.  Like I am now.   Retirement is all about being able to afford it, it has very little to do with how old you are.   The money you have saved, be it 401Ks, IRAs, savings accounts and so on, is key.  It doesn’t take a financial genius to know the more money you have, the more options you will have when the time comes. 

 

 

Don’t change Jobs/Companies too much.  As I look back, I did stay in my career path all along, but I believe I may have switched companies more than I should have.  This is a tough one of course, there can be very good reasons to move. This also reflects  on my first bit, saving money.  Staying with a company that provides retirement benefits for longer can mean more money saved for the retirement time.  I moved to some small companies over time and the immediate pay increase was good, however in the long run I ended up behind the curve for this thing called retirement.  Of course, had I put away more of the increase of money, then I would be in a better place.  Also, things are quite different now.  When I first started working, there were these things called pension plans.  Those are pretty much like the dinosaurs, now it’s all about managing a 401K or like which can be portable. 

 

 

Keep in contact with folks.  The folks I kept in contact with did help me in many ways over time, and I hope I have helped them in return.  I do wonder about some of the folks that I didn’t keep in contact with, if things might be better for both them and me.  We are herd animals after all, so getting thru this thing called life, we need others to help us along.  Having folks to chat with about life and work events is a great thing.  You never know when a call to a friend can help lift them (and you) up out of a bad moment in life. 

 

 

One thing I will add to my advice for my kids is to keep yourself physically fit.  This is more about stamina and not so much about being ripped.  Like being able to walk a good distance at a good pace and not being exhausted.  For me at 60 I shoot for something like 15-18 minutes for a 1 mile walk.  A case in point, I have recently been helping my 82-year-old father thru a surgery and I can see that his lack of physical fitness (stamina) has made his recovery much more difficult.  

 

 

Another seemingly small thing, have a hobby you can do when your older and can’t move about as well as you did when you were younger and more agile.  It’s great to have a physical hobby like jogging or playing a sport.  But what happens if you are bed ridden for a while; recovering from an injury or a surgery? Like my dad is right now.  Recovery time is longer the older you get; shocking I know.  Or you just can no longer do that physical activity anymore?  Something like reading, building a puzzle, sudoku, or crossword might be all you can do.  Playing games is also great.  There are thousands of great games out there from large strategy games to small adventure games.  These can be card based, hex and counter, dice, miniatures and all kinds of different styles.  Many can be played solo and with other folks.  There are a lot of themes for these games as well, war games (historical and fictional), racing, sports, firefighting, fantasy, science fiction, trains, super heroes, and so on.  Yes, I’m a huge gamer.  I have more games than a rational person should have.  But I’m hopeful that when I get to retirement, I’ll have the time to play these games.   The picture in this post is most of my games, but not all.  

 

 

I hope those of you who read this thru will take these words of advice and fit them into your life in the way that works for you.  

Wednesday, November 2, 2022

Automatic Plan Management – not the cure all.


There has been a decent amount of buzz out there about Oracle’s Automatic SQL Plan Management.  It’s not exactly new, this feature has its roots in 11.  Starting in 19 for Autonomous databases it is on by default.   Which isn’t necessarily a bad thing.

 

My concern is that folks may make a bad assumption about how this works.  

 

The actually assumption that this feature works on is that the code is well written to begin with.  This isn’t a mechanism  that will be able to rewrite code to make it better, it will just make sure that the plan used for the code doesn’t get worse.  Those are two very different things.  

 

A good example would be code that doesn’t scale well.  The classic example being correlated subqueries in the SELECT LIST (and the WHERE clause can be worse, just not as common).   Any correlated sub query has the potential to run for each row returned by the main query.  If the main query is only returning a relatively small set of rows, these types of queries can run well.  (Yea this is a topic I talk about a lot, because I see it a lot in queries that run poorly.) 

 

However, as the data grows these queries will by nature run slower.

 

Automatic Plan Management can only do so much.  It can try to get the best plan for the code as written.

 

Also, this sort of automation tends to be reactionary, things have to run badly before it knows it’s bad.   An analogy would be driving to a reasonably far away destination.  If you and I were about to get on a highway and see that it is backed up, we could avoid getting on the highway and driving a slightly different route that still keeps on relativity on time to the destination.  The optimizer would just get on the highway (following the plan) and get to the destination late and then realize that was a bad idea.  

 

The bad assumption I referred to earlier that folks might make, is that this feature will make up for poor code.  It can’t.  Bad code will still have a less then optimal plans and this feature will do it’s best to try to use the best plan for it, but it will never be a good as a well written statement.   I use the joke that a car designed to run on square wheels will never run well, no matter how good the suspension and how powerful the engine is, it will be ruff ride and be highly inefficient.  

 

With a well written (and scalable) statement to start with, yes, this feature is likely to help.  But with poor statements to begin with, this may just add a layer of complexity to figuring out why the performance is so bad. 


But with enough beer, you might not notice there is a problem eh?  

Wednesday, August 17, 2022

REGEXP_LIKE and LIKE


I don’t use REGEXP much but ran across this interesting issue just recently.  

Apparently REGEXP_LIKE doesn’t like parentheses in search strings, the LIKE operator doesn’t have an issue with parentheses.  

The full code will be at the bottom of this if you’d like to try it yourself.  I did run this on a super-cool Autonomous Database, version 21.3 so it’s not an old version thing.

What I do in the test is compare a column form one table to another, this is simulating the problem that I ran into recently.  In the string are some parentheses.  And you’ll notice that when these values are in the first string in the REGEXP_LIKE operation, it fails.

The Right-side data will be this (Table called regextest01, yea I was not very creative with my table name for this):

'ABCD(XXX)101',  'ABCD(xxx',  'ABCD)xxxx'

Then the Left-side will be these values one at a time (Table called regextest02, yea I was not very creative with my table names for this example):

'ABCD',  'ABCD(xxx)',  'ABCD(',  'ABCD)'

The first one as you expect works fine for both the REGEXP_LIKE and the LIKE, returning all three matching rows. 

The second one there is no error for REGEXP_LIKE however it returns no rows. The LIKE returns the one you would expect with the value in the first table of 'ABCD(XXX)101'.   Hum that’s odd, it appears that the parentheses inside the string for REGEXP_LIKE are not treated as part of the string but as some sort of parameter or something.  Very odd to me at least.

And the last two error out with:

Error report -

ORA-12725: unmatched parentheses in regular expression

For the REGEXP_LIKE, but work just fine with the LIKE.

Interesting.   So, for me at least, just one more reason to not use REGEXP_LIKE.  Clearly the parentheses are a problem for REGEXP_LIKE and not an issue for LIKE. 

In the script below each comparison is done one at a time.  Just to make it easier to see how each comparison works.  Full code here so you can run this and see it live and in color, well maybe not much color:


/* test code of REGEXP_LIKE testing for parentheses */

CREATE TABLE regextest01 (

    id      NUMBER,

    string01 VARCHAR2(20)

);


CREATE TABLE regextest02 (

    id       NUMBER,

    string02 VARCHAR2(20)

);


insert into regextest01 values (11, 'ABCD(XXX)101');

insert into regextest01 values (22, 'ABCD(xxx');

insert into regextest01 values (33, 'ABCD)xxxx');

insert into regextest02 values (11, 'ABCD');

prompt Value in regextest02: ABCD

prompt Using REXEXP_LIKE:

select * from regextest01 rt01, regextest02 rt02 

where regexp_like(rt01.string01, rt02.string02, 'i');

prompt Using LIKE:

select * from regextest01 rt01, regextest02 rt02 

where upper(rt01.string01) like '%'||upper(rt02.string02)||'%';

/* Change the comparison string to have an opening and closing parentheses in it*/

delete from regextest02;

insert into regextest02 values (11, 'ABCD(xxx)');

prompt Value in regextest02: ABCD(xxx)

prompt Using REXEXP_LIKE:

select * from regextest01 rt01, regextest02 rt02 

where regexp_like(rt01.string01, rt02.string02, 'i');

prompt Using LIKE:

select * from regextest01 rt01, regextest02 rt02 

where upper(rt01.string01) like '%'||upper(rt02.string02)||'%';

/* Change the comparison string to have an opening parentheses in it*/

delete from regextest02;

insert into regextest02 values (11, 'ABCD(');

prompt Value in regextest02: ABCD(

prompt Using REXEXP_LIKE:

select * from regextest01 rt01, regextest02 rt02 

where regexp_like(rt01.string01, rt02.string02, 'i');

prompt Using LIKE:

select * from regextest01 rt01, regextest02 rt02 

where upper(rt01.string01) like '%'||upper(rt02.string02)||'%';

/* Change the comparison string to have an closing parentheses in it*/

delete from regextest02;

insert into regextest02 values (11, 'ABCD)');

prompt Value in regextest02: ABCD)

prompt Using REXEXP_LIKE:

select * from regextest01 rt01, regextest02 rt02 

where regexp_like(rt01.string01, rt02.string02, 'i');

prompt Using LIKE:

select * from regextest01 rt01, regextest02 rt02 

where upper(rt01.string01) like '%'||upper(rt02.string02)||'%';

drop table regextest01 purge;

drop table regextest02 purge;