Wednesday, January 13, 2021

Getting WITH it in the cloud or not


The WITH clause in Oracle is used to create CTEs, Common Table Expressions also called Sub-Query Factors.  Oracle as a company hasn’t promoted these nearly enough from my point of view, it’s been available since 9.2.  Which is quite a while ago and still many folks are unaware of it.  This little post is to give CTEs a of bit nudge into the light. 

 

So, what is this “CTE” thing?  It’s easiest to think of it as a temporary view that is useable for the duration of the query.  Once the query is finished, they disappear.  Even the syntax is like a view definition:

 

For a view:  CREATE VIEW my_favorite_view AS (select …

 

For a CTE: WITH my_favorite_cte as (select …

 

Generally, I recommend using the MATERLIZE hint for CTEs:

 

WITH my_favorite_cte AS (select /*+ MATERIALIZE */ … 

This hint forces the optimizer to not merge the CTE back into the main query.  Typically, you just did all this work to get it out of there and you don’t want it going back in.  Of course, testing is a great idea, test with and without the hint to be sure you are getting what you want. 

 

Once the CTE is created, use it like you would any table in your query.  You can select from it or join it into a query just as you would any table.  You can even select from an earlier defined CTE in a later one, but you can’t refer to one that hasn’t been defined yet.  No “forward referencing” here. 

 

You create these because they can be very powerful for query optimization. Amazingly powerful.  Here are some reasons to use them. 

 

The most notable cases are when a sub-query within a query is used many times.  Recently I posted about the use with correlated sub-queries in the select list.  Read the post here.  This can be very dramatic in reducing the time and resources used by a query.  

 

Another common case is a subquery that reused many times but  is “slightly” different each time.  For example, a join of a set of tables and the WHERE clause limits it each time on the same set of columns, just different values.  A CTE that gets the super set of the rows used throughout the query can be made, then select from that each time just getting the rows needed for a particular part of the query.  

 

The savings is that the query getting the super set runs only one time, and that set is stored in memory.  Then as it’s joined into each part, only get from it the rows needed.  This is especially good when the set of rows in the super set is much smaller than the total size of all the rows in the joined set of tables.   For example, there are 5 tables in the subquery, total rows of hundreds of millions, but the super set is only tens of thousands.   There is no set number of rows or ratio for this, test to find what works and what doesn’t. 

 

 

Queries using the UNION operator commonly have subqueries that are the same in each part.  Here is another great use of CTEs.  Take the common query and make it a CTE, now just refer to it in each of the UNION sections.

 

You might even use them just to make the query easier to read.  Breaking up the query in to “steps” with the CTEs can make debugging and modification easier later.   This is where maybe the MATERIALIZE hint isn’t needed.  The code might run just fine after the Optimizer smashes it all back together into one query.   

Some personal recommendation on using them.

 

Please give them intelligent names.  Calling them T1, T2 and so on is very painful for debugging.  To further help in reading the query, name them with _CTE as a suffix.  This makes it easier to see what are tables and what are CTEs in large statements.  

 

These next two are meant to keep memory use down.   Each CTE when MATERIALIZED, which more often is what you want, becomes a table like structure in your PGA.  Some guidelines to consider when creating CTEs: 

·      Keep the number of them down.  If you’re creating more than about 6-8 CTEs make sure they really do the trick.  

 

·      Also try to have CTEs that have a “few” number of rows in them.  About a million or less is great.

 

There isn’t really a hard limit of the number of CTEs and the number of rows in one.   Maybe there is a hard limit on the number of CTEs but it you find out what it is, you likely have other problems already.  The point with keeping the number and size down is to conserve memory.  

 

On the other hand, if you have 16 CTEs and the query goes from hours to seconds, then it’s memory well spent.   Same for size, a 500 million row CTE that cuts the run time in a like manner is worth the memory as well.  Over all you might end up uses less memory then when the query runs for hours when exceeding my guidelines purposed here.  Test, test and test again. 

 

It’s time to get WITH it and use CTEs in your queries. 

Tuesday, December 22, 2020

Tracing in the cloud or not

 


Here is a link to a presentation I did about Oracle Trace events 10046 and 10053, on December 21st.  Enjoy!

Click here or copy paste the link below. 

https://www.youtube.com/watch?v=5x8pnMaepxM&feature=youtu.be


And have some pie.  

Tuesday, December 15, 2020

Correlated Sub Queries in the cloud or not


Bottom line up front:  Correlated Sub Queries (or sub-selects) are really bad for performance.  

 

You’re writing a query, and in the result set you need the MAX(ORDER_DATE) based on the customer information in each row being returned.  

 

Ah easy-peeze!  I’ll write a sub-select in the select list to do something like this:

 

custord.customer_number,

(select max(order_date) from customer_orders custord2 

where custord2.customer_id = custord.customer_number) LAST_ORDER_DATE,

 

Great! It works and life is good.  No need for a nasty “group by” thing.  Yea, I’m the man!

 

Or am I?  How often is that sub-select going to run?  Once for every row returned by the main query.  That could be easily thousands or millions depending on what you are doing.  This one is likely pretty harmless, especially if it’s the only one and it can do a unique look up on an index this one you might not even notice it’s a problem, as long as the data set is relatively small. But it wouldn’t scale well.  

 

How about a 4- or 5-way table join for the sub-selects, or more?  I recently had one that was a 12-way table join (Turned out that at least 4 for the tables weren’t needed when we look into it, but still an 8-way join is a lot to do in a sub-select like this.) 

 

Below is part of a SQL monitor plan of a query that had 16 correlated sub-queries in the select list that I recently worked on for a client.  Each one ran rather fast each time, but each one was running 9,794 times and the total run time was over 10 minutes.  Which to bring back only 9,794 rows, that seemed like a lot.  Notice that the number of rows returned and the times each one ran is the same.  You can see this in the EXECITION column compared to the actual rows returned.  Notice that the first one didn’t return a row for each input row.  This was the nature of that sub-query, there was a pretty good chance it wouldn’t return a row each time.  

 

This screen shot only shows a few of them.  Most were a 3 or 4-way table join.  I’ve hidden the full plan for each sub-query, the plan it used was fine.  It was the excessive runs that was the issue. 

 

 A semi-side point, when looking at this output below notice the estimated rows is 1 (one) for each of the sub-queries.  That estimate is correct.  For each run it returned at most 1 row.  The actual rows returned is higher because that is the total for all runs.  The estimate is a “per run” estimate, and is correct.  This is an excellent example of why actual run time stats is necessary for performance optimization.  You might look at the explain plan and think that the one row is just fine.  In reality that is very misleading.  





So, what to do?  I rewrote the query to have a CTE (Common Table Expression, using the WITH clause, also called a Sub-Query Factor) for each sub-query.  I was hoping to be able to combine some to reduce the number of them but that wasn’t possible from what I could see of the queries and the data.  Maybe some could be but it wasn’t obvious from the code.

 

Now I joined each CTE into the main query just like any other table.  Most of the sub-queries were joined on 2 columns and I did join them all in as LEFT OUTER JOINS because the first one I knew for sure wouldn’t return a row for each row in the main query to be safe I did that with all of them.  I was running out of time on this gig so I didn’t get to test with INNER JOINs, but I did let them know that they would likely be able to convert most of the joins to INNER.  

 

This query was running in 33 seconds after this change.  From over 10 minutes to 33 seconds, yea I call that a win.  And with this change the time to run is going to very slowly go up.  For the originally it will go up sharply as the number of rows returned by the main query go up.  In short, the new code is scalable, the old code was not. 

 

Key point I used the MATERILIZE hint on all the CTEs, each CTE was only referenced once in the query so I didn’t want it to be merged/unnested into the main query.  I find that the MATERILZE hint should be used nearly all the time when creating CTEs.  There have been times that I didn’t need it because the optimizer materialized it anyway (typically when you use it more then once).  And a few times when merging in to the main was a good idea.  Test both way when you are working with CTEs. 


What has this got to do with a fire in my fireplace?  Maybe it's you're "burning CPU LIOs and time" when you do these correlated sub-queries.  Or maybe it's just that I like the picture.  

 

Monday, November 2, 2020

Defaults in the cloud or not


 Use the defaults for nearly all the parameters.  

 

I was working on a gig recently and they had the rather interesting situation where queries were running faster in production then in the test environment.  This had started happening rather recently so the question was, why?  

 

Upon investigation, the key thing was the stats were different in the two systems.  I ran 10053 (optimizer trace) events on a sample query on both systems and it was clear that the differences in stats made the two plans different.  The join orders were different and use of indexes, etc.  The stats weren’t a lot different between the two system, but (generally) the tables in TEST were slightly bigger than in PROD. 

 

This was because they were testing new stuff in test.  It certainly makes sense to do this of course, but the issue was since the two systems were different, the plans could be (and were) different.  And they were different enough that the test query I was provided ran in 6 seconds on PROD and 15+ minutes on TEST.  That’s a pretty huge difference.  

 

My first stab at changing the query was to rewrite with a couple of CTEs (Common Table Expressions use the WITH clause).   I isolated two subqueries that were hierarchal with the good old START WITH/CONNECT BY PRIOR syntax.  I noted that these were being evaluated in different parts of the plan between the two so I thought maybe isolating these as CTEs would help.   And it did.  The plan still was a little different between the systems but now both were running in about 6 seconds.  Win!  

 

But that wasn’t really the root problem. I kept digging around to see what else could be a cause.  Since telling them to rewrite all their code with the “right” CTEs might be difficult at best.  And this is where something in the 10053 trace came in real handy.  

 

Near the top of the 10053-trace file is a section on the parameter used by the optimizer.  This section alone can be a very insightful on what is going on.   The sections start with this:

 

***************************************

PARAMETERS USED BY THE OPTIMIZER

********************************

 

And has 2 parts one starting with:

  *************************************

  PARAMETERS WITH ALTERED VALUES

  ******************************

 

The other with:

  *************************************

  PARAMETERS WITH DEFAULT VALUES

  ******************************

 

As the names suggest, they list parameters that have either changed values or are using defaults.  The list of parameters in the ALTERED VALUES should be quite short, the fewer the better.   They didn’t have a lot, which was very good.  But they did have two that really should be defaults today on both PROD and TEST:

 

optimizer_index_cost_adj

optimizer_index_caching             

 

These were common to change back in version 10 when the default cost model went from IO to CPU.  In 9 the default cost model was IO and in 10 the default became CPU which is what is used today.  Back then many shops had issues when first upgrading to 10 that indexes were not getting used appropriately.   Changing these parameters to non-defaults helped.  

 

But that time is over and the CPU model has been adjusted so these parameters should be the defaults.  The defaults are 100 for the cost adjust parameter and 0 for caching.  This means cost the plan as is, don’t increases the cost (a setting over 100) and don’t discount the cost (a setting under 100).  And for the caching parameter it means don’t assume any of the index is cached (in the buffer cache). 

 

I tested the sample query with following hints.  NOTE: Using the OPT_PARAM hit is great for testing; it should not be used for production code:

 

/*+ OPT_PARAM('optimizer_index_cost_adj' '100') OPT_PARAM('optimizer_index_caching' '0')   */

 

And running the original code with these hints on PROD and TEST the query ran in about 6 seconds on each.   Better Win!  The reason this is better is that now it appears that without changing/rewriting the code they can get the performance they expect.  More testing will need to be done by the client to verify this, but the testing that was done while I was on the gig looked quite good. 

 

But why did this work?  I believe that the differences in the stats was just enough to get exaggerated by the non-default parameters to make for different plans that caused poor performance.  The plans were still very slightly different in the two systems, but not enough to cause a performance issue.   

 

The key to this story is, use the defaults Luke, (substitute your name in for Luke).  Oracle does all of its internal testing with default settings for the parameters.  Once you venture into the land of using non-defaults you are in the space of untested.  Clearly to overcome a bug or a particular situation a non-default can be needed.  Once that situation or bug has be fixed, get back to the defaults. 

Wednesday, August 26, 2020

Cardinality in the cloud or not

I’ve been teaching and doing SQL optimization for many years now. Given a statement that is 5 lines or 500 lines long, the reason an execution plan is not optimal really comes down to the same thing every time. 

The cardinality calculation is off. 

The key is that it should be in the same magnitude as the actual rows returned. Not the exact same value, that’s cool when it happens but not necessary. The value for cardinality is an estimated number of rows, so it’s expected to not be exact. But when it’s way off, either up or down, things go badly. This is when the optimizer picks to use an index, a full scan, a join method, or a join order that causes the plan to go horribly wrong. 

Recently I was at a company that was moving very large queries from another system to Oracle Exadata. These queries were large in just about every sense of the word. The queries were hundreds of lines long many times and some worked with billions of rows in as well. When the cardinality would go wrong everything after that step in the plan just got worse and worse. It wasn’t uncommon for us to fix that sort of thing and take a query from never finishing to completing in a few minutes or even seconds. 

What kinds of things did we do to fix this? Mostly it had to do with changing the code of the query. It wasn’t common that the stats were wrong. And even if they were all we could do was recommend to the DBA team that the stats appeared to be off and hope they would fix it. 

Mostly it was rewrites of the query. 

 One problem was the over use of OUTER joins. Sure, they are needed, but the folks writing these queries tended to use them all the time. This did a few things; one it more or less forced a join order and it didn’t help the cardinality estimate many times. I fixed many queries but changing OUTER joins to INNER joins. The query got the same result set, of course if it didn’t that isn’t a fix! And could run many magnitudes faster because it would get a better join order and/or get a better cardinality. 

Another problem we banged into was a bug with hybrid histograms. This would kick in mostly with very large sets of data. The histogram would lead the optimizer to think it was getting back a very large set of rows many times and very poor decisions would be made because of that. The customer was still on version 12.1 and I’m pretty sure that bug has been fixed. 

 If you notice that columns with a hybrid histogram is giving very bad cardinality estimates you might want change that. How? Of course, you could drop the histogram (really you can’t, you just recollect stats without creating the histogram on that column), we couldn’t do that. We would typically modify that column in the predicate so it couldn’t use the histogram. Mostly we used NVL or COALESCE on the column. I prefer COALESCE myself because it will work on a column even if it’s a primary key, NVL wouldn’t. 

Sometimes it was expressing the predicate(s) in a different way. The simpler a predicate is the less likely a mistake will happen with the math. This typically requires really understanding the question being asked by the query. If you didn’t write the code, then you’ll likely need to chat with whoever did to find out what is going on. 

 Other times we had to resort to using the CARDINALITY hint to get the optimizer to do the right thing. I talked about this in my post "Hints in the cloud or not" this past March. You might want to check that post out too. It’s a quick read. 

The key with optimizing a query is getting that CARINALITY estimate to be in the right magnitude. Walk the plan “up and out” to find were it goes wrong. Stop there and fix that. Which table and predicates are driving the poor estimate? Really did into what that bit of code is doing, can it be rewritten another way? Sometimes one tiny change to a huge query and all is good. Other times it’s an iterative process, make a change, run it, make another and so on. 

Yes, it’s a lot harder to do this then to add more memory or CPU or other resource. But if you don’t fix it and just “add hardware” to make it go away, all you really did was kick the can down the road and it’s going to come back.

Monday, July 27, 2020

The Rise of SQL Optimization in the Autonomous Age in the cloud or not

In the world of IT, the age of Autonomous computing in upon us.  I’m an Oracle guy by trade and Oracle is very much all about Autonomous.   Machine Learning is the latest buzz in the profession and that has a lot of folks thinking that tuning and optimization will be completely in the hands of the autonomous database.

 

I tend to disagree.  I believe the need for folk who can write optimal code will be just as needed as we embark on this autonomous path as there ever was, maybe more.  

 

The reason for this is rather simple.  Someone writing code has a purpose for the code, a result set they are interested in seeing.  The better they know the data they are working with and the better then know what they are looking for, the better they can write something to get the results.  This is why many times in my optimization of code I need to talk to the author about what is it they are trying to get in order to find a more optimal way to get the results they want.

 

Of course, I and the autonomous database can do things like throw indexes at it or use hints/tricks to get the code to do this or that.  But maybe the code is just not the right code to begin with.  An outer join that doesn’t need to be, a subquery that really isn’t returning anything useful for the final result set, and maybe it’s just returning data that isn’t even used. 

 

When a query is only tens of lines long, some of these things can be seen with enough looking and testing sometimes.  But when the query is hundreds of lines long and with an execution plan with 1000s lines, it’s difficult for someone who doesn’t know the “why” of the SQL to be able to really optimize it. 

 

This is where the autonomous database will have some trouble optimizing code, if it can do anything at all. Sure, it can solve the performance issue by throwing more horsepower at it (CPUs, Memory, Parallel and the like).   But when the root cause of the performance issue is that the code it not doing the right thing to begin with, the autonomous database will not know that. 

 

The optimizer in the autonomous database is going to take a SQL statement as written, it will run it.  Yes, the optimizer has some limited capacity to do rewrites on the SQL, which can be rather good many times.   But will it know that the user actually doesn’t even need the data from these 6 tables in this complex subquery that is used four times slightly differently each time?  No, and that is where the human SQL optimization specialist will still be needed.  She/he can talk to the user(s) of the query and find out what is the real intent and what do they really want. 

 

How does code end up doing things it shouldn’t do?  Shouldn’t the code know what they are asking and write code to just get that?  Why would someone write code that does extra work for no good reason?  

 

No one writes code like that on purpose I believe.  What happens more often than not is they don’t write the code from scratch.  They take code that sort-of does what they want and then typically add things to it to get it to do what they want.  This will tend to bloat the code in such a way that it ends up doing much more than it should and gets so complex that it’s hard for anyone to read and fully understand, including the optimizer. 

 

There is a saying about tuning that goes like this “Tune the question not the query”.   And that is still going to be a human activity for the foreseeable future.   The optimizer doesn’t know the real question being asked, it knows the representation of it as presented by the SQL. 

Monday, July 20, 2020

Baking a Cake in the Cloud or not

If you want an efficient system, don’t bake a cake at home, let the store bake it for you.

 

I’m a SQL performance guy for the most part.  When I look at SQL a key thing is to understand the nature of the SQL and how it fits in to the application.  This is sort of philosophical, I’m trying to understand the why of the SQL.  And one attribute I look for is how is it being used to process the result set of rows.  

 

And a killer is the ever-popular row-by-row operation.  This is hardly efficient no matter how you look at it.  This is a philosophical issue at the core, how is the writer of the code is seeing the solution from a grand perspective.  

 

A good analogy is baking a cake, and getting the ingredients in particular.  The situation is this, you need to bake a cake and have none of the ingredients in your kitchen to bake this cake.  You have a list of the ingredients; flour, sugar, milk, eggs and so on.  How do you go about buying them?

 

Would you get in your car, drive to the store get the flour, pay for it, get back in your car, drive home put it on the counter, and check off flour from your list.  Then get back I the car, and do this cycle over again for sugar and each ingredient.  I sure hope not.  I suspect you would take the list and go to the store once, in the store get all ingredients in your cart (tram) and pay for the whole lot once and now go home and bake the cake.  

 

But even this isn’t likely best, why even bake the cake at home at all?  How about have the store do all the for you. Many grocery stores (at least in the USA) can bake cakes right in the store, and they are pretty good at it since they have staff and equipment to do it.  (And there are septicity shops where you can get even more elaborate baking creations, which you could do yourself, maybe on the fourth or fifth try.)

 

If you all you want is simple cake, you can just pick it up, pay for it and go.  Akin to a simple query in the database.  If you want to have a custom cake you call ahead and they have it ready for you when you arrive.  More like a complex query with lots of tables and maybe some fancy mathematical operations and such.  

 

Certainly, a lot of folks have gotten past the idea of getting “one row at a time” for their SQL.  But the second part of letting the database do most of the work is where many folks have trouble.  This is mainly due to their philosophy and training on coding, they are very good Java, Python, C++ or like programmers.  To them the database is just data, and they only want to get it from there into their application code.  There are queries bringing back millions of rows and then the application does more processing on them.  This is baking the cake at home, fun as baking a cake is, from a performance point of view it’s not ideal. 

 

Using a stored PL/SQL packaged procedure to get the work done is best.  Push the business logic into the database inside a PL/SQL block.  The database is really good at this and can likely process those millions of rows faster than your application layer can.  This is buying the already baked and decorated cake from the store.  

 

As I said at the top, if you want an efficient system, don’t bake a cake at home, let the store bake it for you.