Here is a link to a presentation I did about Oracle Trace events 10046 and 10053, on December 21st. Enjoy!
Click here or copy paste the link below.
https://www.youtube.com/watch?v=5x8pnMaepxM&feature=youtu.be
And have some pie.
Bottom line up front: Correlated Sub Queries (or sub-selects) are really bad for performance.
You’re writing a query, and in the result set you need the MAX(ORDER_DATE) based on the customer information in each row being returned.
Ah, easy-peasy! I’ll write a sub-select in the select list to do something like this:
custord.customer_number,
(select max(order_date) from customer_orders custord2
where custord2.customer_id = custord.customer_number) LAST_ORDER_DATE,
…
Great! It works and life is good. No need for a nasty “group by” thing. Yea, I’m the man!
Or am I? How often is that sub-select going to run? Once for every row returned by the main query. That could easily be thousands or millions of times, depending on what you are doing. This one is likely pretty harmless; if it’s the only one and it can do a unique lookup on an index, you might not even notice it’s a problem, as long as the data set is relatively small. But it won’t scale well.
How about a 4- or 5-way table join for the sub-selects, or more? I recently had one that was a 12-way table join. (It turned out that at least 4 of the tables weren’t needed when we looked into it, but still, an 8-way join is a lot to do in a sub-select like this.)
Below is part of a SQL Monitor plan of a query I recently worked on for a client that had 16 correlated sub-queries in the select list. Each one ran rather fast each time, but each one was running 9,794 times, and the total run time was over 10 minutes. To bring back only 9,794 rows, that seemed like a lot. Notice that the number of rows returned and the number of times each sub-query ran are the same; you can see this by comparing the EXECUTIONS column to the actual rows returned. Also notice that the first one didn’t return a row for each input row. That was the nature of that sub-query; there was a pretty good chance it wouldn’t return a row each time.
This screen shot only shows a few of them. Most were a 3- or 4-way table join. I’ve hidden the full plan for each sub-query; the plans they used were fine. It was the excessive runs that were the issue.
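As an aside, if you want to pull a report like this for one of your own queries, a text version can be generated with DBMS_SQLTUNE. A minimal sketch (the SQL_ID below is a made-up placeholder; substitute the one you are investigating, for example from V$SQL_MONITOR, and note that SQL Monitor requires the Tuning Pack):

-- Hedged sketch: generate a SQL Monitor report for a given SQL_ID.
-- 'abcd1234efgh5' is a placeholder, not a real SQL_ID.
select dbms_sqltune.report_sql_monitor(
         sql_id => 'abcd1234efgh5',
         type   => 'TEXT') as report
from   dual;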
A semi-side point: when looking at the output below, notice that the estimated rows is 1 (one) for each of the sub-queries. That estimate is correct; for each run it returned at most 1 row. The actual rows returned is higher because that is the total for all runs. The estimate is a “per run” estimate, and it is correct. This is an excellent example of why actual run time stats are necessary for performance optimization. You might look at the explain plan and think that one row is just fine; in reality that is very misleading.
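If SQL Monitor isn’t available, the same “estimated vs actual” comparison can be made with DBMS_XPLAN. A minimal sketch, assuming you can add a hint to the query under test and run it once:

-- Hedged sketch: add the GATHER_PLAN_STATISTICS hint to the query under test,
-- e.g. select /*+ GATHER_PLAN_STATISTICS */ ... , run it, and then in the same
-- session display the cursor with the runtime row counts:
select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));

In that output, E-Rows is the per-run estimate and A-Rows is the actual total, the same distinction described above.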
So, what to do? I rewrote the query to have a CTE (Common Table Expression, using the WITH clause, also called a Sub-Query Factor) for each sub-query. I was hoping to be able to combine some of them to reduce their number, but that wasn’t possible from what I could see of the queries and the data. Maybe some could have been combined, but it wasn’t obvious from the code.
I then joined each CTE into the main query just like any other table. Most of the sub-queries were joined on 2 columns. I joined them all in as LEFT OUTER JOINs: I knew for sure the first one wouldn’t return a row for each row in the main query, and to be safe I did that with all of them. I was running out of time on this gig, so I didn’t get to test with INNER JOINs, but I did let them know that they would likely be able to convert most of the joins to INNER.
This query was running in 33 seconds after the change. From over 10 minutes to 33 seconds; yea, I call that a win. And with this change, the run time will go up only very slowly as the data grows. For the original, it goes up sharply as the number of rows returned by the main query goes up. In short, the new code is scalable; the old code was not.
Key point: I used the MATERIALIZE hint on all the CTEs. Each CTE was only referenced once in the query, so I didn’t want it to be merged/unnested into the main query. I find that the MATERIALIZE hint should be used nearly all the time when creating CTEs. There have been times that I didn’t need it because the optimizer materialized the CTE anyway (typically when you use it more than once), and a few times when merging it into the main query was a good idea. Test both ways when you are working with CTEs.
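To make the shape of the rewrite concrete, here is a minimal sketch built on the LAST_ORDER_DATE example from above (the main table name "customers" is made up for illustration; the client’s actual query was of course much larger):

with last_ord as (
  -- MATERIALIZE so the CTE is built once, not merged back into the main query
  select /*+ MATERIALIZE */
         custord2.customer_id,
         max(custord2.order_date) as last_order_date
  from   customer_orders custord2
  group  by custord2.customer_id
)
select custord.customer_number,
       last_ord.last_order_date
from   customers custord
left   outer join last_ord
       on last_ord.customer_id = custord.customer_number;

Each of the other sub-selects becomes its own CTE and is joined in the same way.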
What has this got to do with a fire in my fireplace? Maybe it's that you're "burning CPU, LIOs and time" when you do these correlated sub-queries. Or maybe it's just that I like the picture.
I was working on a gig recently, and they had the rather interesting situation where queries were running faster in production than in the test environment. This had started happening fairly recently, so the question was, why?
Upon investigation, the key thing was that the stats were different in the two systems. I ran 10053 (optimizer trace) events on a sample query on both systems, and it was clear that the differences in stats made the two plans different: the join orders were different, as was the use of indexes, etc. The stats weren’t a lot different between the two systems, but (generally) the tables in TEST were slightly bigger than in PROD.
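For reference, this is roughly how I capture a 10053 trace for a single statement. A minimal sketch (the event only fires on a hard parse, so if the statement is already cached you may need to change its text slightly or flush it first):

alter session set tracefile_identifier = 'my_10053_test';  -- makes the trace file easier to find
alter session set events '10053 trace name context forever, level 1';
-- hard parse and run the query under investigation here
alter session set events '10053 trace name context off';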
This was because they were testing new stuff in TEST. It certainly makes sense to do this, of course, but since the two systems were different, the plans could be (and were) different. And they were different enough that the test query I was provided ran in 6 seconds on PROD and 15+ minutes on TEST. That’s a pretty huge difference.
My first stab at changing the query was to rewrite it with a couple of CTEs (Common Table Expressions, using the WITH clause). I isolated two subqueries that were hierarchical, using the good old START WITH/CONNECT BY PRIOR syntax. I noted that these were being evaluated in different parts of the plan on the two systems, so I thought maybe isolating them as CTEs would help. And it did. The plans were still a little different between the systems, but now both were running in about 6 seconds. Win!
But that wasn’t really the root problem. I kept digging around to see what else could be a cause, since telling them to rewrite all their code with the “right” CTEs might be difficult at best. And this is where something in the 10053 trace came in real handy.
Near the top of the 10053 trace file is a section on the parameters used by the optimizer. This section alone can be very insightful about what is going on. The section starts with this:
***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
And it has two parts, one starting with:
*************************************
PARAMETERS WITH ALTERED VALUES
******************************
The other with:
*************************************
PARAMETERS WITH DEFAULT VALUES
******************************
As the names suggest, they list parameters that either have altered values or are using their defaults. The list of parameters with ALTERED VALUES should be quite short; the fewer the better. They didn’t have a lot, which was very good. But they did have two that really should be at their defaults today, on both PROD and TEST:
optimizer_index_cost_adj
optimizer_index_caching
It was common to change these back in version 10, when the default cost model went from IO to CPU. In 9 the default cost model was IO; in 10 the default became CPU, which is what is used today. Back then many shops had issues when first upgrading to 10 with indexes not getting used appropriately, and changing these parameters to non-default values helped.
But that time is over and the CPU model has been adjusted, so these parameters should be at their defaults. The defaults are 100 for the cost adjust parameter and 0 for the caching parameter. For the cost adjust parameter this means cost the plan as is: don’t increase the cost (a setting over 100) and don’t discount it (a setting under 100). And for the caching parameter it means don’t assume any of the index is cached (in the buffer cache).
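A quick way to see which optimizer parameters are non-default on a system is the V$SYS_OPTIMIZER_ENV view, which I believe lists much the same information the 10053 trace prints. A minimal sketch:

-- Hedged sketch: list optimizer environment parameters that are not at their defaults.
select name, value
from   v$sys_optimizer_env
where  isdefault = 'NO'
order  by name;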
I tested the sample query with the following hints. NOTE: Using the OPT_PARAM hint is great for testing; it should not be used for production code:
/*+ OPT_PARAM('optimizer_index_cost_adj' '100') OPT_PARAM('optimizer_index_caching' '0') */
Running the original code with these hints on PROD and TEST, the query ran in about 6 seconds on each. Better win! The reason this is better is that it now appears they can get the performance they expect without changing/rewriting the code. More testing will need to be done by the client to verify this, but the testing done while I was on the gig looked quite good.
But why did this work? I believe the differences in the stats were just enough to get exaggerated by the non-default parameters, producing different plans and the poor performance. The plans were still very slightly different on the two systems, but not enough to cause a performance issue.
The key to this story is: use the defaults, Luke (substitute your name for Luke). Oracle does all of its internal testing with default settings for the parameters. Once you venture into the land of non-defaults, you are in untested space. Clearly a non-default can be needed to overcome a bug or a particular situation; once that situation or bug has been fixed, get back to the defaults.
I tend to disagree. I believe folks who can write optimal code will be just as needed as we embark on this autonomous path as they ever were, maybe more so.
The reason for this is rather simple. Someone writing code has a purpose for it, a result set they are interested in seeing. The better they know the data they are working with, and the better they know what they are looking for, the better they can write something to get those results. This is why, many times, when optimizing code I need to talk to the author about what it is they are trying to get, in order to find a more optimal way to get the results they want.
Of course, I and the autonomous database can do things like throw indexes at it or use hints/tricks to get the code to do this or that. But maybe the code is just not the right code to begin with: an outer join that doesn’t need to be there, a subquery that really isn’t returning anything useful for the final result set, or maybe it’s returning data that isn’t even used.
When a query is only tens of lines long, some of these things can sometimes be spotted with enough looking and testing. But when the query is hundreds of lines long, with an execution plan thousands of lines long, it’s difficult for someone who doesn’t know the “why” of the SQL to really optimize it.
This is where the autonomous database will have trouble optimizing code, if it can do anything at all. Sure, it can solve a performance issue by throwing more horsepower at it (CPUs, memory, parallel and the like). But when the root cause of the performance issue is that the code is not doing the right thing to begin with, the autonomous database will not know that.
The optimizer in the autonomous database is going to take a SQL statement as written and run it. Yes, the optimizer has some limited capacity to rewrite the SQL, which can be rather good many times. But will it know that the user doesn’t actually need the data from those 6 tables in this complex subquery that is used four times, slightly differently each time? No, and that is where the human SQL optimization specialist will still be needed. She/he can talk to the user(s) of the query and find out the real intent and what they really want.
How does code end up doing things it shouldn’t do? Shouldn’t the coder know what they are asking for and write code to get just that? Why would someone write code that does extra work for no good reason?
No one writes code like that on purpose, I believe. What happens more often than not is that they don’t write the code from scratch. They take code that sort of does what they want and then add things to it to get it to do what they want. This tends to bloat the code in such a way that it ends up doing much more than it should, and it gets so complex that it’s hard for anyone to read and fully understand, including the optimizer.
There is a saying about tuning that goes like this: “Tune the question, not the query.” And that is still going to be a human activity for the foreseeable future. The optimizer doesn’t know the real question being asked; it only knows the representation of it as presented by the SQL.
I’m a SQL performance guy for the most part. When I look at SQL, a key thing is to understand the nature of the SQL and how it fits into the application. This is sort of philosophical; I’m trying to understand the why of the SQL. And one attribute I look for is how it is being used to process the result set of rows.
And a killer is the ever-popular row-by-row operation. This is hardly efficient no matter how you look at it. It is a philosophical issue at the core: how the writer of the code sees the solution from a grand perspective.
A good analogy is baking a cake, and getting the ingredients in particular. The situation is this: you need to bake a cake and have none of the ingredients in your kitchen. You have a list of the ingredients: flour, sugar, milk, eggs and so on. How do you go about buying them?
Would you get in your car, drive to the store, get the flour, pay for it, get back in your car, drive home, put it on the counter, and check flour off your list? Then get back in the car and do this cycle over again for sugar and each other ingredient? I sure hope not. I suspect you would take the list, go to the store once, get all the ingredients in your cart (trolley), pay for the whole lot once, and then go home and bake the cake.
But even this isn’t likely best. Why even bake the cake at home at all? How about having the store do all that for you? Many grocery stores (at least in the USA) can bake cakes right in the store, and they are pretty good at it since they have the staff and equipment to do it. (And there are specialty shops where you can get even more elaborate baking creations, which you could do yourself, maybe on the fourth or fifth try.)
If all you want is a simple cake, you can just pick it up, pay for it and go. That’s akin to a simple query in the database. If you want a custom cake, you call ahead and they have it ready for you when you arrive. That’s more like a complex query with lots of tables and maybe some fancy mathematical operations and such.
Certainly, a lot of folks have gotten past the idea of getting “one row at a time” from their SQL. But the second part, letting the database do most of the work, is where many folks have trouble. This is mainly due to their philosophy and training on coding; they are very good Java, Python, C++ or similar programmers. To them the database is just data, and they only want to get it from there into their application code. So there are queries bringing back millions of rows, and then the application does more processing on them. This is baking the cake at home; fun as baking a cake is, from a performance point of view it’s not ideal.
Using a stored PL/SQL packaged procedure to get the work done is best. Push the business logic into the database inside a PL/SQL block. The database is really good at this and can likely process those millions of rows faster than your application layer can. This is buying the already baked and decorated cake from the store.
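As a trivial illustration of the "let the store bake it" approach, here is a minimal sketch. The table and column names (customer_orders, order_totals) are made up for illustration; the point is that one set-based statement inside the database replaces fetching millions of rows into the application and looping over them:

-- Hedged sketch: push the aggregation into the database as a stored procedure.
create or replace procedure summarize_orders is
begin
  -- one set-based statement does the work for all customers at once
  insert into order_totals (customer_id, order_count, total_amount)
  select customer_id, count(*), sum(order_amount)
  from   customer_orders
  group  by customer_id;
end summarize_orders;
/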
As I said at the top, if you want an efficient system, don’t bake a cake at home, let the store bake it for you.