Monday, July 27, 2020

The Rise of SQL Optimization in the Autonomous Age, in the Cloud or not

In the world of IT, the age of autonomous computing is upon us.  I’m an Oracle guy by trade, and Oracle is very much all about Autonomous.  Machine learning is the latest buzz in the profession, and that has a lot of folks thinking that tuning and optimization will soon be entirely in the hands of the autonomous database.

I tend to disagree.  I believe folks who can write optimal code will be just as needed as we embark on this autonomous path as they ever were, maybe more so.

The reason for this is rather simple.  Someone writing code has a purpose for it, a result set they are interested in seeing.  The better they know the data they are working with, and the better they know what they are looking for, the better they can write something to get those results.  This is why, many times when optimizing code, I need to talk to the author about what it is they are trying to get, in order to find a more optimal way to produce the results they want.

Of course, both I and the autonomous database can do things like throw indexes at the problem or use hints and tricks to get the code to do this or that.  But maybe the code is just not the right code to begin with: an outer join that doesn’t need to be one, a subquery that really isn’t returning anything useful for the final result set, or maybe code that returns data that isn’t even used.
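To make the outer-join case concrete, here is a small sketch using Python’s sqlite3 module (the tables and data are made up for illustration): a LEFT JOIN whose NULL-extended rows are discarded anyway by a later WHERE filter is really just an inner join in disguise.

```python
import sqlite3

# Hypothetical schema: orders that may or may not have a customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'EAST'), (2, 'WEST');
    INSERT INTO orders VALUES (10, 1, 99.0), (11, 2, 25.0), (12, NULL, 10.0);
""")

# As written: a LEFT JOIN, but the WHERE filter on c.region throws away
# the NULL-extended rows anyway, so the outer join buys nothing.
as_written = conn.execute("""
    SELECT o.id, o.total
    FROM orders o LEFT JOIN customers c ON c.id = o.customer_id
    WHERE c.region = 'EAST'
""").fetchall()

# What was actually meant: a plain inner join, simpler for the optimizer too.
simplified = conn.execute("""
    SELECT o.id, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
    WHERE c.region = 'EAST'
""").fetchall()

assert as_written == simplified  # identical result sets
```

Once you know the intent, the outer join can simply be dropped; an optimizer that only sees the SQL as written has to prove that equivalence the hard way, if it can at all.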

When a query is only tens of lines long, some of these things can sometimes be spotted with enough looking and testing.  But when the query is hundreds of lines long, with an execution plan thousands of lines deep, it’s difficult for someone who doesn’t know the “why” of the SQL to really optimize it.

This is where the autonomous database will have trouble optimizing code, if it can do anything at all.  Sure, it can address a performance issue by throwing more horsepower at it (CPUs, memory, parallelism and the like).  But when the root cause of the performance issue is that the code is not doing the right thing to begin with, the autonomous database will not know that.

The optimizer in the autonomous database is going to take a SQL statement as written and run it.  Yes, the optimizer has some limited capacity to rewrite the SQL, and those rewrites are often rather good.  But will it know that the user doesn’t actually need the data from the six tables in that complex subquery, the one used four times, slightly differently each time?  No, and that is where the human SQL optimization specialist will still be needed.  She or he can talk to the users of the query and find out the real intent and what they really want.

How does code end up doing things it shouldn’t?  Shouldn’t the coder know what they are asking for and write code to get just that?  Why would someone write code that does extra work for no good reason?

No one writes code like that on purpose, I believe.  What happens more often than not is that they don’t write the code from scratch.  They take code that sort of does what they want and then add things to it until it does.  This tends to bloat the code so that it ends up doing much more than it should, and it becomes so complex that it’s hard for anyone to read and fully understand, including the optimizer.

There is a saying about tuning that goes like this: “Tune the question, not the query.”  And that is still going to be a human activity for the foreseeable future.  The optimizer doesn’t know the real question being asked; it only knows the representation of it presented by the SQL.

Monday, July 20, 2020

Baking a Cake in the Cloud or not

If you want an efficient system, don’t bake a cake at home; let the store bake it for you.

I’m a SQL performance guy for the most part.  When I look at SQL, a key thing is to understand the nature of the SQL and how it fits into the application.  This is sort of philosophical: I’m trying to understand the why of the SQL.  And one attribute I look for is how it is being used to process the result set of rows.

And a killer is the ever-popular row-by-row operation.  This is hardly efficient no matter how you look at it.  At its core this is a philosophical issue: how the writer of the code sees the solution from a grand perspective.

A good analogy is baking a cake, and getting the ingredients in particular.  The situation is this: you need to bake a cake and have none of the ingredients in your kitchen.  You have a list of the ingredients: flour, sugar, milk, eggs and so on.  How do you go about buying them?

Would you get in your car, drive to the store, get the flour, pay for it, get back in your car, drive home, put it on the counter, and check flour off your list?  Then get back in the car and do this cycle over again for sugar and each remaining ingredient?  I sure hope not.  I suspect you would take the list and go to the store once, get all the ingredients in your cart (trolley), pay for the whole lot once, then go home and bake the cake.
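The two shopping trips can be sketched in a few lines of Python with sqlite3 standing in for any database (the store table and prices are invented): one round trip per ingredient versus handing the whole list over at once.

```python
import sqlite3

# A made-up "store" with prices for our ingredient list.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (item TEXT PRIMARY KEY, price REAL);
    INSERT INTO store VALUES ('flour', 2.50), ('sugar', 1.75),
                             ('milk', 1.20), ('eggs', 3.00);
""")
shopping_list = ["flour", "sugar", "milk", "eggs"]

# One trip per ingredient: a separate round trip to the database per row.
trips = [conn.execute("SELECT price FROM store WHERE item = ?", (i,)).fetchone()[0]
         for i in shopping_list]

# One trip total: hand the whole list to the database at once.
placeholders = ",".join("?" * len(shopping_list))
one_trip = conn.execute(
    f"SELECT item, price FROM store WHERE item IN ({placeholders})",
    shopping_list).fetchall()

# Same groceries either way; the single trip just gets them in one statement.
assert round(sum(trips), 2) == round(sum(p for _, p in one_trip), 2)
```

With four ingredients the difference is invisible; with millions of rows, the per-trip overhead is the whole performance problem.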

But even this likely isn’t best: why bake the cake at home at all?  How about having the store do it all for you?  Many grocery stores (at least in the USA) can bake cakes right in the store, and they are pretty good at it since they have the staff and equipment to do it.  (And there are specialty shops where you can get even more elaborate baked creations, which you could do yourself, maybe on the fourth or fifth try.)

If all you want is a simple cake, you can just pick it up, pay for it and go.  That is akin to a simple query in the database.  If you want a custom cake, you call ahead and they have it ready for you when you arrive.  That is more like a complex query with lots of tables and maybe some fancy mathematical operations.

Certainly, a lot of folks have gotten past the idea of getting “one row at a time” from their SQL.  But the second part, letting the database do most of the work, is where many folks have trouble.  This is mainly due to their philosophy and training in coding; they are very good Java, Python, C++ or similar programmers.  To them the database is just data, and they only want to get it from there into their application code.  Their queries bring back millions of rows, and then the application does more processing on them.  This is baking the cake at home; fun as baking a cake is, from a performance point of view it’s not ideal.
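A minimal illustration of the difference, again in Python with sqlite3 (the sales table is invented): hauling every row into the application to total it there, versus asking the database for the total and shipping back a single row.

```python
import sqlite3

# A made-up sales table with 100,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EAST" if i % 2 else "WEST", i % 100) for i in range(100_000)])

# Baking at home: drag all 100,000 rows into the application, total them there.
rows = conn.execute("SELECT amount FROM sales").fetchall()
app_total = sum(amount for (amount,) in rows)

# Buying the cake: let the database do the aggregation and return one row.
(db_total,) = conn.execute("SELECT SUM(amount) FROM sales").fetchone()

assert app_total == db_total  # same answer, vastly less data moved
```

Both paths compute the same number; the second moves one row across the interface instead of a hundred thousand.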

Using a stored PL/SQL packaged procedure to get the work done is best.  Push the business logic into the database inside a PL/SQL block.  The database is really good at this and can likely process those millions of rows faster than your application layer can.  This is buying the already baked and decorated cake from the store.  

As I said at the top, if you want an efficient system, don’t bake a cake at home; let the store bake it for you.