Monday, July 27, 2020

The Rise of SQL Optimization in the Autonomous Age in the cloud or not

In the world of IT, the age of Autonomous computing in upon us.  I’m an Oracle guy by trade and Oracle is very much all about Autonomous.   Machine Learning is the latest buzz in the profession and that has a lot of folks thinking that tuning and optimization will be completely in the hands of the autonomous database.

 

I tend to disagree.  I believe the need for folk who can write optimal code will be just as needed as we embark on this autonomous path as there ever was, maybe more.  

 

The reason for this is rather simple.  Someone writing code has a purpose for the code, a result set they are interested in seeing.  The better they know the data they are working with and the better then know what they are looking for, the better they can write something to get the results.  This is why many times in my optimization of code I need to talk to the author about what is it they are trying to get in order to find a more optimal way to get the results they want.

 

Of course, I and the autonomous database can do things like throw indexes at it or use hints/tricks to get the code to do this or that.  But maybe the code is just not the right code to begin with.  An outer join that doesn’t need to be, a subquery that really isn’t returning anything useful for the final result set, and maybe it’s just returning data that isn’t even used. 

 

When a query is only tens of lines long, some of these things can be seen with enough looking and testing sometimes.  But when the query is hundreds of lines long and with an execution plan with 1000s lines, it’s difficult for someone who doesn’t know the “why” of the SQL to be able to really optimize it. 

 

This is where the autonomous database will have some trouble optimizing code, if it can do anything at all. Sure, it can solve the performance issue by throwing more horsepower at it (CPUs, Memory, Parallel and the like).   But when the root cause of the performance issue is that the code it not doing the right thing to begin with, the autonomous database will not know that. 

 

The optimizer in the autonomous database is going to take a SQL statement as written, it will run it.  Yes, the optimizer has some limited capacity to do rewrites on the SQL, which can be rather good many times.   But will it know that the user actually doesn’t even need the data from these 6 tables in this complex subquery that is used four times slightly differently each time?  No, and that is where the human SQL optimization specialist will still be needed.  She/he can talk to the user(s) of the query and find out what is the real intent and what do they really want. 

 

How does code end up doing things it shouldn’t do?  Shouldn’t the code know what they are asking and write code to just get that?  Why would someone write code that does extra work for no good reason?  

 

No one writes code like that on purpose I believe.  What happens more often than not is they don’t write the code from scratch.  They take code that sort-of does what they want and then typically add things to it to get it to do what they want.  This will tend to bloat the code in such a way that it ends up doing much more than it should and gets so complex that it’s hard for anyone to read and fully understand, including the optimizer. 

 

There is a saying about tuning that goes like this “Tune the question not the query”.   And that is still going to be a human activity for the foreseeable future.   The optimizer doesn’t know the real question being asked, it knows the representation of it as presented by the SQL. 

No comments:

Post a Comment