Wednesday, September 4, 2019

A case for OPTIMIZER_INDEX_COST_ADJ in the cloud or not

I’ve been doing a lot of query optimization on Exadata this year and it’s been quite an experience.  As I mentioned in my last post the use of indexes on Exadata is tricky.  For the most part NOT using an index is a good idea, but not always.  There are still times the use of an index is good on Exadata. 

I know a lot about the optimizer having taught classes on it and studying it quite a bit.  I don’t think I’m an expert on it by a long shot, but I do believe I have a good grasp of how it works.  The optimizer has been written over the many years with transactional based queries as the expected type of environment, which is traditionally the environment that the Oracle database has been used.

Exadata changes that a bit.  Folks can (and do) use Exadata for transactional systems. In these cases indexes can be the way to go.  However, when we are doing reporting across millions and/or billions of rows the index can be a hindrance.  Partially because of the index access itself and also because it drives NESTED LOOPS joins, as mentioned in my last post.  

This brings me to the parameter OPTIMIZER_INDEX_COST_ADJ.  This can also be used in a hint, which can look like this: OPT_PARAM('OPTIMIZER_INDEX_COST_ADJ',10000).  What this does is change the costing of an index.  The default is 100, which means leave the computed cost alone.  

The range is 1 to 10,000. When setting it below 100 makes the index cost cheaper and above 100 makes it more expensive.  At a 50 that would take the cost of an index down to 50% of what the computed value would be.  This means that if optimizer computed a cost of using the index at 20, it would now be a 10 for the cost.  On the high end of 10,000 this makes an index cost 100 times more than computed, so an index of a computed cost of 20 would now be 2,000.   

In an Exadata system this artificial over costing of indexes can be useful.  I must stress that this is not a silver bullet that will work all the time.  Some queries will benefit, others will not.  This is best used as a hint with proper testing.  With the proper use of this as a hint I’ve taken queries from hours to minutes.  For my use, using 10,000 as the value seems to be best.  You should test for your query what value is best.  Something above 5,000 is likely to work well if your goal is to get the query to use less indexing, and less NESTED LOOPS.

One of the reasons I really like using this as a hint when appropriate is that it doesn’t really force the optimizer's hand.  Yes it does make index look more or less attractive for an access path.  It doesn’t hind something from the optimizer, like using COALESCES on the join column effectually does.   When we put functions on columns it tends to mess up cardinality calculations and the plan stumbles around with unrealistic cardinalities and might end up with an acceptable plan but more by happenstance then on purpose.  

Also it’s easy to test if this is still a good idea.  One hint removed from the query an you can check to see if the index (or indexes) is now a good idea.  With function wrapped around the join columns its likely never going to be tested to see if those indexes are good or bad for the plan as time goes by. 

If you do use this hint, please put in some comments about it.  And something like “This hint should be tested from time to time to make sure it is still a good idea” is a good start to a comment about its use.