I’ve been doing a lot of query optimization on Exadata this year and it’s been quite an experience. As I mentioned in my last post, the use of indexes on Exadata is tricky. For the most part NOT using an index is a good idea, but not always; there are still times when an index is the right choice on Exadata.
I know a lot about the optimizer, having taught classes on it and studied it quite a bit. I don’t think I’m an expert on it by a long shot, but I do believe I have a good grasp of how it works. The optimizer has been written over the years with transactional queries as the expected workload, which is traditionally the environment the Oracle database has been used in. Exadata changes that a bit.
Folks can (and do) use Exadata for transactional systems, and in those cases indexes can be the way to go. However, when we are doing reporting across millions or billions of rows, an index can be a hindrance, partly because of the index access itself and partly because it drives NESTED LOOPS joins, as mentioned in my last post.
This brings me to the parameter OPTIMIZER_INDEX_COST_ADJ. It can also be used in a hint, which looks like this: OPT_PARAM('OPTIMIZER_INDEX_COST_ADJ',10000). What this does is change the costing of an index. The default is 100, which means leave the computed cost alone. The range is 1 to 10,000. Setting it below 100 makes index access cheaper, and setting it above 100 makes it more expensive. At 50, the cost of an index access drops to 50% of the computed value, so if the optimizer computed a cost of 20 for using the index, it would now be 10. At the high end of 10,000, an index costs 100 times more than computed, so a computed cost of 20 would now be 2,000.
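Here is a minimal sketch of what that looks like in a statement. The tables, columns, and query are made up; only the OPT_PARAM hint itself is the point.

    -- Hypothetical reporting query; the hint costs index access paths 100 times
    -- higher for this statement only, nudging the optimizer toward full scans
    -- (smart scans on Exadata) and HASH joins instead of index-driven NESTED LOOPS.
    SELECT /*+ OPT_PARAM('OPTIMIZER_INDEX_COST_ADJ',10000) */
           c.cust_name, SUM(s.amount_sold) AS total_sold
      FROM sales s
      JOIN customers c ON c.cust_id = s.cust_id
     GROUP BY c.cust_name;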
In an Exadata system this artificial over-costing of indexes can be useful. I must stress that this is not a silver bullet that will work all the time; some queries will benefit, others will not. This is best used as a hint, with proper testing. With proper use of this as a hint I’ve taken queries from hours to minutes. For my use, 10,000 seems to be the best value, but you should test to find what value works best for your query. Something above 5,000 is likely to work well if your goal is to get the query to use less indexing and fewer NESTED LOOPS joins.
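If you want to experiment with values before settling on a hint, one way (just a sketch, using the same made-up query as above) is to set the parameter at the session level and look at the plan:

    -- Try a candidate value for this session only.
    ALTER SESSION SET optimizer_index_cost_adj = 5000;

    EXPLAIN PLAN FOR
    SELECT c.cust_name, SUM(s.amount_sold) AS total_sold
      FROM sales s
      JOIN customers c ON c.cust_id = s.cust_id
     GROUP BY c.cust_name;

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

    -- Put the session back to the default when done.
    ALTER SESSION SET optimizer_index_cost_adj = 100;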
One of the reasons I really like using this as a hint when appropriate is that it doesn’t really force the optimizer’s hand. Yes, it does make an index look more or less attractive as an access path, but it doesn’t hide something from the optimizer the way wrapping COALESCE around the join column effectively does. When we put functions on columns it tends to mess up the cardinality calculations; the plan stumbles around with unrealistic cardinalities and might end up acceptable, but more by happenstance than by design.
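For contrast, this is the sort of thing I’m warning against (again with made-up tables): a function wrapped around the join column, which both disables any index on that column and skews the join cardinality estimate.

    -- Anti-pattern: COALESCE on the join column hides the real column values from
    -- the optimizer, so the cardinality estimate for the join is off and any index
    -- on ORDER_LINES.ORDER_ID can no longer be used for this join.
    SELECT o.order_id, SUM(l.line_amount) AS order_total
      FROM orders o
      JOIN order_lines l ON COALESCE(l.order_id, -1) = o.order_id
     GROUP BY o.order_id;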
Also, it’s easy to test whether this is still a good idea. Remove the one hint from the query and you can check to see if the index (or indexes) is now a good idea. With functions wrapped around the join columns, it’s likely no one will ever go back and test whether those indexes are good or bad for the plan as time goes by.
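A quick way to do that re-test (again just a sketch): run the statement with and then without the hint and pull the actual plan for each run from the cursor cache.

    -- Run the hinted version (then repeat without the OPT_PARAM hint)...
    SELECT /*+ OPT_PARAM('OPTIMIZER_INDEX_COST_ADJ',10000) GATHER_PLAN_STATISTICS */
           c.cust_name, SUM(s.amount_sold) AS total_sold
      FROM sales s
      JOIN customers c ON c.cust_id = s.cust_id
     GROUP BY c.cust_name;

    -- ...and immediately pull the actual plan and row counts for the last statement.
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));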
If you do use this hint, please put in a comment about it. Something like “This hint should be tested from time to time to make sure it is still a good idea” is a good start.
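One way to do that (the wording and the query are only a suggestion) is to keep the note right next to the hint in the statement itself:

    SELECT /*+ OPT_PARAM('OPTIMIZER_INDEX_COST_ADJ',10000) */
           /* Hint added to steer this report away from index-driven NESTED LOOPS
              on Exadata. Re-test from time to time to make sure it is still a
              good idea. */
           c.cust_name, SUM(s.amount_sold) AS total_sold
      FROM sales s
      JOIN customers c ON c.cust_id = s.cust_id
     GROUP BY c.cust_name;

That way the reminder travels with the query wherever the SQL text goes.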