Sunday, August 18, 2019

Exadata, Indexes and Performance in the cloud or not


This year I’ve been on a gig working with a good sized Exadata implementation.   I really don’t know how big it is, my job is to tune queries not the database.   Before this gig I’d only done a little bit of work on Exadata and wasn’t well aware of the system.  Yes I knew the technical aspects of how Exadata worked, the storage cells, offloading, and such.  But to work with it day after day is something much more.

Indexing is a hot topic around Exadata.  There are some who think there shouldn’t be any indexes at all on an Exadata system.  That is likely a bit extreme.  Fewer indexes for sure.  Primary Keys must have them and Foreign Keys really should.  Other indexes that would be created for performance might not be a good idea.

There are two reasons of this.  One is that there is a chance that offloading wouldn’t happen on an index scan.  This is the ability for the Exadata machine to run the predicate at the storage device.  This offloading is very powerful.  The storage cells in effect pre-screen the data at the storage level then only bring back data that matches.  With an index this might not happen.  But with each release of Exadata more and more types of scans are supported at the storage cell level so this keeps becoming less of an issue.

The other reason, which is really the bad one, is that indexes tend to drive NESTED LOOPS joins.  And they tend to be less then optimal many times in Exadata.   Not always of course, I have seen many plans with NESTED LOOPS joins and they are quite nice.  However, when you start joining millions or billions of rows in a NESTED LOOPS, well things tend to good badly. 

Getting the optimizer to not use an index can be done many ways.  Of course you can use the FULL hint, but many shops discourage the use of hints.  And I generally agree with this stand.  It’s not that hints are “bad” it’s really that folks lean on them too much.  Too often I will see over use of hints, and it’s clear many times the coder doesn’t really know what to do, so he/she just throw a bunch of hints at it hoping they will work. 

Other techniques mostly revolve around making it “impossible” for the index to be used.   Some examples, use TRIM or COALESCE or NLV on the indexed column in the join.  And another is making the table an in line view (ILV) which many times will need a NO_MERGE hint. 

Personally I like the COALESCE.  The great thing about that is that it works on unique key indexes.  NVL doesn’t work, because the optimizer recognizes that the field is already unique and removes the NVL form the predicate. 

So what to do?  Let the numbers lead you to the problem.  Get an execution plan with stats, SQL Monitor is my go-to tool for this, and look at where all the time and resource usage is going in the plan.  Focus on that part of the plan.  Sometimes removing it is the answer.  Just had a query that was running sub-query to check a value which was really resource intensive.  Turned out, that code was useless.  Removed it, job done.

If you do have to do it, how can you make this access faster?  Is the index a good idea?  Is the full scan a good idea?  Is it the right type of join of the data?  Are the predicates getting applied in the right place?  Can another predicate be applied to this table?  Are the predicates well defined?

It is way more work, but more often than just doing a hint here or there, rewriting the code is the best way to tune a query.