Sunday, August 18, 2019

Exadata, Indexes and Performance in the cloud or not

This year I’ve been on a gig working with a good-sized Exadata implementation.   I really don’t know how big it is; my job is to tune queries, not the database.   Before this gig I’d only done a little work on Exadata and didn’t know the platform well.  Yes, I knew the technical aspects of how Exadata works: the storage cells, offloading, and such.  But working with it day after day is something much more.

Indexing is a hot topic around Exadata.  There are some who think there shouldn’t be any indexes at all on an Exadata system.  That is likely a bit extreme.  Fewer indexes, for sure.  Primary keys must have them and foreign keys really should.  Other indexes that would be created purely for performance might not be a good idea.

There are two reasons for this.  One is the chance that offloading won’t happen on an index scan.  Offloading is the ability of the Exadata machine to run the predicate at the storage device, and it is very powerful: the storage cells in effect pre-screen the data at the storage level and bring back only the rows that match.  With an index this might not happen.  But with each release of Exadata more and more types of scans are supported at the storage cell level, so this keeps becoming less of an issue.
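If you want to see whether a statement actually benefited from offloading, the smart-scan columns in V$SQL are one place to look.  A quick sketch (the sql_id is just a placeholder to substitute):

```sql
-- How much of this statement's I/O was eligible for smart scan,
-- and how much data actually came back over the interconnect?
SELECT sql_id,
       io_cell_offload_eligible_bytes,
       io_interconnect_bytes
FROM   v$sql
WHERE  sql_id = '&your_sql_id';  -- placeholder; plug in the real sql_id
```

If the eligible bytes come back as zero, the access path (an index scan, say) never qualified for offload in the first place.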

The other reason, which is really the bad one, is that indexes tend to drive NESTED LOOPS joins.  And those tend to be less than optimal in Exadata much of the time.   Not always, of course; I have seen many plans with NESTED LOOPS joins that are quite nice.  However, when you start joining millions or billions of rows with NESTED LOOPS, well, things tend to go badly.

Getting the optimizer to not use an index can be done many ways.  Of course you can use the FULL hint, but many shops discourage the use of hints, and I generally agree with that stand.  It’s not that hints are “bad”; it’s that folks lean on them too much.  Too often I see overuse of hints, and it’s clear the coder doesn’t really know what to do, so he/she just throws a bunch of hints at it hoping they will work.

Other techniques mostly revolve around making it “impossible” for the index to be used.   Some examples: use TRIM, COALESCE, or NVL on the indexed column in the join.  Another is making the table an in-line view (ILV), which many times will need a NO_MERGE hint.

Personally I like COALESCE.  The great thing about it is that it works on unique key indexes.  NVL doesn’t work there, because the optimizer recognizes that the column is already unique and removes the NVL from the predicate.
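As a sketch of the idea (table and column names here are invented for illustration): wrapping the join column in COALESCE keeps the optimizer from using the index on it, which typically steers the plan toward a full scan and a hash join.

```sql
-- Original: the unique index on dim.dim_id may drive a NESTED LOOPS join.
SELECT f.order_dt, d.dim_name
FROM   big_fact f
JOIN   dim d ON d.dim_id = f.dim_id;

-- Rewritten: COALESCE on the indexed column disables index use, even on
-- a unique key index, where a plain NVL would just be optimized away.
SELECT f.order_dt, d.dim_name
FROM   big_fact f
JOIN   dim d ON COALESCE(d.dim_id, -1) = f.dim_id;
```

The -1 default only matters if dim_id could be NULL; pick a value that can’t change the join result.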

So what to do?  Let the numbers lead you to the problem.  Get an execution plan with stats (SQL Monitor is my go-to tool for this) and look at where all the time and resource usage is going in the plan.  Focus on that part of the plan.  Sometimes removing it is the answer.  I just had a query that was running a sub-query to check a value, and it was really resource intensive.  Turned out that code was useless.  Removed it, job done.
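For reference, one way to pull a SQL Monitor report for a statement you’ve just run is DBMS_SQLTUNE.REPORT_SQL_MONITOR (again, the sql_id is a placeholder):

```sql
-- Text report of the monitored execution; 'ACTIVE' gives the HTML version.
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         sql_id => '&your_sql_id',   -- placeholder; use the real sql_id
         type   => 'TEXT') AS report
FROM   dual;
```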

If you do have to do it, how can you make this access faster?  Is the index a good idea?  Is the full scan a good idea?  Is it the right type of join for the data?  Are the predicates getting applied in the right place?  Can another predicate be applied to this table?  Are the predicates well defined?

It is way more work, but more often than not, rewriting the code rather than dropping in a hint here or there is the best way to tune a query.

Sunday, April 7, 2019

Don't do it at all in the cloud or not

It’s rather incredible what a simple thing like DISTINCT can do to performance.  Recently I was tasked with improving performance on a query that was taking over an hour to complete.  Running it and watching SQL Monitor, it was easy to see that the plan was spending the vast majority of its time on a SORT UNIQUE step.   It was sorting through 1.4 billion rows, so that is a lot of data to go through.  But the worst part was that the same number of rows were going into the SORT UNIQUE as coming out.  Meaning all the rows going in were already unique.

Looking at the columns in the select list, one column (an ID type field) was part of a table’s primary key.  The other column of that primary key was a numeric representation of a date, and it was not in the select list.  The actual date was stored as a separate column.  (Why they do this I have no idea, but there it is.)  And this date column was in the select list along with the ID column.  Because of this, the ID and date combination was unique.  The optimizer didn’t and couldn’t know this, because those two columns were not the primary key definition.  Hence it had to do the UNIQUE step to find the unique values, which in this case was the entire set of rows.  Why it didn’t do a HASH UNIQUE I couldn’t figure out; that likely would have been faster than the SORT UNIQUE.

But really the fastest way to do this was not to do it at all.  It was completely unnecessary.  Removing the unneeded DISTINCT chopped an hour off the processing time.
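The shape of the problem, sketched with invented names: (id, date_key) is the primary key, and event_dt tracks date_key one-for-one, so (id, event_dt) is already unique even though the optimizer can’t prove it.

```sql
-- Forces a SORT UNIQUE over every row, for nothing:
SELECT DISTINCT id, event_dt
FROM   big_table;

-- Same result set, no sort step at all:
SELECT id, event_dt
FROM   big_table;
```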

Once again proving the fastest way to do anything is to not do it at all.

What I hope you get out of this is, know your data.  You will always know something the optimizer doesn’t know.  Use that knowledge to write a better query and help the optimizer make better choices.

Sunday, February 3, 2019

Changes in the cloud and on the ground

At the start of this year I embarked on a new adventure.  I’m now at Zione Solutions in a consulting role as a Senior Database Administrator, doing what I’ve been teaching for the past so many years.  It’s been quite a change; for one, I am now using that strange new join syntax “LEFT OUTER JOIN … ON …” which I had successfully avoided all these years.  It’s turned out to be harder to learn than I expected.  Also, the queries I’ve been tuning are rather huge.  An average query is about 300-400 lines long, and the plans for these will easily be 1,000 lines.   These are reporting queries, the classic Data Warehouse type.  The smaller tables are millions of rows, and tens of billions of rows in a table is not uncommon.  Also, it’s on Exadata.

I’ve seen firsthand that what I’ve been teaching works.  

One of the most important things you can do when tuning a query is let the numbers lead you to the problem.   Some folks can look at a query and, without running it, make good guesses as to where the problem will likely be.  That can work, but why bother?  Run the query and use a tool like SQL Monitor to see how the plan is executing.  In the SQL Monitor report the Activity column is a great way to see which step is taking the largest percentage of time.   Now you can see exactly where a query is spending all its time, and then pound that nail down.

It’s also about starting small and staying small with the rows.  Here is where getting predicates applied at the right time, and use of things like the WITH clause, can really help.  With Exadata and huge volumes of data this concept is still critical to success; however, more often than not, indexes are not the solution.

Why?  Not because indexes are bad; it’s that index use tends to serialize access.  That tends to drive nested loop joins, which often end up being very expensive because the plan hits the inner table (or index) way too many times.  This can be millions of times, and even with a unique index to look up values, doing it over a million times isn’t efficient.  A fast full index scan can work fine, since it works like a full table scan and can run in parallel.

Subquery factors (also called CTEs, Common Table Expressions) are very powerful in these queries.  These are the temporary views you define in the WITH clause, and typically it’s best to use the MATERIALIZE hint.   This hint forces the optimizer to build the subquery factor as a global temporary table like structure; without the hint the optimizer might just merge it back into the query, which is exactly what you don’t want to happen most times.  Taking a subquery that is executed even just a couple of times and making it a subquery factor can be huge.  Also, sometimes taking that subquery out of the main query and making it a “stand-alone” query can help parallelization.   This can also work well for that “start small, stay small” thing.
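A minimal sketch of the pattern, with invented tables: the MATERIALIZE hint tells the optimizer to build the subquery factor once, rather than merging it back into the main query and re-deriving it at each reference.

```sql
WITH cust_totals AS (
  SELECT /*+ MATERIALIZE */        -- build this result once, like a temp table
         cust_id, SUM(amt) AS tot
  FROM   sales
  GROUP  BY cust_id
)
SELECT c.cust_name, t.tot
FROM   cust_totals t
JOIN   customers c ON c.cust_id = t.cust_id
WHERE  t.tot > 1000000;
```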

On the team I’m working with now, it’s speculated that about 60% of the performance problems are solved by using subquery factors (CTEs).

Oh, and did I mention LAT (LATERAL) views?  Yeah, these tend to work very poorly in these kinds of queries.    Seeing one or more in a plan is a known performance killer.  Making them go away can be tricky.  A typical cause is an OR in the ON clause of an outer join.  LAT views might work well with smaller data sets, but with large ones they can cause a query to stall at that step for very long periods of time, like hours.
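The pattern looks something like this (names invented); the OR in the outer join’s ON clause is what can push the optimizer into a LATERAL view:

```sql
-- Problem shape: OR in the ON clause of an outer join.
SELECT o.order_id, p.pay_amt
FROM   orders o
LEFT OUTER JOIN payments p
       ON (p.order_id = o.order_id OR p.alt_order_id = o.order_id);

-- One common rewrite: split the OR into separate outer joins with simple
-- equality predicates and combine the results, e.g. with COALESCE.
SELECT o.order_id, COALESCE(p1.pay_amt, p2.pay_amt) AS pay_amt
FROM   orders o
LEFT OUTER JOIN payments p1 ON p1.order_id     = o.order_id
LEFT OUTER JOIN payments p2 ON p2.alt_order_id = o.order_id;
```

The rewrite isn’t always a drop-in equivalent (row counts can differ if both predicates match for the same row), so verify the results match before swapping it in.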

For me it’s been quite a beginning to the year.  As the saying goes, change is inevitable, except from vending machines.  And this year has had plenty, but still none from vending machines.