Thursday, April 2, 2020

Working from Home

If you are one of the many folks who are suddenly trying to do your job from home, I’d like to take a moment to pass on some things I’ve learned. I’ve been working from home for years. It’s a different mindset. Here are a few things I’ve found that might help you deal with this change:
Stay focused. You can get a lot more done at home than in the office. There aren’t office politics to deal with, which gives you more time to do your job. On the flip side, the job can take over your life. Set yourself hours for work and stick to them. 
Use technology. Skype, Zoom, Teams and other programs can let you connect with coworkers, and of course there’s your phone. On a recent gig a coworker would typically call me in the morning, and he and I would stay on the phone for an hour or more. We weren’t always talking, but it was like he was next to me at the office. We could talk about what we were doing and have the typical chit-chat. 
Don’t watch TV while working. Some folks can do this, but really it’s not good. You’ll get less done and feel rushed when you realize you didn’t get such-and-such done. Some music for background noise is likely better and less distracting. 
Eat meals like you would going to an office or job site. Have breakfast before you start, minimize snacking and have a dedicated lunch time. This keeps you on schedule and is likely better for your health than just snacking all the time. 
Get up and move around now and then. Doing a simple domestic task once in a while helps break up the day a bit. Or just walk about and check for the mail. Don’t get into a big project, like cleaning the bathroom or painting a room. You wouldn’t do that at the office or work site, so don’t do it when you’re working from home. 
If you are the manager of folks who are now working from home, you may have the hardest adjustment. This is a complex and totally different way to manage folks. In a nutshell, you have to trust that they are doing their job. Lording over them, calling them every minute and checking their status on Skype or whatever is a great way to drive them and you crazy. 
Set up scheduled status calls and keep them short. Give them assignments with realistic deadlines. Help them with the technology and, as best you can, get them the things they need. For example, having more than one monitor can really boost performance for folks doing a lot of computer-based work. Be open to helping them, be it actually helping them get something done or helping them find the resources they need. 
Your management skills will really be tested in ways you’ve never imagined. I’ve managed remote teams and it’s a challenge. Hopefully your team will be in the same time zone; that helps a lot. 
Good luck!

Monday, March 30, 2020

Hints in the cloud or not

Hints aren’t hints; they are directives. 
 
Don’t think of a hint as some way of encouraging the optimizer to do something; it is a demand to do something.  Generally, hints are only “ignored” when you misspell them or ask for something that can’t be done, for example putting both the FULL hint and an INDEX hint on the same table.  There are also cases where the optimizer transforms a query in such a way that hints in effect disappear from the final query the optimizer really uses.  This isn’t common in most shops but can happen.  
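As a sketch of that conflicting-hints case (the table and index names here are made up for illustration):

```sql
-- FULL demands a full table scan; INDEX demands an index scan on the
-- same table. The optimizer can't do both, so the hints are "ignored".
SELECT /*+ FULL(employees) INDEX(employees employees_name_ix) */
       employee_id, last_name
FROM   employees
WHERE  last_name = 'Smith';
```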


Hints are best used rarely and must be documented when used.

If you find yourself “always” using a hint, then you have a problem.  Two hints I’ve seen used “out of habit” are the NO_MERGE and MATERIALIZE hints.  Just because you have a subquery doesn’t mean you need NO_MERGE.  The subquery might not be getting merged anyway, and if it is, maybe that’s a good idea.  The same basic issues are true of MATERIALIZE on CTEs (Common Table Expressions, or subquery factors).  The MATERIALIZE hint tells the optimizer to resolve the CTE as a separate query and turn the results into a temporary-table-like structure.  Often this is exactly what we want to have happen with a CTE, but the same caution applies: the CTE sometimes wouldn’t be merged anyway, and even if it is merged, sometimes that is a good thing. 
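A sketch of MATERIALIZE on a CTE (table and column names invented for illustration):

```sql
-- MATERIALIZE demands that the CTE be resolved as a separate query and
-- its results kept in a temporary-table-like structure, rather than
-- possibly being merged into the main query. Often that would happen
-- anyway, which is why using it "out of habit" is a smell.
WITH recent_orders AS (
  SELECT /*+ MATERIALIZE */ order_id, customer_id, order_total
  FROM   orders
  WHERE  order_date >= SYSDATE - 30
)
SELECT customer_id, SUM(order_total)
FROM   recent_orders
GROUP  BY customer_id;
```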

If your testing validates that the hint helps the optimizer do the right thing and you can’t get it any other way, document its use.  This documentation doesn’t have to be elaborate and is really for you.  Always document the code as if you will be working on it again in two years with no recollection of having worked on it before.  Help yourself understand what you did today.  

Hints can be a fast fix for a bug.

I know this may be a shock, but there are bugs in the optimizer.  This is where the use of a hint might be a better solution than some elaborate coding trick.  Cardinality can be an issue here, and the CARDINALITY hint can help.  CTEs, again, can be a problem.  If the cardinality estimate is way off for a CTE and that CTE is used many times in the main query, then the odds of the main query having a good plan are slim.  

If you can’t rework the code of the CTE to get a better estimate, then a CARDINALITY hint can really save the day.  BUT this is a hint that could go bad over time.  Always put in a comment noting when it was used and that it will need to be checked in the future.  There may come a time when it isn’t needed or the value used needs to be updated.  Keep in mind that the number only needs to be in the correct magnitude, not exactly right. 
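In sketch form (the names, numbers and date are all hypothetical), such a fix with the comment this calls for might look like:

```sql
-- CARDINALITY hint added 2020-04: the optimizer estimates ~100 rows
-- for this CTE but it really returns ~2,000,000. Only the magnitude
-- matters. Re-test after upgrades; remove when no longer needed.
WITH order_details AS (
  SELECT /*+ CARDINALITY(order_lines 2000000) */
         order_id, line_total
  FROM   order_lines
  WHERE  line_status = 'OPEN'
)
SELECT order_id, SUM(line_total)
FROM   order_details
GROUP  BY order_id;
```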


A hint should be viewed as a temporary fix and not the final solution. 

All this leads up to this: hints are temporary fixes to a problem.  Often we are tuning around bugs or poor coding.  If you don’t have access to the author of the query, then significant rewriting of the code can be difficult or impossible.  A few well-placed hints can get the code to at least run in an acceptable amount of time.  That can get you through an “emergency,” and hopefully there will come a time when the code can be rewritten or the bug fix is applied and the hints are no longer needed.

Monday, November 4, 2019

Aliases in Oracle in the cloud or not


Recently I posted on Twitter (https://twitter.com/RicVanDyke) that you shouldn’t use single-character aliases, and I made the point that using full table names is best.  This of course sparked a good dialog that went on for about two days.  

After tuning hundreds of queries with thousands of lines in each of them, I really think that full table names are best.  Next best is shortening the name by removing non-leading vowels.  Even more important is making aliases unique within the statement.  This means if you use the same table more than once, put a 01, 02 and so on after each use.  
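For example (EMPLOYEES here standing in for any table used twice in a statement), numbered aliases keep each use distinct:

```sql
-- Same table twice: number each use so every predicate in the
-- execution plan can be traced back to the right copy.
SELECT employees01.last_name AS employee_name,
       employees02.last_name AS manager_name
FROM   employees employees01,
       employees employees02
WHERE  employees01.manager_id = employees02.employee_id;
```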

But you say “What if the table name is really long?” Are you really that lazy?  You can’t use copy/paste? 

But you say, “My query is really short; using single characters just makes sense.”  No, this is about the habit.  You start doing it in a small query, 5-6 lines long.  Then you say, well, this one is only 20 lines, it’s OK to use single characters.  Then these queries get pushed together, and before you know it you have a several-hundred-line query with all these “A” and “E” and “C” and whatnot in it.  Which “A” goes with which table?  

And yes, this applies to columns and subqueries as well.  Don’t have subquery “A” unioned with subquery “B”.  Give them a name.  Heck, I’m OK with SUBQ01 and SUBQ02.  It is way better to search a huge query for SUBQ01 than to look for “A”.  Trust me, that is painful. 

I’m sticking to my guns on this.  Never use single-character aliases.  Shorter names are fine, but keep them close to the name of the table.  Sure, for EMPLOYEES using EMP is fine, but not E.  Because you’ll find yourself with several subqueries all using “E”, and then you’re looking at the execution plan trying to figure out which subquery a predicate goes with.  Instead of taking seconds, it’s a hunting expedition to figure it out.  I’ve had to do this way too many times. 

Of course all is good when it’s your code; you (presumably) knew what you were doing when you wrote it.  What about when someone else is reading your code?  

Or you are reading it 5 years later?  Do you really think you’ll remember what you were doing 5 years ago?  Heck I have trouble working on code from yesterday. 

Help others and yourself: don’t use single-character aliases.  Make your code easier to read and debug by being more verbose than terse.  You’ll thank yourself later.

Wednesday, September 4, 2019

A case for OPTIMIZER_INDEX_COST_ADJ in the cloud or not


I’ve been doing a lot of query optimization on Exadata this year and it’s been quite an experience.  As I mentioned in my last post the use of indexes on Exadata is tricky.  For the most part NOT using an index is a good idea, but not always.  There are still times the use of an index is good on Exadata. 

 
I know a lot about the optimizer, having taught classes on it and studied it quite a bit.  I don’t think I’m an expert on it by a long shot, but I do believe I have a good grasp of how it works.  The optimizer has been written over many years with transactional queries as the expected workload, which is traditionally the environment the Oracle database has been used in.

Exadata changes that a bit.  Folks can (and do) use Exadata for transactional systems, and in those cases indexes can be the way to go.  However, when we are doing reporting across millions or billions of rows, an index can be a hindrance.  That is partially because of the index access itself and also because it drives NESTED LOOPS joins, as mentioned in my last post.  

This brings me to the parameter OPTIMIZER_INDEX_COST_ADJ.  This can also be used in a hint, which can look like this: OPT_PARAM('OPTIMIZER_INDEX_COST_ADJ',10000).  What this does is change the costing of an index.  The default is 100, which means leave the computed cost alone.  

The range is 1 to 10,000.  Setting it below 100 makes an index cost cheaper; above 100 makes it more expensive.  At 50, the cost of an index drops to 50% of the computed value, so if the optimizer computed a cost of 20 for using the index, it would now be 10.  At the high end of 10,000, an index costs 100 times more than computed, so a computed cost of 20 becomes 2,000.   

In an Exadata system this artificial over-costing of indexes can be useful.  I must stress that this is not a silver bullet that will work all the time.  Some queries will benefit; others will not.  It is best used as a hint with proper testing.  With proper use of this hint I’ve taken queries from hours to minutes.  For my use, 10,000 seems to be the best value, but you should test which value is best for your query.  Something above 5,000 is likely to work well if your goal is to get the query to use less indexing and fewer NESTED LOOPS.
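As a sketch of the hint in place (the query, tables and value are illustrative; test your own):

```sql
-- OPT_PARAM changes OPTIMIZER_INDEX_COST_ADJ for this query only.
-- At 10,000 every index access path is costed 100x higher than
-- computed, nudging the optimizer away from indexes and the
-- NESTED LOOPS joins they tend to drive.
SELECT /*+ OPT_PARAM('OPTIMIZER_INDEX_COST_ADJ',10000) */
       sales.product_id, SUM(sales.amount_sold)
FROM   sales
JOIN   products ON products.product_id = sales.product_id
GROUP  BY sales.product_id;
```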

One of the reasons I really like using this as a hint, when appropriate, is that it doesn’t really force the optimizer's hand.  Yes, it does make indexes look more or less attractive as an access path.  But it doesn’t hide something from the optimizer, the way using COALESCE on the join column effectively does.  When we put functions on columns it tends to mess up cardinality calculations, and the plan stumbles around with unrealistic cardinalities; it might end up acceptable, but more by happenstance than on purpose.  

Also, it’s easy to test whether this is still a good idea.  Remove the one hint from the query and you can check whether the index (or indexes) is now a good idea.  With functions wrapped around the join columns, it’s likely no one will ever test whether those indexes are good or bad for the plan as time goes by. 

If you do use this hint, please put in some comments about it.  And something like “This hint should be tested from time to time to make sure it is still a good idea” is a good start to a comment about its use.

Sunday, August 18, 2019

Exadata, Indexes and Performance in the cloud or not


This year I’ve been on a gig working with a good-sized Exadata implementation.  I really don’t know how big it is; my job is to tune queries, not the database.  Before this gig I’d only done a little work on Exadata and wasn’t that familiar with the platform.  Yes, I knew the technical aspects of how Exadata worked: the storage cells, offloading, and such.  But working with it day after day is something much more.

Indexing is a hot topic around Exadata.  There are some who think there shouldn’t be any indexes at all on an Exadata system.  That is likely a bit extreme; fewer indexes, for sure.  Primary keys must have them and foreign keys really should.  Other indexes that would be created for performance might not be a good idea.

There are two reasons for this.  One is that there is a chance offloading won’t happen on an index scan.  Offloading is the ability of the Exadata machine to run the predicate at the storage device, and it is very powerful.  The storage cells in effect pre-screen the data at the storage level and only bring back data that matches.  With an index this might not happen.  But with each release of Exadata, more and more types of scans are supported at the storage cell level, so this keeps becoming less of an issue.

The other reason, which is really the bad one, is that indexes tend to drive NESTED LOOPS joins, and those tend to be less than optimal in Exadata many times.  Not always, of course; I have seen many plans with NESTED LOOPS joins that are quite nice.  However, when you start joining millions or billions of rows in a NESTED LOOPS join, things tend to go badly. 

Getting the optimizer to not use an index can be done many ways.  Of course you can use the FULL hint, but many shops discourage the use of hints, and I generally agree with this stand.  It’s not that hints are “bad”; it’s that folks lean on them too much.  Too often I see overuse of hints, and many times it’s clear the coder doesn’t really know what to do, so he or she just throws a bunch of hints at it hoping they will work. 

Other techniques mostly revolve around making it “impossible” for the index to be used.  Some examples: use TRIM or COALESCE or NVL on the indexed column in the join.  Another is making the table an inline view (ILV), which many times will need a NO_MERGE hint. 

Personally I like COALESCE.  The great thing about it is that it works on unique key indexes.  NVL doesn’t work, because the optimizer recognizes that the field is already unique and removes the NVL from the predicate. 
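A sketch of the technique (the tables and join columns are invented for illustration):

```sql
-- Wrapping the join column in COALESCE keeps the optimizer from using
-- the index on customers.customer_id, even a unique one. The
-- equivalent NVL(customer_id, customer_id) can get stripped out by
-- the optimizer, so the index would stay in play.
SELECT customers.customer_name, orders.order_total
FROM   customers, orders
WHERE  orders.customer_id =
       COALESCE(customers.customer_id, customers.customer_id);
```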

So what to do?  Let the numbers lead you to the problem.  Get an execution plan with stats (SQL Monitor is my go-to tool for this) and look at where all the time and resource usage is going in the plan.  Focus on that part of the plan.  Sometimes removing it is the answer.  I just had a query that was running a subquery to check a value, and it was really resource intensive.  It turned out that code was useless.  Removed it, job done.

If you do have to do it, how can you make this access faster?  Is the index a good idea?  Is the full scan a good idea?  Is it the right type of join of the data?  Are the predicates getting applied in the right place?  Can another predicate be applied to this table?  Are the predicates well defined?

It is way more work, but more often than not, rewriting the code rather than just putting a hint here or there is the best way to tune a query. 

Sunday, April 7, 2019

Don't do it at all in the cloud or not




It’s rather incredible what a simple thing like DISTINCT can do to performance.  Recently I was tasked with improving performance on a query that was taking over an hour to complete.  Running the plan and watching SQL Monitor, it was easy to see that the plan was spending the vast majority of its time on a SORT UNIQUE step.  It was sorting through 1.4 billion rows, so that is a lot of data to go through.  But the worst part was that the same number of rows was going into the SORT UNIQUE as coming out.  Meaning, all the rows going in were already unique. 

Looking at the columns in the select list, one column (an ID type field) that was part of a primary key for a table was in the select list.  The other column of the primary key was a numeric representation of a date, and this second column was not in the select list.  The actual date was stored as a separate column.  (Why they do this I have no idea, but there it is.)  This date column was in the select list along with the ID column, and because of this the ID and date combination was unique.  The optimizer didn’t and couldn’t know this, because those two columns were not the primary key definition.  Hence it had to do the UNIQUE step to find the unique values, which in this case was the entire set of rows.  Why it didn’t do a HASH UNIQUE I couldn’t figure out; that likely would have been faster than the SORT UNIQUE.   

But really the fastest way to do this was to not do it at all.  It was completely unnecessary.  Removing the unneeded DISTINCT chopped an hour off the processing time.
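In sketch form (the table and columns are invented stand-ins for the real query), the fix was nothing more than dropping the keyword:

```sql
-- Before: forces a 1.4-billion-row SORT UNIQUE over rows that are
-- already unique.
SELECT DISTINCT account_id, activity_date, amount
FROM   account_activity;

-- After: same result set, no sort. The ID and date combination is
-- already unique, even though the optimizer can't prove it from the
-- primary key definition.
SELECT account_id, activity_date, amount
FROM   account_activity;
```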

Once again proving the fastest way to do anything is to not do it at all.

What I hope you get out of this is, know your data.  You will always know something the optimizer doesn’t know.  Use that knowledge to write a better query and help the optimizer make better choices.