Sunday, April 7, 2019

Don't do it at all in the cloud or not




It’s rather incredible what a simple thing like DISTINCT can do to performance.  Recently I was tasked with improving performance on a query that was taking over an hour to complete.  Running it and watching SQL Monitor, it was easy to see that the plan was spending the vast majority of its time on a SORT UNIQUE step.  It was sorting through 1.4 billion rows, so that is a lot of data to go through.  But the worst part was that the same number of rows came out of the SORT UNIQUE as went in.  Meaning, all the rows going in were already unique.

Looking at the columns in the select list, one column (an ID type field) was part of a table’s primary key.  The other column of the primary key was a numeric representation of a date, and this second column was not in the select list.  The actual date was stored as a separate column.  (Why they do this I have no idea, but there it is.)  And this date column was in the select list along with the ID column.  Because of this, the ID and date combination was unique.  The optimizer didn’t and couldn’t know this, because those two columns were not the primary key definition.  Hence it had to do the UNIQUE step to find the unique values, which in this case was the entire set of rows.  Why it didn’t do a HASH UNIQUE I couldn’t figure out; that likely would have been faster than the SORT UNIQUE.

But really the fastest way to do this was to not do it at all.  It was completely unnecessary.  Removing the unneeded DISTINCT chopped an hour off the processing time.
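A minimal sketch of the fix, with invented table and column names (the real ones weren’t shown).  Here ID and DATE_NUM form the primary key and ORDER_DATE maps one-to-one to DATE_NUM, so every (ID, ORDER_DATE) pair is already unique, even though the optimizer has no way to know it:

```sql
-- Before: forces a SORT UNIQUE over every row, even though
-- (id, order_date) is already unique by construction
SELECT DISTINCT id, order_date
  FROM big_fact_table;

-- After: same result set, no sort step at all
SELECT id, order_date
  FROM big_fact_table;
```

Since the rows going in were already unique, the DISTINCT was pure overhead; dropping it changes nothing about the result, only the work done to produce it.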

Once again proving the fastest way to do anything is to not do it at all.

What I hope you get out of this is: know your data.  You will always know something the optimizer doesn’t.  Use that knowledge to write a better query and help the optimizer make better choices.

Sunday, February 3, 2019

Changes in the cloud and on the ground

At the start of this year I embarked on a new adventure.  I’m now at Zione Solutions in a consulting role as a Senior Database Administrator, doing what I’ve been teaching for the past many years.  It’s been quite a change.  For one, I am now using that strange new join syntax “LEFT OUTER JOIN … ON …” which I had successfully avoided all these years; it’s turned out to be harder to learn than I expected.  Also, the queries I’ve been tuning are rather huge.  An average query is about 300-400 lines long, and the plans for these will easily be 1,000 lines.  These are reporting queries, the classic Data Warehouse type.  The smaller tables are millions of rows, and tens of billions of rows in a table is not uncommon.  Also, it’s on Exadata.

I’ve seen firsthand that what I’ve been teaching works.  

One of the most important things you can do when tuning a query is: let the numbers lead you to the problem.  Some folks can look at a query and, without running it, make good guesses as to where the problem will likely be.  That can work, but why bother?  Run the query and use a tool like SQL Monitor to see how the plan is executing.  In the SQL Monitor report, the activity column is a great way to see which step is taking the largest percentage of time.  Now you can see exactly where the query is spending all its time, and then pound that nail down.
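One way to pull that report is with DBMS_SQLTUNE.REPORT_SQL_MONITOR; the SQL_ID below is a placeholder you’d replace with the one for your own statement:

```sql
-- Text-format SQL Monitor report for a running or recently
-- completed statement; swap in your own SQL_ID
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         sql_id       => 'your_sql_id_here',
         type         => 'TEXT',
         report_level => 'ALL')
  FROM dual;
```

The activity column in the output shows, step by step, where the elapsed time actually went, which is exactly the "let the numbers lead you" idea.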

It’s also about starting small and staying small with the rows.  This is where getting predicates applied at the right time, and the use of things like the WITH clause, can really help.  With Exadata and huge volumes of data this concept is still critical to success; however, more often than not, indexes are not the solution.

Why?  Not because indexes are bad; it’s just that index use tends to serialize the access.  This tends to cause nested loop joins, which often end up being very expensive because the plan hits the inner table (or index) way too many times.  That can be millions of times, and even with a unique index lookup, doing it over a million times isn’t efficient.  A fast full index scan can work fine, since it works like a full table scan and can be done in parallel.
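As a sketch (table, index, and degree of parallelism all invented for illustration), the INDEX_FFS and PARALLEL hints are one way to nudge the optimizer toward a parallel fast full scan of an index instead of a row-at-a-time lookup:

```sql
-- Read the whole index like a table, in parallel, rather than
-- probing it once per driving row in a nested loop
SELECT /*+ INDEX_FFS(o ord_cust_ix) PARALLEL(o 8) */
       o.cust_no
  FROM ord o
 WHERE o.cust_no IS NOT NULL;
```

Whether this beats a nested loop depends entirely on the row counts involved; the point is that at millions of probes, the "cheap" unique lookup stops being cheap.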

Subquery factors (also called CTEs, Common Table Expressions) are very powerful in these queries.  These are the temporary views you define in the WITH clause, and typically it’s best to use the MATERIALIZE hint.  This hint forces the optimizer to make the subquery factor a global temporary table like structure; without the hint the optimizer might just merge it back into the query, which is exactly what you don’t want to happen most of the time.  Taking a subquery that is executed even just a couple of times and making it a subquery factor can be huge.  Sometimes taking that subquery out of the main query and making it a “stand alone” query can also help parallelization.  This also works well to get that “start small, stay small” thing.
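A sketch of the pattern, with invented names: a subquery that would otherwise run twice is pulled into a materialized subquery factor so it is computed once and reused.

```sql
-- cust_totals is built once into a temp-table-like structure,
-- then read twice, instead of being computed twice
WITH cust_totals AS (
  SELECT /*+ materialize */
         cust_no, SUM(total_order_price) AS total_orders
    FROM ord
   GROUP BY cust_no
)
SELECT c.cust_no, c.total_orders
  FROM cust_totals c
 WHERE c.total_orders > (SELECT AVG(total_orders) FROM cust_totals);
```

The bigger and more expensive the factored subquery, the bigger the win from not repeating it.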

On the team I’m working with now, it’s estimated that about 60% of the performance problems are solved by using subquery factors (CTEs).

Oh, and did I mention LAT (LATERAL) views?  Yeah, these tend to work very poorly in these kinds of queries.  Seeing one or more in a plan is a known performance killer, and making them go away can be tricky.  A typical cause is an OR in the ON clause of an outer join.  These LAT views might work well with smaller data sets, but with large ones they can cause a query to stall at the LAT view for very long periods of time, like hours.
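The pattern that tends to produce one looks like this (names invented for illustration):

```sql
-- An OR in the ON clause of an outer join; the optimizer may
-- implement this as a LATERAL view, re-evaluating t_b per row of t_a
SELECT a.id, b.val
  FROM t_a a
  LEFT OUTER JOIN t_b b
    ON a.key1 = b.key1
    OR a.key2 = b.key2;
```

There is no one-size rewrite; the usual direction is to split the OR into separate joins (or a UNION of queries, one per branch of the OR), but the correct form depends on whether a row can match on both conditions, so it has to be worked out against the actual data.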

For me it’s been quite a beginning to the year.  As the saying goes, change is inevitable, except from vending machines.  And this year has had plenty, but still none from vending machines.

Monday, December 10, 2018

WITH and the MATERIALIZE hint in the cloud or not


Recently in class a student mentioned that she’d been told that you don’t need to use the MATERIALIZE hint with the WITH clause anymore.  Maybe.  It turns out that if you REALLY want the subquery factor to be materialized, you do need to use it.

Keep in mind that the optimizer sees your SQL statement as just that, a statement.  It can and will rearrange and rewrite your SQL into something it likes better.  And this means it can merge your subquery factors defined in the WITH clause into the statement.

I’ll use a simple statement to illustrate this point.   Clearly this select could (and really should) be written without a WITH clause at all.  This is just a simple example to illustrate the point.  Once you have a better understanding of how it works you can decide what is best for your particular situation.

Edit: Tests were done on an Oracle Database 12c Enterprise Edition Release 12.2.0.1.0, on a windows laptop.

SQL> -- no hints
SQL> WITH inline_view AS
  2  (
  3  SELECT /*+ qb_name(cust_totals)*/ CUST_NO, SUM(TOTAL_ORDER_PRICE) AS total_orders
  4    FROM ord2
  5   GROUP BY CUST_NO
  6  )
  7  SELECT /*+ qb_name(main) */ *
  8    FROM inline_view where total_orders > 1400000;

        CUST_NO    TOTAL_ORDERS
--------------- ---------------
           9379      1409495.95

If we run this, the optimizer merges the subquery factor and the main query together into one statement.  That really makes sense here; the subquery factor isn’t doing anything special.  Here is the plan.  Notice the query block MAIN disappears completely and the only reference to the CUST_TOTALS block is the table alias:

SQL_ID  bnzfdp1yvk7vc, child number 0
-------------------------------------
WITH inline_view AS ( SELECT /*+ qb_name(cust_totals)*/ CUST_NO,
SUM(TOTAL_ORDER_PRICE) AS total_orders   FROM ord2  GROUP BY CUST_NO )
SELECT /*+ qb_name(main) */ *   FROM inline_view where total_orders >
1400000

Plan hash value: 2809195938

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |    69 (100)|          |
|*  1 |  FILTER             |      |       |       |            |          |
|   2 |   HASH GROUP BY     |      |    50 |   550 |    69   (2)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| ORD2 | 12890 |   138K|    68   (0)| 00:00:01 |
----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$8092F496
   3 - SEL$8092F496 / ORD2@CUST_TOTALS

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(SUM("TOTAL_ORDER_PRICE")>1400000)

Now using the MATERIALIZE hint, the subquery factor defined in the WITH clause stays a separate entity.

SQL> -- materialize hint
SQL> WITH inline_view AS
  2  (
  3  SELECT /*+ materialize qb_name(cust_totals)*/ CUST_NO, SUM(TOTAL_ORDER_PRICE) AS total_orders
  4    FROM ord2
  5   GROUP BY CUST_NO
  6  )
  7  SELECT /*+ qb_name(main) */ *
  8    FROM inline_view where total_orders > 1400000;

        CUST_NO    TOTAL_ORDERS
--------------- ---------------
           9379      1409495.95

In the plan this time we can see both query blocks plus a new one, which is the materialized data from the subquery factor.

SQL_ID  65bmr36y814d3, child number 0
-------------------------------------
WITH inline_view AS ( SELECT /*+ materialize qb_name(cust_totals)*/
CUST_NO, SUM(TOTAL_ORDER_PRICE) AS total_orders   FROM ord2  GROUP BY
CUST_NO ) SELECT /*+ qb_name(main) */ *   FROM inline_view where
total_orders > 1400000

Plan hash value: 1306428065

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |       |       |    72 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6653_4B0C1F7 |       |       |            |          |
|   3 |    HASH GROUP BY                         |                            |  1000 | 11000 |    69   (2)| 00:00:01 |
|   4 |     TABLE ACCESS FULL                    | ORD2                       | 12890 |   138K|    68   (0)| 00:00:01 |
|*  5 |   VIEW                                   |                            |  1000 | 26000 |     2   (0)| 00:00:01 |
|   6 |    TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6653_4B0C1F7 |  1000 | 11000 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - MAIN
   2 - CUST_TOTALS
   4 - CUST_TOTALS  / ORD2@CUST_TOTALS
   5 - SEL$03D1C9D1 / INLINE_VIEW@MAIN
   6 - SEL$03D1C9D1 / T1@SEL$03D1C9D1

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("TOTAL_ORDERS">1400000)

This makes it clear that if the optimizer can merge the subquery factor into the main query, it will.  The MATERIALIZE hint keeps the subquery factor as a separate entity.  Bottom line: if you want a subquery factor kept as a separate entity, and you notice in the plan that it’s being merged in, use the MATERIALIZE hint.

Which brings up another point.  Some folks use the NO_MERGE hint to keep the subquery factor from being merged in.  This works, sort of.  Here is the same query using the NO_MERGE hint.

SQL> -- no merge hint
SQL> WITH inline_view AS
  2  (
  3  SELECT /*+ no_merge qb_name(cust_totals)*/ CUST_NO, SUM(TOTAL_ORDER_PRICE) AS total_orders
  4    FROM ord2
  5   GROUP BY CUST_NO
  6  )
  7  SELECT /*+ qb_name(main) */ *
  8    FROM inline_view where total_orders > 1400000;

        CUST_NO    TOTAL_ORDERS
--------------- ---------------
           9379      1409495.95

And now here is the plan.  Notice that the subquery factor isn’t completely merged in, but it becomes basically an inline view instead.  That might be just fine and work the way you want; however, it’s not a materialized version of the subquery factor, it’s an inline view.

SQL_ID  6mf7u56n8bad6, child number 0
-------------------------------------
WITH inline_view AS ( SELECT /*+ no_merge qb_name(cust_totals)*/
CUST_NO, SUM(TOTAL_ORDER_PRICE) AS total_orders   FROM ord2  GROUP BY
CUST_NO ) SELECT /*+ qb_name(main) */ *   FROM inline_view where
total_orders > 1400000

Plan hash value: 2169976290

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    69 (100)|          |
|   1 |  VIEW                |      |     1 |    26 |    69   (2)| 00:00:01 |
|*  2 |   FILTER             |      |       |       |            |          |
|   3 |    HASH GROUP BY     |      |     1 |    11 |    69   (2)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| ORD2 | 12890 |   138K|    68   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - CUST_TOTALS / INLINE_VIEW@MAIN
   2 - CUST_TOTALS
   4 - CUST_TOTALS / ORD2@CUST_TOTALS

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SUM("TOTAL_ORDER_PRICE")>1400000)