When doing
some work the more efficiently we do it the better. This is true in real life and with our
queries we write in SQL. Imagine you had
a landscape supply company deliver a load of bricks you will use for a
patio. The problem is they are sitting
in your drive way and the patio is in the back yard. You could move them one at a time from the
driveway to the back of the house. Or
you could get a wheel barrel and move them several at a time. Which way do you think you’ll move the bricks
faster?
The same can
happen with our queries. And the STARTS column in the run time stats of the plan can help. You will see this stat using the format option
IOSTATS or ALLSTATS. You can also see it in the STAT lines starting in 12.2 in the 10046 trace files.
You need information
from a table within the query. Certainly
a natural way to do this is to write a subquery to get it. For example I want a report of customers, the
number of orders and the average order amount.
I could write it like this (the tables I’m using here are nonsensical but
do the same thing):
select /*+
qb_name(mainblk) gather_plan_statistics */ a.username customer_name,
(select /*+ qb_name(cntblk) */ nvl(count(*),0)
from scalar_allobjects b where b.owner = a.username) cnt_orders,
(select /*+ qb_name(avgblk) */
nvl(round(avg(object_id),2),0) from scalar_allobjects b where b.owner =
a.username) avg_order_amt
from
scalar_allusers a
order by
a.username
/
When it
runs, the run time stats look like this, notice the STARTS column in particular.
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time
| Buffers |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 48 |00:00:00.55 | 128K|
| 1 |
SORT AGGREGATE | | 48 |
1 | 48 |00:00:00.27 | 64320 |
|* 2 |
TABLE ACCESS FULL| SCALAR_ALLOBJECTS | 48 |
2167 | 63383 |00:00:00.28 | 64320 |
| 3 |
SORT AGGREGATE | | 48 |
1 | 48 |00:00:00.28 | 64320 |
|* 4 |
TABLE ACCESS FULL| SCALAR_ALLOBJECTS | 48 |
2167 | 63383 |00:00:00.30 | 64320 |
| 5 |
SORT ORDER BY | | 1 |
48 | 48 |00:00:00.55 | 128K|
| 6 |
TABLE ACCESS FULL| SCALAR_ALLUSERS
| 1 | 48 |
48 |00:00:00.01 | 9 |
--------------------------------------------------------------------------------------------------
Query Block
Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - CNTBLK
2 - CNTBLK
/ B@CNTBLK
3 - AVGBLK
4 - AVGBLK
/ B@AVGBLK
5 - MAINBLK
6 - MAINBLK / A@MAINBLK
Predicate
Information (identified by operation id):
---------------------------------------------------
2 -
filter("B"."OWNER"=:B1)
4 -
filter("B"."OWNER"=:B1)
So it’s
doing some 48 full table scans for each sub query, on the same table. That’s 96 full table scans on that table
alone. An index might help, but
really the problem is that the subqueries are running way to many times. How
about a join of the two tables instead? It has to be an outer join, because there are some customers who have not
placed any orders yet so they wouldn’t show up at all with an inner join. Also we can’t do a COUNT(*) because that
counts the occurrence of a row, and with an outer join we’d get a row even
where there isn’t a match. So the query
would look like this:
select /*+
qb_name(mainblk) gather_plan_statistics */ a.username customer_name,
nvl(count(b.owner),0) cnt_orders,
nvl(round(avg(b.object_id),2),0) avg_order_amt
from
scalar_allusers a, scalar_allobjects b
where
b.owner(+) = a.username
group by
a.username
order by
a.username
/
When this on
runs its stats look like this:
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time
| Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
1 | | 48 |00:00:00.02 | 1349 |
| 1 |
SORT GROUP BY | | 1 |
35 | 48 |00:00:00.02 | 1349 |
|* 2 |
HASH JOIN OUTER | | 1 |
75842 | 63397 |00:00:00.01 | 1349 |
| 3 |
TABLE ACCESS FULL| SCALAR_ALLUSERS
| 1 | 48 |
48 |00:00:00.01 | 9 |
| 4 |
TABLE ACCESS FULL| SCALAR_ALLOBJECTS | 1 |
75829 | 75829 |00:00:00.01 | 1340 |
---------------------------------------------------------------------------------------------------
Query Block
Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - MAINBLK
3 - MAINBLK / A@MAINBLK
4 - MAINBLK / B@MAINBLK
Predicate
Information (identified by operation id):
---------------------------------------------------
2 -
access("B"."OWNER"="A"."USERNAME")
Now there is
only one full table scan on each table.
Notice the starts column this time.
Also it does a significantly lower amount of LIOs (the buffers column),
the first one doing over 128,000 and the second one at 1,349. And to boot, it’s over ten times faster, no surprise
when it’s doing about 1% of the LIOs. I
did run this test many times and it was always over ten times faster.
The STARTS column is a great way to see if there is some part of the
plan that is running excessively. If you
see this there are 3 basic attack patterns: turn it into a join (as done here),
use the WITH clause and turn it into a subquery
factor and join that into the main query, or put it into the FROM clause as an inline view which is joined to the other tables.
All
these have the same net effect, join the table into the query don’t select from
it as a subquery that is run repeatedly. More often than not this is going to do less
work and hence will be faster.
Here is the
full script I used for this example:
set
serveroutput off
column
customer_name format a30
column
cnt_orders format 99,999,999
column
avg_order_amt format 99,999,999.99
set termout
off
select /*+
qb_name(mainblk) gather_plan_statistics */ a.username customer_name,
(select /*+ qb_name(cntblk) */ nvl(count(*),0)
from scalar_allobjects b where b.owner = a.username) cnt_orders,
(select /*+ qb_name(avgblk) */
nvl(round(avg(object_id),2),0) from scalar_allobjects b where b.owner = a.username)
avg_order_amt
from
scalar_allusers a
order by
a.username
/
set termout
on
select *
from table(dbms_xplan.display_cursor (format=>'iostats last alias'))
/
set termout
off
select /*+
qb_name(mainblk) gather_plan_statistics */ a.username customer_name,
nvl(count(b.owner),0) cnt_orders,
nvl(round(avg(b.object_id),2),0) avg_order_amt
from
scalar_allusers a, scalar_allobjects b
where
b.owner(+) = a.username
group by
a.username
order by
a.username
/
set termout
on
select *
from table(dbms_xplan.display_cursor (format=>'iostats last alias'))
/
Creating the
two tables:
create
table scalar_allobjects as select * from all_objects;
alter table
scalar_allobjects add (constraint allobjects_pkey primary key (object_id));
create
table scalar_allusers as select * from all_users;
alter table
scalar_allusers add (constraint allusers_pk primary key (user_id));
exec
dbms_stats.gather_table_stats (user, 'scalar_allobjects', method_opt => 'for
all columns size 1', -
estimate_percent => 100, cascade =>
TRUE, no_invalidate => FALSE) ;
exec
dbms_stats.gather_table_stats (user, 'scalar_allusers', method_opt => 'for
all columns size 1', -
estimate_percent => 100, cascade =>
TRUE, no_invalidate => FALSE) ;
No comments:
Post a Comment