Tuesday, May 11, 2010

What’s going on with STAT lines in 10046 trace data?

I’ve long been teaching folks about the value of using the STAT lines in 10046 Trace data to help in optimizing queries. With STAT lines it’s been quite easy to see which part of a plan takes up the most time and does the most LIOs (Logical IOs).

This is a simple plan from 10.2:

STAT #8 id=1 cnt=348 pid=0 pos=1 obj=0
op='SORT GROUP BY (cr=494 pr=19 pw=0 time=603190 us)'



STAT #8 id=2 cnt=2399 pid=1 pos=1 obj=0
op='HASH JOIN (cr=494 pr=19 pw=0 time=595897 us)'


STAT #8 id=3 cnt=437 pid=2 pos=1 obj=57036
op='TABLE ACCESS FULL ORD2 (cr=184 pr=11 pw=0 time=24947 us)'



STAT #8 id=4 cnt=70975 pid=2 pos=2 obj=57039

op='TABLE ACCESS FULL ORD_ITEM2 (cr=310 pr=8 pw=0 time=212985 us)'


Each of the above is one line; I’ve broken it into two to make it a bit easier to read.

The ID and PID show the relationship between the steps. The ID shows the order of the steps in the plan, and the PID shows which step is the parent of a step. Hence the last two lines are children of the second line, they each have a PID of 2. And the second step is the only child of the first step. The first step has no parent since its PID is 0 (zero).

The numbers in the parenthesis are the really cool ones. These numbers tell us about how much work was done for each row source operation, including its children. So for example, the CR value (which is the LIOs) in the HASH JOIN line of 494 includes the 184 and the 310 of the next two lines. So the HASH JOIN didn’t really do any LIOs. All the LIOs were in the two other steps. 184 + 310 = 494

Of particular interest can be the TIME value. It also includes the children of a step. Here we can see which step took up the most time. The HASH JOIN shows a TIME of 595, 897, if we subtract out of that the TIME of the next two steps 24,947 + 212,985 = 237,932 so 595,897 – 237,932 = 357,965. This tells us that even though the HASH JOIN didn’t do any LIOs it did take the longest time to run.

These TIME values are in Micro Seconds (that’s what the “us” is after the time, it’s really a µs.)

This is great stuff and I use this to drill right to the problem step of a plan. However things are not looking good with 11.

Here is a simple plan in 11.1.0.6 (side note, this is the good old EMP - DEPT join, which has been a NESTED LOOP join forever, notice in 11 it’s now a SORT MERGE plan):

STAT #6 id=1 cnt=13 pid=0 pos=1 obj=0
op='MERGE JOIN (cr=6 pr=0 pw=0 time=144 us cost=5 size=767 card=13)‘


STAT #6 id=2 cnt=4 pid=1 pos=1 obj=71198
op='TABLE ACCESS BY INDEX ROWID DEPT (cr=4 pr=0 pw=0 time=44 us cost=2 size=80 card=4)'


STAT #6 id=3 cnt=4 pid=2 pos=1 obj=71199
op='INDEX FULL SCAN DEPT_DEPTNO_PK (cr=2 pr=0 pw=0 time=26 us cost=1 size=0 card=4)'


STAT #6 id=4 cnt=13 pid=1 pos=2 obj=0
op='SORT JOIN (cr=2 pr=0 pw=0 time=22 us cost=3 size=507 card=13)'


STAT #6 id=5 cnt=13 pid=4 pos=1 obj=71200
op='TABLE ACCESS BY INDEX ROWID EMP (cr=2 pr=0 pw=0 time=72 us cost=2 size=507 card=13)'


STAT #6 id=6 cnt=13 pid=5 pos=1 obj=71205
op='INDEX FULL SCAN EMP_DEPT_IDX (cr=1 pr=0 pw=0 time=24 us cost=1 size=0 card=13)'


Well it may not be clear from just looking at the stat lines but the timings are a miss. The SORT JOIN step (ID 4) shows a TIME of 22 and it has one child (ID 5). This 22 can't included the time of step 5 which is 72 (the 24 in step 6 is part of the 72.) There seems to be a pattern of any time there is a sort in the plan, the timings seem to be off.

But wait it gets worse!

In 11.1.0.7 we see this:

STAT #17 id=1 cnt=13 pid=0 pos=1 obj=0
op='MERGE JOIN (cr=6 pr=0 pw=0 time=0 us cost=5 size=767 card=13)'


STAT #17 id=2 cnt=4 pid=1 pos=1 obj=77595
op='TABLE ACCESS BY INDEX ROWID DEPT (cr=4 pr=0 pw=0 time=0 us cost=2 size=80 card=4)'


STAT #17 id=3 cnt=4 pid=2 pos=1 obj=77596
op='INDEX FULL SCAN DEPT_DEPTNO_PK (cr=2 pr=0 pw=0 time=0 us cost=1 size=0 card=4)'


STAT #17 id=4 cnt=13 pid=1 pos=2 obj=0
op='SORT JOIN (cr=2 pr=0 pw=0 time=0 us cost=3 size=507 card=13)'


STAT #17 id=5 cnt=13 pid=4 pos=1 obj=77597
op='TABLE ACCESS BY INDEX ROWID EMP (cr=2 pr=0 pw=0 time=0 us cost=2 size=507 card=13)'


STAT #17 id=6 cnt=13 pid=5 pos=1 obj=77602
op='INDEX FULL SCAN EMP_DEPT_IDX (cr=1 pr=0 pw=0 time=0 us cost=1 size=0 card=13)'


Now all the TIME values are Zero!! Well maybe this is better, at least we don’t have numbers that don’t add up! Oh and before you ask, yes STATISTICS_LEVEL was set to ALL for these tests.

I’m not sure what is going on but this is a bad sign. We at one time had a great way to see what was happening for each line in a SQL statement. Now it appears that we are getting either bad data or no data. This is quite distressing.

I don’t have an 11.2 database currently to test on. Maybe it’s fixed, but I’m doubtful. Based on the bugs I’ve seen on Metalink that address this issue, the problem still seems to be there in 11.2.

In 11.1.0.6 the data in the view V$SQL_PLAN_STATISTICS and $SQL_PLAN_STATISTICS_ALL seemed to be unaffected by this. But in 11.1.0.7 this “feature” has gotten into those views as well.