Wednesday, November 7, 2018

Bitmap index locking in the cloud or not


Bitmap indexes are really great for flexibility of data selection in data warehouse application environments.  This is achieved by the optimizer being able to use sets of bitmaps at a time to find data in the table and the set of indexes used can change based on the need of the moment.  Bitmap indexes can be merged together (typically with a bitmap AND or OR operation) unlike normal B-tree indexes.  But in a transactional system they can really cause problems.  And locking is the issue.  The problem is that when you lock an index entry in a bitmap index you lock a set of rowids, not just one as you do in a normal b-tree index. 

A simplified look at bitmap entry on a column COLOR would be something like this:

 Control Bytes    Key        Start ROWID       End ROWID         Bitmap
FLG-LCK-LEN      BLUE      AAAcc7AA…       AAAcc7AA…       1010111100000…

The range of rowids for any one entry could cover one extent of a table, but there could be more than one entry for a value on large extents.  What this means is that if I were to do any INSERTS, UPDATES or DELETES of rows for BLUE in this range, I’ve effectively locked that entire set of rows in the table.  I don’t of course; the lock is on the index entry not at the table level.  But from a functionally point of view that hardly matters, I can’t do DML on other rows in that same range with the same value.   (Here I mean DML as in the commands INSERT, UPDATE and DELETE.  The SELECT command is a DML statement but isn’t part of this discussion.)

Here is an example to illustrate the point. 

Session ONE                       Session TWO
Time 1 - insert into bitmaplock values (9000,'RED');
                                                Time 2 - insert into bitmaplock values (9001,'BLUE');
Time 3 - insert into bitmaplock values (9002,'BLUE');
Session one is now waiting on session two.
                                                Time 4 - insert into bitmaplock values (9003,'RED');
                                                Session two is now waiting on session one, a deadlock.
Session one is now gets this message:
insert into bitmaplock values (9002,'BLUE')
            *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

At this point session one must either commit or rollback to get out of the deadlock situation.  Reissuing the insert will just continue the deadlock and no one gets anywhere.

Notice that this example is using INSERT commands. It should be impossible to have a deadlock for an insert, since either session is aware of the other session’s row, how could they be deadlocked?  It’s because they are deadlocked on the bitmap index not the table data.   Of course if the two sessions happen to do the inserts into different extents of the table they would be fine.  Even so, I’m sure you can see this is just a tripwire that will be hit at some point.

The bottom line on this, don’t use bitmap indexes on table that have DML activity.  You may be able to get away with relatively small amounts of DML from a single session.  Many folks have found that even small amounts of DML can be problematic.  Hence they will drop all bitmaps on a table, do the DML operation, and then recreate the bitmap indexes.  Good news is that bitmap indexes tend to be much smaller and build faster than normal b-trees. 

Here is the code to do this example yourself:

drop table bitmaplock;

create table bitmaplock (id number, color varchar2(10));

begin
  for i in 1..1000 loop
    insert into bitmaplock values(1, 'RED');
    insert into bitmaplock values(2, 'WHITE');
    insert into bitmaplock values(3, 'BLUE');
    insert into bitmaplock values(4, 'YELLOW');
    insert into bitmaplock values(5, 'GREEN');
    insert into bitmaplock values(6, 'BROWN');
    insert into bitmaplock values(7, 'BLACK');
    insert into bitmaplock values(8, 'TAN');
    insert into bitmaplock values(9, 'GRAY');
    insert into bitmaplock values(10, 'GOLD');
  end loop;
  commit;
  end;
/

create bitmap index bitmaplock_idx on bitmaplock(color);

exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BITMAPLOCK', estimate_percent=>100, cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 1');

-- the following steps must be done in order
-- need two sessions using same schema
-- in first session
insert into bitmaplock values (9000,'RED');

-- in other session
insert into bitmaplock values (9001,'BLUE');

-- in first session
insert into bitmaplock values (9002,'BLUE');

-- in other session
insert into bitmaplock values (9003,'RED');

-- at this point you will get a deadlock error

Friday, October 12, 2018

Clusters in the cloud or not

Many folks are familiar with the single table hash cluster in Oracle land.  However there are really two kinds of clusters in Oracle land, Index and Hash.  Both are rarely used these days and have their roots way back in Oracle time when these were the only other alternative to a POT (Plain Old Table).   Should you use them?  Most likely no, I’ll talk about them and show you why not.

The index cluster is a way to pre-join table data.  This is also the default type of cluster.  It’s a multi-step process; you first create the cluster, then the tables in the cluster, then the index on the cluster key.   What happens at the block level is that data of the same cluster key are stored together. The main point here is that unlike a POT, data from the set of tables are stored together in the same block.  (Typically there will be two tables in an index cluster but you can have more.)  

Using the classic example for the good old EMP and DEPT tables if you were to create a cluster for these you’d use DEPTNO as the cluster key.   Then create each table in the cluster.  Now the table data for a matching department number in both tables are stored together in the same data block.  Note that the cluster is now the storage segment, not the tables.  

On the surface this can seem like a great thing, and it is provided you always retrieve the data in the joined state.  But what if you access the tables independently?  Keep in mind they are literally stored together in the same data blocks.  This means that to get either table, you are scanning blocks that contain data from both tables.   In a typical parent-child table relationship, the parent table has far fewer rows then the child.  Hence to get a set of parent rows you may end having to go thru 100s or 1,000s of blocks to find a relatively small amount of data.

Some other issues are modifying the cluster key is more expensive than if the table were not clustered.  If the data from the set of tables with the same key will occupy more than about 2 blocks, they you’re likely better off not clustering the tables.  Also if the number of matching rows per key is very different, the cluster will likely have a lot of wasted space in it.  Any block in the cluster will only have rows with one cluster key in it; hence you could have some blocks with very few rows in them.

 So should you use them?  Personally every time I tried to use them, the issues mentioned above ended up forcing us to go back to POTs.  But if you always return the data in the joined form then they can be helpful in reducing IO since you could get data from all the tables with very few LIOs. 

Now how about them Hash clusters?  They too can have more than one table in it.   It’s the same basic technique, create the cluster, and put the tables into it.  You don’t have to create an index on the key. You can if you want to, but the idea of using a hash key is to not have an index on it.  This time Oracle stores data in the same block that has the same hash value.   Again if you retrieve the data from the set of tables joining on the hash key this is good.

These are popular as Single Table Hash Clusters, which are very cool because they allow you to get a row with two IO operations pretty much no matter how big the table is.   With small tables this is easily achievable with an index.  However as soon as the index grows and has two or more levels, the LIOs will increase with the levels of the index. 

To create these you have to use the clause “SINGLE TABLE HASHKEYS N” where N is a positive value.  This is the number of expected hash keys there will be in the table.  This number is round up to the nearest prime number.  Warning: Unlike a POT, Oracle will immediately allocate the space needed for all the HASHKEYS; hence a large number will allocate a lot of space right away.  Of course if you need this then all is good, just be aware that it’s allocating all the space up front.   Optionally you can also specify a SIZE parameter; this is the amount of space reserved for all rows with the same hash key.  If not specified, then one data block is allocated for each key.   

Also you can even define the hash algorithm used to hash the values. If you don’t Oracle uses one to hash them.   This does give you a lot of control over how the data is stored and you can even predict the number of keys that will hash to the same value if you define it.

The best use of a single table hash cluster is looking up rows by the hash key using an equality predicate.  You’ll see the access of the table as “TABLE ACCESS HASH”.  Even when you retrieve a set of rows (when the hash key is not unique) this will likely have a few LIOs.

Here is a set of test queries that shows the good and bad of single table hash clusters.  This test is using a table ORD2 a POT and ORD2_HASH in a single table hash cluster.   The script is below; I’ll just show the output here.  

These tests are running this query which will retrieve a count of 20 rows:
select count(*) into x1 from [TABLE] where cust_no = 32076;
......................................................
selecting from ORD2 10000 times a POT
......................................................
In hundredths of a second
**** TIME - 524
**** CPU  - 524
**** LIO  - 1910000
......................................................
selecting from ORD2 10000 times a POT with index
......................................................
In hundredths of a second
**** TIME - 53
**** CPU  - 50
**** LIO  - 20000

......................................................
selecting from ORD2_HASH 10000 times a single table hash cluster
......................................................
In hundredths of a second
**** TIME - 47
**** CPU  - 45
**** LIO  - 10079

What you can see in this first set of tests that the single table hash cluster beat out the POT even when it had an index on the column.  So this is looking good for the single table hash cluster.  Now let’s see how things go with a range scan.

These tests are running this query which will retrieve a count of 26 rows:
select count(*) into x1 from [TABLE] where cust_no between 32076 and 32080;
......................................................
range scan from ORD2 10000 times a POT with index
......................................................
In hundredths of a second
**** TIME - 50
**** CPU  - 50
**** LIO  - 20000

......................................................
range scan from ORD2_HASH 10000 times a single table hash cluster
......................................................
In hundredths of a second
**** TIME - 1273
**** CPU  - 1265
**** LIO  - 3170067

......................................................
range scan from ORD2_HASH 10000 times a single table hash cluster with index
......................................................
In hundredths of a second
**** TIME - 53
**** CPU  - 51
**** LIO  - 20014

Here you can see the single table hash cluster really doesn’t help.  Without an index on the column it reverts to a full table scan.  With an index it’s really the same as the POT with an index (the extra LIOs and time are from the parsing of the statement because of the new index).  Is that really worth it?  Of course that is a question only you can answer for your environment.

As you might guess most of the issues with index clusters are still there with hash clusters.  A hash cluster is likely not a good choice is where the hash key is updated.  This will require the row to move (most likely) and will in effect cause a delete/insert like activity which will be slower than just updating the field.   Doing full table scans on tables in a hash cluster will likely be longer, since Oracle only stores data in a block with the same hash value, there is likely to be blocks with only a few rows in them.   And it’s really important to use equality predicates when accessing the hash key.  Other types of predicates will tend to drive full scans as seen above.   Yes you can have an index on the column, but the basic idea of using a cluster is to not use an index.  Because if you’re going to use an index, than you’re likely better off with just a POT with and index.

What you should glen from this is that you pretty much have to know the size of the data you’ll be using in the single hash table.  You need to know the number of rows that will hash to the same value (the HASHKEYS) and how big that total set of rows will be (the SIZE).  Using a single table hash cluster is not likely to work optimally when these are not known.  You don’t necessarily have to be precise with these values, but you should be close and likely over estimate a bit.  The issue here is if you under estimate to much then over flow blocks get created and you start to lose the advantage of the hashing.  In which case your performance is likely back to what a POT and indexing would have been, maybe worse.

What’s the bottom line?  Using clusters is cool and they do work well for very specific usages.  If you wander outside these usages then they will likely not help and may even hurt performance.


Here is the script I used for the tests show above.  The script to create the ORD2 table is a bit too big to put in here as a blog post.  If you’d like it please contact me ric.van.dyke at hotsos dot com and I can send it to you.

rem testing selecting from a POT vs a single table hash cluster
rem OCT2018  RVD
rem
set echo on feedback on serveroutput on
drop index ord2_cust_no;
drop index ord2h_cust_no;
drop table ord2_hash;
drop cluster cust_orders;

CREATE CLUSTER cust_orders (cust_id NUMBER(*,0))
SINGLE TABLE HASHKEYS 200;

create table ord2_hash
cluster cust_orders(cust_no)
as select * from ord2;

declare
    x1           number;
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 10000 loop
           select count(*) into x1 from ord2 where cust_no = 32076;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('selecting from ORD2 10000 times a POT');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/

create index ord2_cust_no on ord2(cust_no);

declare
    x1           number;
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 10000 loop
           select count(*) into x1 from ord2 where cust_no = 32076;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('selecting from ORD2 10000 times a POT with index');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/


declare
    x1           number;
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 10000 loop
           select count(*) into x1 from ord2_hash where cust_no = 32076;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('selecting from ORD2_HASH 10000 times a single table hash cluster');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/

rem now doing a range scan leaving the index in place on ORD2 for CUST_NO
rem but no index on ORD2_HASH for CUST_NO

declare
    x1           number;
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 10000 loop
           select count(*) into x1 from ord2 where cust_no between 32076 and 32080;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('range scan from ORD2 10000 times a POT with index');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/


declare
    x1           number;
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 10000 loop
           select count(*) into x1 from ord2_hash where cust_no between 32076 and 32080;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('range scan from ORD2_HASH 10000 times a single table hash cluster');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/

rem lastly put an index on ORD2_HASH.CUST_NO

create index ord2h_cust_no on ORD2_HASH(CUST_NO);

declare
    x1           number;
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 10000 loop
           select count(*) into x1 from ord2_hash where cust_no between 32076 and 32080;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('range scan from ORD2_HASH 10000 times a single table hash cluster with index');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/