Ric Ramblings: Clusters in the cloud or not

Many folks are familiar with the single table hash cluster in Oracle land. However there are really two kinds of clusters in Oracle land, Index and Hash. Both are rarely used these days and have their roots way back in Oracle time when these were the only other alternative to a POT (Plain Old Table). Should you use them? Most likely no, I’ll talk about them and show you why not.

The index cluster is a way to pre-join table data. This is also the default type of cluster. It’s a multi-step process; you first create the cluster, then the tables in the cluster, then the index on the cluster key. What happens at the block level is that data of the same cluster key are stored together. The main point here is that unlike a POT, data from the set of tables are stored together in the same block. (Typically there will be two tables in an index cluster but you can have more.)

Using the classic example for the good old EMP and DEPT tables if you were to create a cluster for these you’d use DEPTNO as the cluster key. Then create each table in the cluster. Now the table data for a matching department number in both tables are stored together in the same data block. Note that the cluster is now the storage segment, not the tables.

On the surface this can seem like a great thing, and it is provided you always retrieve the data in the joined state. But what if you access the tables independently? Keep in mind they are literally stored together in the same data blocks. This means that to get either table, you are scanning blocks that contain data from both tables. In a typical parent-child table relationship, the parent table has far fewer rows then the child. Hence to get a set of parent rows you may end having to go thru 100s or 1,000s of blocks to find a relatively small amount of data.

Some other issues are modifying the cluster key is more expensive than if the table were not clustered. If the data from the set of tables with the same key will occupy more than about 2 blocks, they you’re likely better off not clustering the tables. Also if the number of matching rows per key is very different, the cluster will likely have a lot of wasted space in it. Any block in the cluster will only have rows with one cluster key in it; hence you could have some blocks with very few rows in them.

So should you use them? Personally every time I tried to use them, the issues mentioned above ended up forcing us to go back to POTs. But if you always return the data in the joined form then they can be helpful in reducing IO since you could get data from all the tables with very few LIOs.

Now how about them Hash clusters? They too can have more than one table in it. It’s the same basic technique, create the cluster, and put the tables into it. You don’t have to create an index on the key. You can if you want to, but the idea of using a hash key is to not have an index on it. This time Oracle stores data in the same block that has the same hash value. Again if you retrieve the data from the set of tables joining on the hash key this is good.

These are popular as Single Table Hash Clusters, which are very cool because they allow you to get a row with two IO operations pretty much no matter how big the table is. With small tables this is easily achievable with an index. However as soon as the index grows and has two or more levels, the LIOs will increase with the levels of the index.

To create these you have to use the clause “SINGLE TABLE HASHKEYS N” where N is a positive value. This is the number of expected hash keys there will be in the table. This number is round up to the nearest prime number. Warning: Unlike a POT, Oracle will immediately allocate the space needed for all the HASHKEYS; hence a large number will allocate a lot of space right away. Of course if you need this then all is good, just be aware that it’s allocating all the space up front. Optionally you can also specify a SIZE parameter; this is the amount of space reserved for all rows with the same hash key. If not specified, then one data block is allocated for each key.

Also you can even define the hash algorithm used to hash the values. If you don’t Oracle uses one to hash them. This does give you a lot of control over how the data is stored and you can even predict the number of keys that will hash to the same value if you define it.

The best use of a single table hash cluster is looking up rows by the hash key using an equality predicate. You’ll see the access of the table as “TABLE ACCESS HASH”. Even when you retrieve a set of rows (when the hash key is not unique) this will likely have a few LIOs.

Here is a set of test queries that shows the good and bad of single table hash clusters. This test is using a table ORD2 a POT and ORD2_HASH in a single table hash cluster. The script is below; I’ll just show the output here.

These tests are running this query which will retrieve a count of 20 rows:

select count(*) into x1 from [TABLE] where cust_no = 32076;

......................................................

selecting from ORD2 10000 times a POT

......................................................

In hundredths of a second

**** TIME - 524

**** CPU - 524

**** LIO - 1910000

......................................................

selecting from ORD2 10000 times a POT with index

......................................................

In hundredths of a second

**** TIME - 53

**** CPU - 50

**** LIO - 20000

......................................................

selecting from ORD2_HASH 10000 times a single table hash cluster

......................................................

In hundredths of a second

**** TIME - 47

**** CPU - 45

**** LIO - 10079

What you can see in this first set of tests that the single table hash cluster beat out the POT even when it had an index on the column. So this is looking good for the single table hash cluster. Now let’s see how things go with a range scan.

These tests are running this query which will retrieve a count of 26 rows:

select count(*) into x1 from [TABLE] where cust_no between 32076 and 32080;

......................................................

range scan from ORD2 10000 times a POT with index

......................................................

In hundredths of a second

**** TIME - 50

**** CPU - 50

**** LIO - 20000

......................................................

range scan from ORD2_HASH 10000 times a single table hash cluster

......................................................

In hundredths of a second

**** TIME - 1273

**** CPU - 1265

**** LIO - 3170067

......................................................

range scan from ORD2_HASH 10000 times a single table hash cluster with index

......................................................

In hundredths of a second

**** TIME - 53

**** CPU - 51

**** LIO - 20014

Here you can see the single table hash cluster really doesn’t help. Without an index on the column it reverts to a full table scan. With an index it’s really the same as the POT with an index (the extra LIOs and time are from the parsing of the statement because of the new index). Is that really worth it? Of course that is a question only you can answer for your environment.

As you might guess most of the issues with index clusters are still there with hash clusters. A hash cluster is likely not a good choice is where the hash key is updated. This will require the row to move (most likely) and will in effect cause a delete/insert like activity which will be slower than just updating the field. Doing full table scans on tables in a hash cluster will likely be longer, since Oracle only stores data in a block with the same hash value, there is likely to be blocks with only a few rows in them. And it’s really important to use equality predicates when accessing the hash key. Other types of predicates will tend to drive full scans as seen above. Yes you can have an index on the column, but the basic idea of using a cluster is to not use an index. Because if you’re going to use an index, than you’re likely better off with just a POT with and index.

What you should glen from this is that you pretty much have to know the size of the data you’ll be using in the single hash table. You need to know the number of rows that will hash to the same value (the HASHKEYS) and how big that total set of rows will be (the SIZE). Using a single table hash cluster is not likely to work optimally when these are not known. You don’t necessarily have to be precise with these values, but you should be close and likely over estimate a bit. The issue here is if you under estimate to much then over flow blocks get created and you start to lose the advantage of the hashing. In which case your performance is likely back to what a POT and indexing would have been, maybe worse.

What’s the bottom line? Using clusters is cool and they do work well for very specific usages. If you wander outside these usages then they will likely not help and may even hurt performance.

Here is the script I used for the tests show above. The script to create the ORD2 table is a bit too big to put in here as a blog post. If you’d like it please contact me ric.van.dyke at hotsos dot com and I can send it to you.

rem testing selecting from a POT vs a single table hash cluster

rem OCT2018 RVD

rem

set echo on feedback on serveroutput on

drop index ord2_cust_no;

drop index ord2h_cust_no;

drop table ord2_hash;

drop cluster cust_orders;

CREATE CLUSTER cust_orders (cust_id NUMBER(*,0))

SINGLE TABLE HASHKEYS 200;

create table ord2_hash

cluster cust_orders(cust_no)

as select * from ord2;

declare

x1 number;

l_start_time pls_integer;

l_start_cpu pls_integer;

l_start_cr pls_integer :=0;

l_end_cr pls_integer :=0;

begin

select value into l_start_cr from v$mystat where STATISTIC# = 139;

l_start_time := DBMS_UTILITY.GET_TIME;

l_start_cpu := DBMS_UTILITY.GET_CPU_TIME;

for ii in 1 .. 10000 loop

select count(*) into x1 from ord2 where cust_no = 32076;

end loop;

DBMS_OUTPUT.put_line ('......................................................');

DBMS_OUTPUT.put_line ('selecting from ORD2 10000 times a POT');

DBMS_OUTPUT.put_line ('......................................................');

DBMS_OUTPUT.put_line ('In hundredths of a second');

DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));

DBMS_OUTPUT.put_line ('**** CPU - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));

select value into l_end_cr from v$mystat where STATISTIC# = 139;

DBMS_OUTPUT.put_line ('**** LIO - '||to_char( l_end_cr - l_start_cr));

end;

create index ord2_cust_no on ord2(cust_no);

declare

x1 number;

l_start_time pls_integer;

l_start_cpu pls_integer;

l_start_cr pls_integer :=0;

l_end_cr pls_integer :=0;

begin

select value into l_start_cr from v$mystat where STATISTIC# = 139;

l_start_time := DBMS_UTILITY.GET_TIME;

l_start_cpu := DBMS_UTILITY.GET_CPU_TIME;

for ii in 1 .. 10000 loop

select count(*) into x1 from ord2 where cust_no = 32076;

end loop;

DBMS_OUTPUT.put_line ('......................................................');

DBMS_OUTPUT.put_line ('selecting from ORD2 10000 times a POT with index');

DBMS_OUTPUT.put_line ('......................................................');

DBMS_OUTPUT.put_line ('In hundredths of a second');

DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));

DBMS_OUTPUT.put_line ('**** CPU - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));

select value into l_end_cr from v$mystat where STATISTIC# = 139;

DBMS_OUTPUT.put_line ('**** LIO - '||to_char( l_end_cr - l_start_cr));

end;

declare

x1 number;

l_start_time pls_integer;

l_start_cpu pls_integer;

l_start_cr pls_integer :=0;

l_end_cr pls_integer :=0;

begin

select value into l_start_cr from v$mystat where STATISTIC# = 139;

l_start_time := DBMS_UTILITY.GET_TIME;

l_start_cpu := DBMS_UTILITY.GET_CPU_TIME;

for ii in 1 .. 10000 loop

select count(*) into x1 from ord2_hash where cust_no = 32076;

end loop;

DBMS_OUTPUT.put_line ('......................................................');

DBMS_OUTPUT.put_line ('selecting from ORD2_HASH 10000 times a single table hash cluster');

DBMS_OUTPUT.put_line ('......................................................');

DBMS_OUTPUT.put_line ('In hundredths of a second');

DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));

DBMS_OUTPUT.put_line ('**** CPU - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));

select value into l_end_cr from v$mystat where STATISTIC# = 139;

DBMS_OUTPUT.put_line ('**** LIO - '||to_char( l_end_cr - l_start_cr));

end;

rem now doing a range scan leaving the index in place on ORD2 for CUST_NO

rem but no index on ORD2_HASH for CUST_NO

declare

x1 number;

l_start_time pls_integer;

l_start_cpu pls_integer;

l_start_cr pls_integer :=0;

l_end_cr pls_integer :=0;

begin

select value into l_start_cr from v$mystat where STATISTIC# = 139;

l_start_time := DBMS_UTILITY.GET_TIME;

l_start_cpu := DBMS_UTILITY.GET_CPU_TIME;

for ii in 1 .. 10000 loop

select count(*) into x1 from ord2 where cust_no between 32076 and 32080;

end loop;

DBMS_OUTPUT.put_line ('......................................................');

DBMS_OUTPUT.put_line ('range scan from ORD2 10000 times a POT with index');

DBMS_OUTPUT.put_line ('......................................................');

DBMS_OUTPUT.put_line ('In hundredths of a second');

DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));

DBMS_OUTPUT.put_line ('**** CPU - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));

select value into l_end_cr from v$mystat where STATISTIC# = 139;

DBMS_OUTPUT.put_line ('**** LIO - '||to_char( l_end_cr - l_start_cr));

end;

declare

x1 number;

l_start_time pls_integer;

l_start_cpu pls_integer;

l_start_cr pls_integer :=0;

l_end_cr pls_integer :=0;

begin

select value into l_start_cr from v$mystat where STATISTIC# = 139;

l_start_time := DBMS_UTILITY.GET_TIME;

l_start_cpu := DBMS_UTILITY.GET_CPU_TIME;

for ii in 1 .. 10000 loop

select count(*) into x1 from ord2_hash where cust_no between 32076 and 32080;

end loop;

DBMS_OUTPUT.put_line ('......................................................');

DBMS_OUTPUT.put_line ('range scan from ORD2_HASH 10000 times a single table hash cluster');

DBMS_OUTPUT.put_line ('......................................................');

DBMS_OUTPUT.put_line ('In hundredths of a second');

DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));

DBMS_OUTPUT.put_line ('**** CPU - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));

select value into l_end_cr from v$mystat where STATISTIC# = 139;

DBMS_OUTPUT.put_line ('**** LIO - '||to_char( l_end_cr - l_start_cr));

end;

rem lastly put an index on ORD2_HASH.CUST_NO

create index ord2h_cust_no on ORD2_HASH(CUST_NO);

declare

x1 number;

l_start_time pls_integer;

l_start_cpu pls_integer;

l_start_cr pls_integer :=0;

l_end_cr pls_integer :=0;

begin

select value into l_start_cr from v$mystat where STATISTIC# = 139;

l_start_time := DBMS_UTILITY.GET_TIME;

l_start_cpu := DBMS_UTILITY.GET_CPU_TIME;

for ii in 1 .. 10000 loop

select count(*) into x1 from ord2_hash where cust_no between 32076 and 32080;

end loop;

DBMS_OUTPUT.put_line ('......................................................');

DBMS_OUTPUT.put_line ('range scan from ORD2_HASH 10000 times a single table hash cluster with index');

DBMS_OUTPUT.put_line ('......................................................');

DBMS_OUTPUT.put_line ('In hundredths of a second');

DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));

DBMS_OUTPUT.put_line ('**** CPU - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));

select value into l_end_cr from v$mystat where STATISTIC# = 139;

DBMS_OUTPUT.put_line ('**** LIO - '||to_char( l_end_cr - l_start_cr));

end;

2 comments:

UnknownOctober 24, 2018 at 2:57 PM
Ric, nice article. I have used indexed clusters a couple of times. In both cases, an essential requirement was that the tables are always (well, at least almost always) accessed together. The first time I used them was to store lab test results. The parent table contained the unit of measure and the child table contained the numeric amounts. The amount means nothing with the unit of measure and vice versa, so these tables were always accessed together.
Ric Van DykeOctober 27, 2018 at 7:12 AM
Yes that is a perfect use of the Index Cluster. That kind of data is exactly what they were designed for. And it's likely there are a couple of tables like that in many applications. It will be the exception and not the rule, as the saying goes. Thanks for the great example!

Friday, October 12, 2018

Clusters in the cloud or not

2 comments:

Blog Archive