Archive for July, 2015

Measuring I/O waste on incremental statistics gathering on partitioned tables on 11gR2 (STALE_PERCENT does not work with partitions)

Wednesday, July 15th, 2015

The inspiration for this article is this post: http://progeeking.com/2015/03/09/optimizer-statistics-stale-percentage-and-partitioning/ Please be sure to read it first. The main difference here is that we look at exactly which reads DBMS_STATS performs in the INCREMENTAL=TRUE scenario.

Apparently it is a well-known fact that STALE_PERCENT does not work with (sub)partitions on 11gR2, but one might be interested in knowing exactly how much additional I/O DBMS_STATS can generate due to this “bug”/missing feature. I’ve used the SQL script below to measure this on a fairly recent 11.2.0.4.x release:

-- prepare start
drop table stats_test purge;
exec dbms_stats.set_global_prefs('trace',to_char(0));

create table stats_test (
  id number primary key,
  merchant_id number not null,
  accounting_date date not null,
  somedata varchar2(128)
)
partition by range (merchant_id) interval (1)
subpartition by hash (accounting_date) SUBPARTITIONS 8
(partition p0 values less than (1))
;

insert into stats_test select rownum, MOD(rownum, 4), sysdate-(rownum/86400), 'blablabla'  from
  (select rownum r from dual connect by rownum <= 1000) a,
  (select rownum r from dual connect by rownum <= 1000) b,
  (select rownum r from dual connect by rownum <= 1000) c
where rownum <= 100000;
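-- (optional sanity check, not part of the original script) MOD(rownum, 4)
-- should give 4 range partitions of ~25000 rows, each hashed into
-- 8 subpartitions of ~3125 rows:
select merchant_id, count(*) from stats_test group by merchant_id order by merchant_id;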

create index stats_test_idx1 on stats_test (somedata) local;

exec dbms_stats.set_table_prefs('JWARTAK', 'STATS_TEST', 'INCREMENTAL', 'TRUE');
-- just to be sure, it seems that ALL doesn't work with INCREMENTAL=TRUE
-- so with this AUTO setting you end up with no subpartition stats
-- you can tweak between AUTO and ALL here
exec dbms_stats.set_table_prefs('JWARTAK', 'STATS_TEST', 'GRANULARITY', 'ALL');
-- just to be sure
exec dbms_stats.set_table_prefs('JWARTAK', 'STATS_TEST', 'ESTIMATE_PERCENT', DBMS_STATS.AUTO_SAMPLE_SIZE); 

-- 10% by default but just to be sure
select DBMS_STATS.GET_PREFS ('STALE_PERCENT','JWARTAK','STATS_TEST') stale_percent from dual;

-- just in case
exec dbms_stats.gather_table_stats('JWARTAK', 'STATS_TEST', cascade=>true);
exec dbms_stats.gather_table_stats('JWARTAK', 'STATS_TEST', cascade=>true);
-- prepare end

-- experiment start, see STALE_STATS
select PARTITION_NAME, SUBPARTITION_NAME, SUBPARTITION_POSITION, STALE_STATS,
       to_char(LAST_ANALYZED, 'HH24:MI DD/MM/YYYY') last_analyzed
  from user_tab_statistics
 where table_name = 'STATS_TEST'
 order by PARTITION_NAME, SUBPARTITION_POSITION;

select count(*) from stats_test;
select count(*) from stats_test where MERCHANT_ID=1;
-- just add 1 row (25000 -> 25001 rows in the partition; the row lands in one subpartition, so 1 out of 8 subpartitions now has 3126 rows)
-- 3125 -> 3126 rows in a subpartition is enough to trigger a whole-partition dbms_stats scan
insert into stats_test (id, merchant_id, accounting_date, somedata) values (110012, 1, sysdate, 'blah');
commit;

-- just in case
exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- at this point all STALE_STATS = null (subpartitions) or =NO (undetected change)
select PARTITION_NAME, SUBPARTITION_NAME, SUBPARTITION_POSITION, STALE_STATS,
       to_char(LAST_ANALYZED, 'HH24:MI DD/MM/YYYY') last_analyzed
  from user_tab_statistics
 where table_name = 'STATS_TEST'
 order by PARTITION_NAME, SUBPARTITION_POSITION;
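-- (a hedged side-check, not in the original script) as point 4 of the results
-- below notes, DBMS_STATS "LIST STALE" is more reliable here than the
-- STALE_STATS column; the documented gather_schema_stats overload with
-- options => 'LIST STALE' only lists stale objects, gathering nothing:
set serveroutput on
declare
  l_stale dbms_stats.objecttab;
begin
  dbms_stats.gather_schema_stats(ownname => 'JWARTAK',
                                 options => 'LIST STALE',
                                 objlist => l_stale);
  for i in 1 .. l_stale.count loop
    dbms_output.put_line(l_stale(i).objtype || ' ' || l_stale(i).objname || ' ' ||
                         l_stale(i).partname || ' ' || l_stale(i).subpartname);
  end loop;
end;
/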

-- 1st run, you will get some dirty results
exec dbms_stats.gather_table_stats('JWARTAK', 'STATS_TEST');

-- another run to see the difference
exec dbms_stats.gather_table_stats('JWARTAK', 'STATS_TEST');

-- enable DBMS_STATS debug tracing; adding 1 to the flag sum routes the trace
-- to the DBMS_OUTPUT buffer instead of the session trace file
set serveroutput on
--exec dbms_stats.set_global_prefs('trace',to_char(1+4+8+16+64+2048+4096+8192));
exec dbms_stats.set_global_prefs('trace',to_char(4+8+16+64+2048+4096+8192));

-- repeat the insert (25001 -> 25002 rows in the partition, again a single new row in one subpartition)
insert into stats_test (id, merchant_id, accounting_date, somedata) values (110016, 1, sysdate, 'blah');
commit;
-- you can comment this out; at least on 11.2.0.4.x it makes no difference
exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- 2nd run, >>> REAL RESULT <<<
exec dbms_stats.gather_table_stats('JWARTAK', 'STATS_TEST');

-- EXPERIMENT END
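
-- cleanup (not in the original script): switch DBMS_STATS tracing back off,
-- matching the reset at the top of the script
exec dbms_stats.set_global_prefs('trace',to_char(0));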

As you can see from the above, there are three main variables here:
- dbms_stats granularity – you can set ALL, AUTO or any other value to perform measurements
- trace level – based on http://www.pythian.com/blog/options-for-tracing-oracle-dbms_stats/ – the most interesting choice is where you want the output to go (trace file in the diagnostic directory or the DBMS_OUTPUT buffer)
- FLUSH_DATABASE_MONITORING_INFO – you may want to flush it or not

The result is (at least on 11.2.0.4.x) that after adding just a single row to a single (sub)partition, even with INCREMENTAL=TRUE, you’ll get:

a) with GRANULARITY=AUTO (meaning table and partition stats but no dedicated subpartition stats; in most cases this will be “GLOBAL AND PARTITION”):

somewhere:~$ grep -i -e 'Started index ' -e stale -e 'need to be updated' -e 'gather index ' -e query /tmp/run    | grep -v Ending
DBMS_STATS: gather stats on partition SYS_P4061815: stats are stale;
DBMS_STATS: Starting query at 15-JUL-15 01.40.21.747485000 PM +02:00
DBMS_STATS: Starting query at 15-JUL-15 01.40.21.790532000 PM +02:00
DBMS_STATS: Started index JWARTAK.SYS_C005852548 at 15-JUL-15 01.40.21.899589000 PM +02:00 granularity: GLOBAL AND PARTITION gIdxGran:
DBMS_STATS: Starting query at 15-JUL-15 01.40.21.902620000 PM +02:00
DBMS_STATS: Started index JWARTAK.STATS_TEST_IDX1 at 15-JUL-15 01.40.21.974521000 PM +02:00 granularity: GLOBAL AND PARTITION gIdxGran:
DBMS_STATS: gather index stats STATS_TEST_IDX1onSYS_P4061834:
DBMS_STATS: Starting query at 15-JUL-15 01.40.21.978291000 PM +02:00
DBMS_STATS: Starting query at 15-JUL-15 01.40.22.000544000 PM +02:00
somewhere:~$

So this translates to:

  • 2x sampled FULL “PARTITION” SCANs against the single partition where the new row has been added
  • 1x sampled FULL INDEX SCAN on the PK index
  • 2x sampled FULL INDEX SCANs against the changed LOCAL index partition where the new row has been added
  • certainly 1 new row out of ~25000 isn’t a 10% change on the partition

b) with GRANULARITY=ALL:

somewhere:~$ grep -i -e 'Started index ' -e stale -e 'need to be updated' -e 'gather index ' -e query /tmp/run_gALL    | grep -v Ending
DBMS_STATS: gather stats on subpartition SYS_SUBP4061878:stats are stale
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.166410000 PM +02:00
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.180291000 PM +02:00
DBMS_STATS: gather stats on partition SYS_P4061886: stats are stale;
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.356143000 PM +02:00
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.405171000 PM +02:00
DBMS_STATS: Started index JWARTAK.SYS_C005852574 at 15-JUL-15 01.46.57.522214000 PM +02:00 granularity: ALL gIdxGran:
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.524947000 PM +02:00
DBMS_STATS: Started index JWARTAK.STATS_TEST_IDX1 at 15-JUL-15 01.46.57.593190000 PM +02:00 granularity: ALL gIdxGran:
DBMS_STATS: gather index statsSTATS_TEST_IDX1 on SYS_SUBP4061878:stats are stale
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.596544000 PM +02:00
DBMS_STATS: gather index stats STATS_TEST_IDX1onSYS_P4061905:stats are stale
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.601476000 PM +02:00
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.622425000 PM +02:00
somewhere:~$
  • 2x sampled FULL “SUBPARTITION” SCANs against the single subpartition where the new row has been added
  • 2x sampled FULL “PARTITION” SCANs against the single partition where the new row has been added
  • 1x sampled FULL INDEX SCAN on the PK index
  • 1x sampled FULL INDEX SCAN against the changed LOCAL index subpartition where the new row has been added
  • 2x sampled FULL INDEX SCANs against the changed LOCAL index partition where the new row has been added
  • certainly 1 new row out of ~25000 isn’t a 10% change on the partition or subpartition here

SUMMARY: So with a single row added, deleted or updated in some random subpartition, we can cause on 11gR2 an avalanche of I/O reads due to DBMS_STATS scanning unnecessary segments. When you start to think about table sizes above 1-2TB combined with a weak I/O subsystem and rather “big” indexes (e.g. 100-200GB), you can quickly run out of the MAINTENANCE_WINDOW schedule due to this overhead (in the extreme it could be several hundred GB scanned by DBMS_STATS due to 1 byte changed ;) ).

Some other interesting results:

  1. as demonstrated above, STALE_PERCENT does not work against (sub)partitions in 11gR2
  2. since 12cR1 you have access to the INCREMENTAL + INCREMENTAL_STALENESS (new feature) combo, where you can set INCREMENTAL_STALENESS to USE_STALE_PERCENT and trigger stats collection for (sub)partitions only after a certain threshold of changed rows is reached (see the sketch after this list)
  3. apparently, and contrary to the documentation, on the 11.2.0.4.x series INCREMENTAL=TRUE works with GRANULARITY=ALL (at least in this test)
  4. the STALE_STATS column from {user|all|dba}_{tab|ind}_statistics is useless for partitioned tables (flushing does not help anyway); DBMS_STATS GATHER LIST (see the LIST STALE side-check in the script above) is much better for this purpose
  5. there is an interesting granularity option APPROX_GLOBAL AND PARTITION, but I’m not sure whether it is a 10.2/11.1 historical artifact as I did not have enough time to test it in detail
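
For completeness, a minimal sketch of the 12cR1 combo from point 2 above (preference names as documented for 12c DBMS_STATS; this will obviously not run on the 11gR2 test instance used here):

exec dbms_stats.set_table_prefs('JWARTAK', 'STATS_TEST', 'INCREMENTAL', 'TRUE');
-- USE_STALE_PERCENT: a (sub)partition only counts as stale after STALE_PERCENT
-- (10% by default) of its rows have changed, instead of after any DML at all
exec dbms_stats.set_table_prefs('JWARTAK', 'STATS_TEST', 'INCREMENTAL_STALENESS', 'USE_STALE_PERCENT');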

Insert with IGNORE_ROW_ON_DUPKEY_INDEX on Index Organized Table might result in lost data

Wednesday, July 8th, 2015

1. We create table red_table with a PK on both columns ID1, ID2 and 10000 rows
2. We create table delta_table with the same columns ID1, ID2 and only 500 rows taken from red_table (the first 500 rows)
3. We load an additional 100 completely new/different rows into delta_table
4. So as of now we have 600 rows in delta_table and 10000 in red_table. The 100 rows are completely different (one can use A MINUS B to check that)
5. We try to load the rows from delta_table that do NOT yet exist in red_table into red_table using the IGNORE_ROW_ON_DUPKEY_INDEX hint.
HEAP) 100 rows inserted, a 2nd insert adds 0 rows (correct)
IOT) 60 rows inserted, WTF/BUG due to 40 rows not loaded!
6. “analyze table RED_TABLE validate structure cascade” returns no errors, so we have no corruption (in both the HEAP and IOT cases)
7. So as of now we have:
HEAP) 10100 rows in red_table, and delta_table MINUS red_table returns 0 (no differences)
IOT) 10060 rows in red_table, and delta_table MINUS red_table returns 40 (missing data!)

--1
drop table red_table purge;
create table red_table (id1 number, id2 number, constraint RED_TABLE_PK primary key(id1, id2)) organization index;
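-- note: for the HEAP variant of the test, create the same table without the ORGANIZATION INDEX clause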
insert into red_table (id1, id2) select object_id, object_id from all_objects where rownum <= 10000;
commit;

--2 & 3
drop table delta_table;
create table delta_table (id1, id2) as select * from red_table where rownum <= 500;
insert into delta_table (id1, id2) select 10000000+object_id, object_id from all_objects where rownum <= 100;
commit;

-- 4
select (select count(*) from delta_table) delta, (select count(*) from red_table) red from dual;
--
select count(*) from (
select * from delta_table
minus
select * from red_table
);

-- 5
insert /*+ IGNORE_ROW_ON_DUPKEY_INDEX(RED_TABLE, RED_TABLE_PK) */ into red_table (id1, id2) select id1, id2 from delta_table;
commit;

-- 6
analyze table RED_TABLE validate structure cascade;

-- 7
select (select count(*) from delta_table) delta, (select count(*) from red_table) red from dual;

select count(*) from (
select * from delta_table
minus
select * from red_table
);

Reproduced on versions:
- 11.2.0.3.13
- 11.2.0.4.2
- reproduced also with the primary key being only ID1 (so this is NOT related to the composite PK on ID1+ID2 on the IOT)

Bugs possibly linked to this issue:

  • Bug 17397545 ORA-600 [kdtigetrow-2] / buffer cache errors from MERGE statement with LOG ERRORS INTO
  • Bug 11865420 Insert as Select with LOG ERRORS INTO slower than expected / corruption using IGNORE_ROW_ON_DUPKEY_INDEX hint – superseded
  • Bug 14512508 : PROBLEM WITH HINT “IGNORE_ROW_ON_DUPKEY_INDEX”
  • Bug 17748658 : BUG 11865420 IS MARKED AS FIXED IN 11.2.0.4 BUT ISN’T REALLY FIXED
  • Bug 13033715 : WRONG RESULTS WITH HINT IGNORE_ROW_ON_DUPKEY_INDEX

Summary: do not use the IGNORE_ROW_ON_DUPKEY_INDEX hint. Another interesting story is documented here: http://guyharrison.squarespace.com/blog/2010/1/1/the-11gr2-ignore_row_on_dupkey_index-hint.html where it is clear that (for performance reasons too) it is best to avoid the IGNORE_ROW_ON_DUPKEY_INDEX hint.
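
If you still need the “insert only the new rows” semantics, a plain anti-join insert (a sketch against the test tables above; a MERGE with WHEN NOT MATCHED would do equally well) avoids the hint entirely:

insert into red_table (id1, id2)
select d.id1, d.id2
  from delta_table d
 where not exists (select null
                     from red_table r
                    where r.id1 = d.id1
                      and r.id2 = d.id2);
commit;

Under concurrent sessions you would still have to handle ORA-00001 yourself, but at least no rows silently disappear.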