The inspiration for this article is this post: http://progeeking.com/2015/03/09/optimizer-statistics-stale-percentage-and-partitioning/ (please be sure to read it first). The main difference here is looking at exactly what reads DBMS_STATS performs in an INCREMENTAL=TRUE scenario.
Apparently it is a well-known fact that STALE_PERCENT does not work with (sub)partitions on 11gR2; however, one might be interested in knowing exactly how much additional I/O DBMS_STATS can generate due to this “bug”/missing feature. I’ve used the SQL script below to measure this on a pretty recent 11.2.0.4.x release:
-- prepare start
drop table stats_test purge;
exec dbms_stats.set_global_prefs('trace', to_char(0));

create table stats_test (
  id              number primary key,
  merchant_id     number not null,
  accounting_date date not null,
  somedata        varchar2(128)
)
partition by range (merchant_id) interval (1)
subpartition by hash (accounting_date) SUBPARTITIONS 8
(partition p0 values less than (1));

insert into stats_test
select rownum, MOD(rownum, 4), sysdate-(rownum/86400), 'blablabla'
from (select rownum r from dual connect by rownum <= 1000) a,
     (select rownum r from dual connect by rownum <= 1000) b,
     (select rownum r from dual connect by rownum <= 1000) c
where rownum <= 100000;

create index stats_test_idx1 on stats_test (somedata) local;

exec dbms_stats.set_table_prefs('JWARTAK', 'STATS_TEST', 'INCREMENTAL', 'TRUE');
-- just to be sure; it seems that ALL doesn't work with INCREMENTAL=TRUE,
-- so with the AUTO setting you end up with no subpartition stats
-- (you can tweak between AUTO and ALL here)
exec dbms_stats.set_table_prefs('JWARTAK', 'STATS_TEST', 'GRANULARITY', 'ALL');
-- just to be sure
exec dbms_stats.set_table_prefs('JWARTAK', 'STATS_TEST', 'ESTIMATE_PERCENT', DBMS_STATS.AUTO_SAMPLE_SIZE);
-- 10% by default, but just to be sure
select DBMS_STATS.GET_PREFS('STALE_PERCENT','JWARTAK','STATS_TEST') stale_percent from dual;
-- just in case
exec dbms_stats.gather_table_stats('JWARTAK', 'STATS_TEST', cascade=>true);
exec dbms_stats.gather_table_stats('JWARTAK', 'STATS_TEST', cascade=>true);
-- prepare end

-- experiment start, see STALE_STATS
select PARTITION_NAME, SUBPARTITION_NAME, SUBPARTITION_POSITION, STALE_STATS,
       to_char(LAST_ANALYZED, 'HH24:MI DD/MM/YYYY') last_analyzed
from user_tab_statistics
where table_name = 'STATS_TEST'
order by PARTITION_NAME, SUBPARTITION_POSITION;

select count(*) from stats_test;
select count(*) from stats_test where MERCHANT_ID = 1;

-- just add 1 row (so 25000 -> 25001 rows in the partition; the single row lands in one
-- subpartition, so now 1 out of 8 subpartitions has 3126 rows)
-- 3125 rows -> 3126 rows in a subpartition is enough to trigger a whole-partition dbms_stats scan
insert into stats_test (id, merchant_id, accounting_date, somedata)
values (110012, 1, sysdate, 'blah');
commit;

-- just in case
exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- at this point all STALE_STATS = null (subpartitions) or = NO (undetected change)
select PARTITION_NAME, SUBPARTITION_NAME, SUBPARTITION_POSITION, STALE_STATS,
       to_char(LAST_ANALYZED, 'HH24:MI DD/MM/YYYY') last_analyzed
from user_tab_statistics
where table_name = 'STATS_TEST'
order by PARTITION_NAME, SUBPARTITION_POSITION;

-- 1st run, you will get some dirty results
exec dbms_stats.gather_table_stats('JWARTAK', 'STATS_TEST');
-- another run to see the difference
exec dbms_stats.gather_table_stats('JWARTAK', 'STATS_TEST');

-- enable debug mode
set serveroutput on
-- exec dbms_stats.set_global_prefs('trace', to_char(1+4+8+16+64+2048+4096+8192));
exec dbms_stats.set_global_prefs('trace', to_char(4+8+16+64+2048+4096+8192));

-- repeat the insert (now 25001 -> 25002 rows in a single subpartition)
insert into stats_test (id, merchant_id, accounting_date, somedata)
values (110016, 1, sysdate, 'blah');
commit;

-- you can comment this out; at least on 11.2.0.4.x there is no difference
exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- 2nd run, >>> REAL RESULT <<<
exec dbms_stats.gather_table_stats('JWARTAK', 'STATS_TEST');
-- experiment end
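As an optional cross-check (not part of the original experiment), you can also look at what table monitoring has recorded per (sub)partition after the single-row insert and the flush; user_tab_modifications is the standard dictionary view for this:

select table_name, partition_name, subpartition_name, inserts, updates, deletes
from   user_tab_modifications
where  table_name = 'STATS_TEST'
order  by partition_name, subpartition_name;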
As you can see from the above, there are three main variables here:
- DBMS_STATS granularity – you can set ALL, AUTO or any other value to perform your own measurements
- trace level – based on http://www.pythian.com/blog/options-for-tracing-oracle-dbms_stats/ – the most interesting choice is where you want the output to go (diagnostic directory or the DBMS_OUTPUT buffer); see the sketch after this list
- FLUSH_DATABASE_MONITORING_INFO – you may want to flush the monitoring info or not
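Based on the commented-out line in the script above and the flag descriptions in the Pythian post, here is a minimal sketch of the two trace destinations (the individual flag meanings are the Pythian post's claims, not something verified here):

-- (a) write the trace to a file in the diagnostic directory
exec dbms_stats.set_global_prefs('trace', to_char(4+8+16+64+2048+4096+8192));

-- (b) same events, but flag 1 reportedly routes the output to the DBMS_OUTPUT buffer instead
set serveroutput on size unlimited
exec dbms_stats.set_global_prefs('trace', to_char(1+4+8+16+64+2048+4096+8192));

-- switch tracing off when done
exec dbms_stats.set_global_prefs('trace', to_char(0));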
The result (at least on 11.2.0.4.x) is that after adding just a single row to a single (sub)partition, even with INCREMENTAL=TRUE, you'll get:
a) with GRANULARITY=AUTO (meaning table and partition stats, but no dedicated subpartition stats; in most cases this translates to “GLOBAL AND PARTITION”):
somewhere:~$ grep -i -e 'Started index ' -e stale -e 'need to be updated' -e 'gather index ' -e query /tmp/run | grep -v Ending
DBMS_STATS: gather stats on partition SYS_P4061815: stats are stale;
DBMS_STATS: Starting query at 15-JUL-15 01.40.21.747485000 PM +02:00
DBMS_STATS: Starting query at 15-JUL-15 01.40.21.790532000 PM +02:00
DBMS_STATS: Started index JWARTAK.SYS_C005852548 at 15-JUL-15 01.40.21.899589000 PM +02:00 granularity: GLOBAL AND PARTITION gIdxGran:
DBMS_STATS: Starting query at 15-JUL-15 01.40.21.902620000 PM +02:00
DBMS_STATS: Started index JWARTAK.STATS_TEST_IDX1 at 15-JUL-15 01.40.21.974521000 PM +02:00 granularity: GLOBAL AND PARTITION gIdxGran:
DBMS_STATS: gather index stats STATS_TEST_IDX1 on SYS_P4061834:
DBMS_STATS: Starting query at 15-JUL-15 01.40.21.978291000 PM +02:00
DBMS_STATS: Starting query at 15-JUL-15 01.40.22.000544000 PM +02:00
somewhere:~$
So this translates to:
- 2x sampled FULL “PARTITION” SCANs against the single partition where the new row has been added
- 1x sampled FULL INDEX SCAN on the PK index
- 2x sampled FULL INDEX SCANs against the changed LOCAL index partition where the new row has been added
- certainly 1 new row out of ~25000 isn't a 10% change on the partition (the STALE_PERCENT threshold would be ~2500 changed rows)
b) with GRANULARITY=ALL:
somewhere:~$ grep -i -e 'Started index ' -e stale -e 'need to be updated' -e 'gather index ' -e query /tmp/run_gALL | grep -v Ending
DBMS_STATS: gather stats on subpartition SYS_SUBP4061878: stats are stale
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.166410000 PM +02:00
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.180291000 PM +02:00
DBMS_STATS: gather stats on partition SYS_P4061886: stats are stale;
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.356143000 PM +02:00
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.405171000 PM +02:00
DBMS_STATS: Started index JWARTAK.SYS_C005852574 at 15-JUL-15 01.46.57.522214000 PM +02:00 granularity: ALL gIdxGran:
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.524947000 PM +02:00
DBMS_STATS: Started index JWARTAK.STATS_TEST_IDX1 at 15-JUL-15 01.46.57.593190000 PM +02:00 granularity: ALL gIdxGran:
DBMS_STATS: gather index stats STATS_TEST_IDX1 on SYS_SUBP4061878: stats are stale
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.596544000 PM +02:00
DBMS_STATS: gather index stats STATS_TEST_IDX1 on SYS_P4061905: stats are stale
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.601476000 PM +02:00
DBMS_STATS: Starting query at 15-JUL-15 01.46.57.622425000 PM +02:00
somewhere:~$
- 2x sampled FULL “SUBPARTITION” SCANs against the single subpartition where the new row has been added
- 2x sampled FULL “PARTITION” SCANs against the single partition where the new row has been added
- 1x sampled FULL INDEX SCAN on the PK index
- 1x sampled FULL INDEX SCAN against the changed LOCAL index subpartition where the new row has been added
- 2x sampled FULL INDEX SCANs against the changed LOCAL index partition where the new row has been added
- certainly 1 new row out of ~25000 isn't a 10% change on the partition or subpartition here
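Either way, if you want to see the actual sampling statements and their I/O rather than just the trace timestamps, a quick cross-check (not part of the original experiment) is to fish them out of the shared pool right after a gather run:

select sql_id, executions, buffer_gets, disk_reads, sql_text
from   v$sql
where  parsing_schema_name = 'JWARTAK'
and    sql_text like '%STATS_TEST%'
order  by last_active_time desc;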
SUMMARY: So with a single row added, deleted or merely updated in some random subpartition, on 11gR2 we can cause an avalanche of I/O reads due to DBMS_STATS scanning unnecessary segments. When you start to think about table sizes above 1-2TB combined with a weak I/O subsystem and rather “big” indexes (e.g. 100-200GB), you can quickly run out of your MAINTENANCE_WINDOW schedule due to this overhead (in the extreme, several hundred GB could be scanned by DBMS_STATS because of 1 byte changed).
Some other interesting results:
- as demonstrated above, STALE_PERCENT does not work against (sub)partitions in 11gR2
- since 12cR1 you have access to the INCREMENTAL + INCREMENTAL_STALENESS (new feature) combo, where you can simply set INCREMENTAL_STALENESS to USE_STALE_PERCENT and trigger stats collection for a (sub)partition only after a certain threshold of changed rows is reached (see the first sketch after this list)
- apparently, and contrary to the documentation for the 11.2.0.4.x series, INCREMENTAL=TRUE works with GRANULARITY=ALL (at least in this test)
- the STALE_STATS column in {user|all|dba}_{tab|ind}_statistics is useless for partitioned tables (flushing does not help anyway); the DBMS_STATS GATHER LIST functionality is much better for this purpose (see the second sketch after this list)
- there is an interesting granularity option, APPROX_GLOBAL AND PARTITION, but I'm not sure whether it is a 10.2/11.1 historical artifact, as I did not have enough time to test it in detail
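For reference, a minimal sketch of the 12cR1 preference combo mentioned above (not tested as part of this experiment; a 12c instance is assumed):

-- keep incremental stats maintenance on
exec dbms_stats.set_table_prefs('JWARTAK', 'STATS_TEST', 'INCREMENTAL', 'TRUE');
-- regather a (sub)partition only once STALE_PERCENT (10% by default) of its rows have changed
exec dbms_stats.set_table_prefs('JWARTAK', 'STATS_TEST', 'INCREMENTAL_STALENESS', 'USE_STALE_PERCENT');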
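And a minimal sketch of listing stale objects via DBMS_STATS, assuming “GATHER LIST” refers to the LIST STALE option of GATHER_SCHEMA_STATS (this only reports what would be gathered; it gathers nothing):

set serveroutput on
declare
  l_stale dbms_stats.objecttab;
begin
  -- LIST STALE fills l_stale with the objects whose stats are considered stale
  dbms_stats.gather_schema_stats(
    ownname => 'JWARTAK',
    options => 'LIST STALE',
    objlist => l_stale);
  for i in 1 .. l_stale.count loop
    dbms_output.put_line(l_stale(i).objtype || ' ' || l_stale(i).objname ||
                         ' partition=' || l_stale(i).partname ||
                         ' subpartition=' || l_stale(i).subpartname);
  end loop;
end;
/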