I/O latency on REDO disks during storage array controller failover

I’ve managed to capture an interesting I/O latency graph (from iostat await, plotted in GNU R) during a planned test of storage array controller failure in a 2-way configuration (sorry, I cannot provide the vendor/model). In most 2-way storage arrays the write-back cache is enabled (so sync I/O from LGWR is really buffered) until one of the two controllers crashes, at which point the array disables the write-back cache. The resulting increase in latency may affect overall platform stability, especially with Oracle DBs doing thousands of I/Os per second (e.g. commits or other activity). The interesting thing about this I/O latency profile is the freezes (up to ~500..900 ms) caused by the failed/rebooted controller entering and leaving array processing:
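To reproduce a graph like this, one approach is to sample `iostat -x` periodically and feed the per-device await column into GNU R. A minimal sketch of the parsing step is below; the sample output mimics a typical Linux sysstat layout, but real column order varies between sysstat versions, so the code locates the `await` column from the header row rather than hard-coding an index.

```python
# Minimal sketch: extract the per-device "await" column from one
# `iostat -x` report so it can be written to CSV and plotted in R.
# The sample text below is illustrative, not real array output.

def parse_await(iostat_output):
    """Return {device: await_ms} parsed from one iostat -x report."""
    result = {}
    await_idx = None
    for line in iostat_output.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("Device:", "Device"):
            # Header row: find which column holds "await"
            await_idx = fields.index("await")
            continue
        if await_idx is not None and len(fields) > await_idx:
            try:
                result[fields[0]] = float(fields[await_idx])
            except ValueError:
                pass  # skip lines that are not device statistics
    return result

sample = """\
Device:  rrqm/s wrqm/s r/s  w/s   rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda      0.00   1.20   0.10 85.30 0.40  912.0 21.4     0.05     0.6   0.2   1.7
dm-3     0.00   0.00   0.00 610.0 0.00  4880  16.0     48.2     790.3 1.6   99.9
"""

print(parse_await(sample))  # {'sda': 0.6, 'dm-3': 790.3}
```

Running this once per sampling interval (with a timestamp column added) produces a time series that plots directly in R.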

Of course, results may vary depending on many factors (storage array vendor, OS used, multipathing configuration, etc.).

How one can overcome this:

  • use 4-way controller storage arrays (a crash of 1 out of 4 controllers might not disable the write-back cache – depends on the vendor)
  • provide the required IOPS on RAID groups under the assumption of a write cache hit ratio of 0% (e.g. more disks in RAID10, a different RAID level, SSDs, Fusion-io, flash arrays) – I think this is an interesting voice in the discussion of whether we should use SSDs in storage arrays for Oracle’s REDO:
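The second point reduces to simple back-of-the-envelope arithmetic: with the cache disabled, every write hits the spindles, multiplied by the RAID write penalty. A sketch with illustrative numbers (180 IOPS per 15k rpm disk and a RAID10 write penalty of 2 are assumptions; your disks and workload will differ):

```python
# Back-of-the-envelope sizing: how many disks a RAID group needs to
# sustain a workload once the array's write-back cache is disabled
# (write cache hit ratio = 0%, so every write reaches the disks).
# Per-disk IOPS and the write penalty are illustrative assumptions.

def disks_needed(read_iops, write_iops, raid_write_penalty, disk_iops):
    # Backend IOPS = reads + writes * RAID penalty
    # (RAID10 penalty = 2: each logical write costs two disk writes)
    backend_iops = read_iops + write_iops * raid_write_penalty
    # Round up: there are no partial disks
    return -(-backend_iops // disk_iops)

# Example: commit-heavy workload, 500 read + 4000 write IOPS,
# RAID10 (penalty 2), 180 IOPS per disk (assumed figure):
print(disks_needed(500, 4000, 2, 180))  # 8500 backend IOPS -> 48 disks
```

Sizing for the cache-off case this way makes the array survive a controller failure without latency-driven instability, at the cost of more spindles (or a switch to SSD/flash).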


  1. https://itpeernetwork.intel.com/should-you-put-oracle-database-redo-on-solid-state-disks-ssds/
  2. https://flashdba.com/2013/08/22/storage-myths-put-oracle-redo-on-ssd/
  3. https://www.pythian.com/blog/de-confusing-ssd-for-oracle-databases/
