Archive for January, 2017

I/O latency on REDO disks during storage array controller failover

Wednesday, January 18th, 2017

I’ve managed to create an interesting I/O latency graph (from iostat atime) in GNU R during a planned test of a storage array controller failure in a 2-way configuration (sorry, I cannot provide the vendor/model). In most 2-way storage arrays the write-back cache is enabled (so sync I/O from LGWR is really buffered) until the 2nd controller crashes, which disables the write-back cache. The increased latency may affect overall platform stability, especially with Oracle DBs doing thousands of IO/s (e.g. commits or other activity). The interesting thing about this I/O latency is the freezes (up to ~500..900 ms) caused by the failed/rebooted controller entering and leaving the array processing.

Of course, results may vary depending on many factors (storage array vendor, OS used, multipathing configuration, etc.).
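
To pick such freezes out of the collected samples rather than eyeballing a graph, something like the sketch below could be used. It is only an illustration: it assumes the latency has already been parsed into (timestamp, latency in ms) pairs, and the function name `find_freezes` and the 500 ms threshold are my own choices, not part of any tool.

```python
from typing import List, Tuple


def find_freezes(
    samples: List[Tuple[float, float]], threshold_ms: float = 500.0
) -> List[Tuple[float, float, float]]:
    """Group consecutive samples at or above threshold_ms into freeze windows.

    samples: (timestamp, latency_ms) pairs in time order (assumed input format).
    Returns a list of (first_timestamp, last_timestamp, peak_latency_ms).
    """
    freezes = []
    start = None      # timestamp of the first slow sample in the current window
    last_ts = None    # timestamp of the most recent slow sample
    peak = 0.0        # worst latency seen in the current window
    for ts, latency in samples:
        if latency >= threshold_ms:
            if start is None:
                start, peak = ts, latency
            else:
                peak = max(peak, latency)
            last_ts = ts
        elif start is not None:
            # Latency dropped back below the threshold: close the window.
            freezes.append((start, last_ts, peak))
            start = None
    if start is not None:
        # The series ended while still inside a freeze.
        freezes.append((start, last_ts, peak))
    return freezes


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    demo = [(0, 1.2), (1, 0.9), (2, 650.0), (3, 900.0), (4, 3.0), (5, 520.0)]
    for first, last, worst in find_freezes(demo):
        print(f"freeze from t={first} to t={last}, peak {worst} ms")
```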

How one can overcome this:

  • use storage arrays with 4-way controllers (a crash of 1 out of 4 controllers might not disable the write-back cache, but that depends on the vendor)
  • provide the required IOPS on the RAID groups under the assumption of a write cache hit ratio of 0% (e.g. more disks in RAID10, a different RAID level, SSDs, Fusion IO, flash arrays). I think this is an interesting voice in the discussion of whether we should use SSDs in storage arrays for Oracle’s REDO
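
A rough way to do the sizing from the second point is sketched below. It assumes the commonly quoted back-end write penalties (2 physical I/Os per host write for RAID10, 4 for RAID5, 6 for RAID6) and a per-disk IOPS figure that you supply yourself; the 180 IOPS per disk in the example is only a placeholder, not a measured value. With a 0% write cache hit ratio every host write is assumed to reach the disks.

```python
import math

# Commonly quoted back-end write penalties (physical I/Os per host write).
# These are rule-of-thumb values, not vendor-specific figures.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}


def disks_needed(write_iops: float, raid_level: str, iops_per_disk: float) -> int:
    """Estimate disks required when no write is absorbed by cache (hit ratio 0%)."""
    backend_iops = write_iops * WRITE_PENALTY[raid_level]
    disks = math.ceil(backend_iops / iops_per_disk)
    if raid_level == "RAID10" and disks % 2:
        disks += 1  # mirrored pairs need an even number of disks
    return disks


if __name__ == "__main__":
    # Hypothetical workload: 5000 write IO/s, 180 IOPS per disk (assumed).
    for level in ("RAID10", "RAID5", "RAID6"):
        print(level, disks_needed(5000, level, 180))
```

This ignores read traffic and the mostly sequential nature of REDO writes, so treat the result as a pessimistic starting point rather than a final number.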