I managed to create an interesting I/O latency graph (from iostat’s await column, plotted in GNU R) during a planned test of storage controller failure in a 2-way array configuration (sorry, I cannot provide the vendor/model). In most 2-way storage arrays the write-back cache stays enabled (so sync I/O from LGWR is really buffered) until one of the two controllers crashes, at which point the write-back cache is disabled. The resulting latency increase may affect overall platform stability, especially with Oracle DBs doing thousands of I/Os per second (e.g. commits or other activity). The interesting thing about this I/O latency are the freezes (up to ~500..900 ms) caused by the failed/rebooted controller leaving and re-entering array processing:
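For anyone wanting to reproduce this kind of graph, here is a minimal sketch of the approach. It assumes you have captured iostat output to a file named await.log during the test (the file name, the awk column position, and the ~500 ms reference line are all my assumptions, not part of the original measurement setup; the await column position varies between sysstat versions):

```r
# Hypothetical capture, run during the test (gawk's systime() adds a timestamp):
#   iostat -x 1 | awk '/^sd/ { print systime(), $10 }' > await.log
# $10 (await) is an assumption -- check your sysstat version's column layout.

d <- read.table("await.log", col.names = c("epoch", "await_ms"))
d$t <- d$epoch - d$epoch[1]                  # seconds since the test started
plot(d$t, d$await_ms, type = "l",
     xlab = "time [s]", ylab = "I/O latency (await) [ms]",
     main = "I/O latency during controller failure")
abline(h = 500, lty = 2)                     # rough level of the observed freezes
```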
Of course, results may vary depending on many factors (storage array vendor, OS used, multipathing configuration, etc.).
How one can overcome this:
- use 4-way controller storage arrays (a crash of 1 out of 4 controllers might not disable the write-back cache – depends on the vendor)
- provide the required IOPS on RAID groups under the assumption of a 0% write cache hit ratio (e.g. more disks in RAID10, a different RAID level, SSDs, Fusion IO, flash arrays) – I think this is an interesting voice in the discussion of whether we should use SSDs in storage arrays for Oracle’s REDO; a rough sizing sketch follows below
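To illustrate the second point, here is a back-of-the-envelope sizing sketch in R. It assumes the write-back cache is lost (write cache hit ratio = 0%, so every host write hits the spindles), the classic RAID write penalties (RAID10 = 2 backend I/Os per host write, RAID5 = 4), and a rule-of-thumb ~180 IOPS per 15k RPM disk – the workload figure of 5000 write IOPS is purely illustrative:

```r
# Hypothetical sizing: how many spindles are needed to absorb the write load
# once the write-back cache is disabled and writes go straight to disk.
disks_needed <- function(write_iops, raid_penalty, disk_iops = 180) {
  # raid_penalty: backend I/Os per host write (RAID10 = 2, RAID5 = 4)
  ceiling(write_iops * raid_penalty / disk_iops)
}

disks_needed(5000, raid_penalty = 2)   # RAID10: 56 x 15k disks
disks_needed(5000, raid_penalty = 4)   # RAID5: 112 x 15k disks
```

Numbers like these show why a handful of SSDs (each capable of thousands of write IOPS) can be an attractive alternative for REDO when cache-loss scenarios are taken into account.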