Testing NetApp’s SyncMirror whole plex failure (on simulator)

Thursday, May 20th, 2010

Together with Lukasz Borek we’ve tested SyncMirror reliability under some stress. In short, a basic NetApp deployment just uses RAID-DP (Double Parity, so a RAID group can lose 2 disks before you lose data). If your data is critical, then you can configure SyncMirror, which mirrors the aggregate (on top of which the volumes sit) across two plexes, each consisting of RAID group(s). This behaves like RAID-1 built on top of two RAID-DP sets (you have double the amount of disks protecting you). A simple demonstration follows (just before this we removed all spare disks for demonstration purposes).
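To make the protection rules above concrete, here is a minimal Python sketch of them. It is only a toy model of the description in this post, not of how Data ONTAP works internally; the class names (RaidGroup, Plex, MirroredAggregate) are made up for illustration. A RAID-DP group is assumed to survive up to two failed disks, a plex survives only if all of its RAID groups survive, and a mirrored aggregate survives as long as at least one plex does.

```python
from dataclasses import dataclass, field

# Assumption taken from the text: RAID-DP tolerates two failed disks per RAID group.
RAID_DP_TOLERATED_FAILURES = 2


@dataclass
class RaidGroup:
    disks: set
    failed: set = field(default_factory=set)

    def survives(self):
        # Only failures of disks that belong to this group count.
        return len(self.failed & self.disks) <= RAID_DP_TOLERATED_FAILURES


@dataclass
class Plex:
    raid_groups: list

    def survives(self):
        # A plex is usable only if every one of its RAID groups is usable.
        return all(rg.survives() for rg in self.raid_groups)


@dataclass
class MirroredAggregate:
    plexes: list

    def survives(self):
        # Mirroring: one healthy plex is enough to keep serving data.
        return any(p.survives() for p in self.plexes)

    def mirror_degraded(self):
        # Still serving data, but at least one plex has been lost.
        return self.survives() and not all(p.survives() for p in self.plexes)


# Disk names mirror the aggregate layout used in this post.
plex0 = Plex([RaidGroup({"v5.19", "v5.20", "v5.24"})])
plex1 = Plex([RaidGroup({"v5.25", "v5.26", "v5.27"})])
aggr_mirr = MirroredAggregate([plex0, plex1])

# Fail every disk of plex0, as in the test described in this post.
plex0.raid_groups[0].failed |= {"v5.19", "v5.20", "v5.24"}
print(aggr_mirr.survives(), aggr_mirr.mirror_degraded())
```

In this model, losing all three disks of one plex leaves the aggregate alive but flagged as mirror degraded, which is the outcome the test in this post is checking for.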

So we have our aggregate aggr_mirr:

filerA> aggr status -r
Aggregate aggr_mirr (online, raid_dp, mirrored) (block checksums)
  Plex /aggr_mirr/plex0 (online, normal, active, pool0)
    RAID group /aggr_mirr/plex0/rg0 (normal)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   v5.19   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
      parity    v5.20   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
      data      v5.24   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448

  Plex /aggr_mirr/plex1 (online, normal, active, pool0)
    RAID group /aggr_mirr/plex1/rg0 (normal)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   v5.25   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
      parity    v5.26   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
      data      v5.27   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448

[..]

Pool1 spare disks (empty)

Pool0 spare disks (empty)

Broken disks

RAID Disk       Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------  ------------- ---- ---- ---- ----- --------------    --------------
admin removed   v4.16   v4    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v4.17   v4    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v4.20   v4    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v4.21   v4    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v4.22   v4    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v4.24   v4    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v4.25   v4    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v4.26   v4    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v4.27   v4    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v4.28   v4    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v4.29   v4    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v4.32   v4    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v5.28   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v5.29   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
admin removed   v5.32   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448

We want to fail all of the disks (v5.19, v5.20, v5.24) that are part of /aggr_mirr/plex0/rg0. This simulates losing a whole plex; we wanted to test that aggr_mirr can survive the crash of one of its plexes:

filerA> disk fail v5.19
*** You are about to prefail the following file system disk, ***
*** which will eventually result in it being failed ***
  Disk /aggr_mirr/plex0/rg0/v5.19

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   v5.19   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
***
Really prefail disk v5.19?  y

WARNING! There is no spare disk available to which to copy.
Are you sure you want to continue with disk fail (y/n)? y
disk fail: The following disk was prefailed: v5.19
Disk v5.19 has been prefailed.  Its contents will be copied to a
replacement disk, and the prefailed disk will be failed out.
filerA>
Tue Apr 27 18:03:52 GMT [raid.rg.diskcopy.cant.start:warning]: /aggr_mirr/plex0/rg0: unable to start disk copy for v5.19: No block checksum disk of required type and size is available, targeting Pool0
filerA>
filerA> disk fail v5.20
*** You are about to prefail the following file system disk, ***
*** which will eventually result in it being failed ***
  Disk /aggr_mirr/plex0/rg0/v5.20

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      parity    v5.20   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
***
Really prefail disk v5.20? yes

WARNING! There is no spare disk available to which to copy.
Are you sure you want to continue with disk fail (y/n)? y
disk fail: The following disk was prefailed: v5.20
Disk v5.20 has been prefailed.  Its contents will be copied to a
replacement disk, and the prefailed disk will be failed out.
filerA>
filerA>
filerA> disk fail v5.24
*** You are about to prefail the following file system disk, ***
*** which will eventually result in it being failed ***
  Disk /aggr_mirr/plex0/rg0/v5.24

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      data      v5.24   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
***
Really prefail disk v5.24? y

WARNING! There is no spare disk available to which to copy.
Are you sure you want to continue with disk fail (y/n)? y
disk fail: The following disk was prefailed: v5.24
Disk v5.24 has been prefailed.  Its contents will be copied to a
replacement disk, and the prefailed disk will be failed out.
filerA>
filerA> disk simpull  v5.19
Tue Apr 27 18:06:08 GMT [raid.config.filesystem.disk.missing:info]: File system Disk /aggr_mirr/plex0/rg0/v5.19 Shelf ? Bay ? [NETAPP   VD-1000MB-FZ-520 0042] S/N [16402503] is missing.
filerA>
Tue Apr 27 18:06:08 GMT [raid.rg.recons.missing:notice]: RAID group /aggr_mirr/plex0/rg0 is missing 1 disk(s).
Tue Apr 27 18:06:08 GMT [raid.rg.recons.cantStart:warning]: The reconstruction cannot start in RAID group /aggr_mirr/plex0/rg0: No block checksum disk of required type and size is available, targeting Pool1
filerA>
filerA> disk simpull  v5.20
Tue Apr 27 18:06:16 GMT [raid.disk.missing:info]: Disk /aggr_mirr/plex0/rg0/v5.20 Shelf ? Bay ? [NETAPP   VD-1000MB-FZ-520 0042] S/N [16402504] is missing from the system
Tue Apr 27 18:06:16 GMT [raid.config.filesystem.disk.missing:info]: File system Disk /aggr_mirr/plex0/rg0/v5.20 Shelf ? Bay ? [NETAPP   VD-1000MB-FZ-520 0042] S/N [16402504] is missing.
filerA>
Tue Apr 27 18:06:17 GMT [raid.rg.recons.missing:notice]: RAID group /aggr_mirr/plex0/rg0 is missing 2 disk(s).
Tue Apr 27 18:06:17 GMT [raid.rg.recons.cantStart:warning]: The reconstruction cannot start in RAID group /aggr_mirr/plex0/rg0: No block checksum disk of required type and size is available, targeting Pool1
filerA>
filerA> disk simpull  v5.24
Tue Apr 27 18:06:20 GMT [raid.config.filesystem.disk.missing:info]: File system Disk /aggr_mirr/plex0/rg0/v5.24 Shelf ? Bay ? [NETAPP   VD-1000MB-FZ-520 0042] S/N [16402507] is missing.
Tue Apr 27 18:06:20 GMT [raid.vol.mirror.degraded:error]: Aggregate aggr_mirr is mirrored and one plex has failed. It is no longer protected by mirroring.

So we’ve destroyed a whole plex! Great success! ;) In the real world this would be more like a whole shelf failure. The NetApp controller of course also tried to call home, because this is pretty serious:

Tue Apr 27 18:06:20 GMT [callhome.syncm.plex:CRITICAL]: Call home for SYNCMIRROR PLEX FAILED

Let’s verify again:

filerA> aggr status -r
Aggregate aggr_mirr (online, raid_dp, mirror degraded) (block checksums)
  Plex /aggr_mirr/plex0 (offline, failed, inactive, pool0)
    RAID group /aggr_mirr/plex0/rg0 (partial)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   FAILED          N/A                        1020/2089984
      parity    FAILED          N/A                        1020/2089984
      data      FAILED          N/A                        1020/2089984
      Raid group is missing 3 disks.

  Plex /aggr_mirr/plex1 (online, normal, active, pool0)
    RAID group /aggr_mirr/plex1/rg0 (normal)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   v1.25   v1    ?   ?   FC:A   0  FCAL  N/A  1020/2089984      1027/2104448
      parity    v5.26   v5    ?   ?   FC:B   0  FCAL  N/A  1020/2089984      1027/2104448
      data      v1.27   v1    ?   ?   FC:A   0  FCAL  N/A  1020/2089984      1027/2104448

[..]
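If you want to check the plex state from a script rather than by eye, the "Plex ..." lines of the output above are easy to pick apart. The following is a rough Python sketch that assumes the output format is exactly as pasted in this post (other Data ONTAP versions may print it differently); the function names are hypothetical helpers, and the sample text is trimmed from the listing above.

```python
import re

# Matches lines such as:
#   Plex /aggr_mirr/plex0 (offline, failed, inactive, pool0)
PLEX_LINE = re.compile(r"^\s*Plex\s+(\S+)\s+\(([^)]*)\)")


def plex_states(aggr_status_text):
    """Map each plex path to the list of status flags printed next to it."""
    states = {}
    for line in aggr_status_text.splitlines():
        match = PLEX_LINE.match(line)
        if match:
            flags = [flag.strip() for flag in match.group(2).split(",")]
            states[match.group(1)] = flags
    return states


def failed_plexes(aggr_status_text):
    """Return the plex paths whose status flags include 'failed'."""
    return [name for name, flags in plex_states(aggr_status_text).items()
            if "failed" in flags]


# Trimmed copy of the `aggr status -r` output shown above.
SAMPLE = """\
Aggregate aggr_mirr (online, raid_dp, mirror degraded) (block checksums)
  Plex /aggr_mirr/plex0 (offline, failed, inactive, pool0)
    RAID group /aggr_mirr/plex0/rg0 (partial)
  Plex /aggr_mirr/plex1 (online, normal, active, pool0)
    RAID group /aggr_mirr/plex1/rg0 (normal)
"""

print(failed_plexes(SAMPLE))
```

How the output gets captured (ssh, rsh, a saved log) is left out on purpose, since that depends on how the filer is reached.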

During the whole plex destruction process we were continually writing to an NFS volume on aggr_mirr in synchronous mode (to avoid caching at the OS level and to always hit the [NV]RAM; of course there is no NVRAM in the simulator). As you can see from the output below, there was not a single I/O error from the NFS client’s perspective:

2097152 bytes (2.1 MB) copied, 0.498529 seconds, 4.2 MB/s
Thu Apr 29 16:10:32 CEST 2010
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.51502 seconds, 4.1 MB/s
Thu Apr 29 16:10:33 CEST 2010
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.514629 seconds, 4.1 MB/s
Thu Apr 29 16:10:35 CEST 2010
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.419478 seconds, 5.0 MB/s
Thu Apr 29 16:10:36 CEST 2010
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.411578 seconds, 5.1 MB/s
Thu Apr 29 16:10:38 CEST 2010
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.444393 seconds, 4.7 MB/s
Thu Apr 29 16:10:39 CEST 2010
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.43042 seconds, 4.9 MB/s
Thu Apr 29 16:10:41 CEST 2010
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.417388 seconds, 5.0 MB/s
Thu Apr 29 16:10:42 CEST 2010
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.504876 seconds, 4.2 MB/s
Thu Apr 29 16:10:44 CEST 2010
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.425559 seconds, 4.9 MB/s
Thu Apr 29 16:10:45 CEST 2010
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.453916 seconds, 4.6 MB/s
Thu Apr 29 16:10:46 CEST 2010
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.417439 seconds, 5.0 MB/s
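The exact command used on the NFS client is not shown here; the output above looks like a dd loop writing 2 blocks (2097152 bytes in total) per pass. As an illustration only, the same kind of probe can be sketched in Python: open the file with O_SYNC so each write has to reach the server before returning, write 2 MiB, and count any I/O errors. The function names and the target path are assumptions, and os.O_SYNC is only available on Unix-like systems.

```python
import os
import time

WRITE_SIZE = 2 * 1024 * 1024  # 2097152 bytes, the size seen in the output above


def sync_write_once(path, size=WRITE_SIZE):
    """Write `size` zero bytes with O_SYNC; return elapsed seconds.

    Raises OSError if the write fails (for example on an NFS I/O error).
    """
    payload = b"\0" * size
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    start = time.monotonic()
    fd = os.open(path, flags, 0o644)
    try:
        written = 0
        while written < size:
            # os.write may write fewer bytes than requested, so loop until done.
            written += os.write(fd, payload[written:])
    finally:
        os.close(fd)
    return time.monotonic() - start


def write_loop(path, iterations, pause=1.0):
    """Repeat the synchronous write; return the number of failed passes."""
    errors = 0
    for _ in range(iterations):
        try:
            elapsed = sync_write_once(path)
            print(f"{WRITE_SIZE} bytes written in {elapsed:.3f} s at {time.ctime()}")
        except OSError as exc:
            errors += 1
            print(f"I/O error at {time.ctime()}: {exc}")
        time.sleep(pause)
    return errors
```

Calling something like write_loop("/mnt/nfs/probe.bin", iterations=60) while pulling the disks (the mount point is hypothetical) should return 0 if the client never sees an error, which matches what we observed.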