WAFL performance VS sequential reads: part II, FC LUN defragmentation

After part I – where I simulated a typical Oracle workload (generating a 70:30 read-to-write ratio on an FC LUN) and created snapshots – I wanted to try some different performance tests. To start from the same performance characteristics, I deleted all my snapshots, so my FlexVol ended up back at 40% utilization:

X> snap list full_aggr_test
Volume full_aggr_test
working...

No snapshots exist.
X>
X> df -g full_aggr_test
Filesystem               total       used      avail capacity  Mounted on
/vol/full_aggr_test/       50GB       20GB       29GB      40%  /vol/full_aggr_test/
/vol/full_aggr_test/.snapshot        0GB        0GB        0GB     ---%  /vol/full_aggr_test/.snapshot
X>

Next I ran the Orion stress test in a way identical to part I, on the same environment. As you can see, the LUN is still fragmented, so any kind of sequential read is impacted (maximum read observed: ~17 MB/s):

root@Y:# grep Maximum orion*
orion_20110627_1116_summary.txt:Maximum Large MBPS=17.07 @ Small=0 and Large=9
orion_20110627_1116_summary.txt:Maximum Small IOPS=683 @ Small=24 and Large=0
root@Y:#
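Orion writes these figures into its *_summary.txt files, so a hypothetical helper can extract them instead of eyeballing grep output – a minimal sketch, assuming the line format shown above:

```python
import re

def parse_orion_summary(lines):
    """Pull 'Maximum Large MBPS' / 'Maximum Small IOPS' figures out of
    Orion summary lines (format as produced by the grep above)."""
    results = {}
    for line in lines:
        m = re.search(r"Maximum (Large MBPS|Small IOPS)=([\d.]+)", line)
        if m:
            results[m.group(1)] = float(m.group(2))
    return results

# sample input: the two summary lines from the fragmented-LUN run
sample = [
    "orion_20110627_1116_summary.txt:Maximum Large MBPS=17.07 @ Small=0 and Large=9",
    "orion_20110627_1116_summary.txt:Maximum Small IOPS=683 @ Small=24 and Large=0",
]
print(parse_orion_summary(sample))
```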

To fight this performance issue, one can first establish the root cause:

X> reallocate measure /vol/full_aggr_test
Reallocation scan will be started on '/vol/full_aggr_test'.
Monitor the system log for results.
X>

The system log will reveal this:

Mon Jun 27 07:35:31 EDT [X: wafl.scan.start:info]: Starting WAFL layout measurement on volume full_aggr_test.
Mon Jun 27 07:35:32 EDT [X: wafl.reallocate.check.highAdvise:info]: Allocation check on '/vol/full_aggr_test' is 8, hotspot 0 (threshold 4), consider running reallocate.

This seems to be identical to running the measurement directly on the LUN:

X> reallocate measure  /vol/full_aggr_test/lun01
Reallocation scan will be started on '/vol/full_aggr_test/lun01'.
Monitor the system log for results.
X>

The log will show this:

Mon Jun 27 07:45:21 EDT [X: wafl.scan.start:info]: Starting WAFL layout measurement on volume full_aggr_test.
Mon Jun 27 07:45:21 EDT [X: wafl.reallocate.check.highAdvise:info]: Allocation check on '/vol/full_aggr_test/lun01' is 8, hotspot 0 (threshold 4), consider running reallocate.
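In both log lines the advisory compares the measured layout value (8 here) against the threshold (4). A minimal sketch of that comparison, under the assumption that the highAdvise message fires whenever the measurement exceeds the threshold:

```python
def should_reallocate(optimization, threshold=4):
    # Assumption: wafl.reallocate.check.highAdvise is logged when the measured
    # layout/optimization value exceeds the configured threshold (default 4,
    # as shown in the log output above).
    return optimization > threshold

print(should_reallocate(8))  # the measured value from the log
```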

So in both cases we were advised to defragment the LUN. Keep in mind that this is a rather resource-hungry operation, as it can involve reading and rewriting the full contents of the LUN!

X> reallocate start -f -p /vol/full_aggr_test/lun01
Reallocation scan will be started on '/vol/full_aggr_test/lun01'.
Monitor the system log for results.
X>

The log will show that the operation has started…

Mon Jun 27 07:46:23 EDT [X: wafl.br.revert.slow:info]: The aggregate 'sm_aggr1' contains blocks that require redirection; 'revert_to' might take longer than expected.
Mon Jun 27 07:46:23 EDT [X: wafl.scan.start:info]: Starting file reallocating on volume full_aggr_test.

As you can see, CPU activity is rather low; however, physical utilization of the disks is reported as high (don’t be fooled by the low write activity – that is a function of time; it performs a lot of writes later):

 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
 10%     0     0     0     157     0     0  22372  19320     0     0    53s  94%  58%  :   97%    156     0   589   175     0     0
 10%     1     0     0     108     0     0  24884      0     0     0    53s  94%   0%  -   92%    106     0   256   585     0     0
  9%     0     0     0     101     0     0  25284     24     0     0    53s  94%   0%  -   93%    100     0   421   260     0     0
 12%     0     0     0     627    20    25  25620      8     0     0    53s  94%   0%  -   92%    511     0   297   132     0     0
 11%     0     0     0     792     0     0  22832      0     0     0    53s  94%   0%  -   90%    652     0   670   461     0     0
  6%     1     0     0      81     1     1  25232     24     0     0    53s  99%   0%  -   92%     78     0   233   253     0     0
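sysstat reports disk throughput in kB/s, so the ~25,000 kB/s in the “Disk kB/s read” column during the scan works out to roughly 25 MB/s of sustained disk reads. A quick back-of-the-envelope conversion, assuming binary units (1 MB = 1024 kB):

```python
disk_read_kbs = 25232  # one of the "Disk kB/s read" samples above

disk_read_mbs = disk_read_kbs / 1024.0  # kB/s -> MB/s, assuming 1 MB = 1024 kB
print(round(disk_read_mbs, 1))
```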

One can monitor the progress by using the “status” command and observe the scan run to completion:

X> reallocate status -v /vol/full_aggr_test/lun01
Reallocation scans are on
/vol/full_aggr_test/lun01:
        State: Reallocating: Block 1347456 of 5242880 (25%), updated 1346434
        Flags: doing_force,measure_only,repeat,keep_vvbn
    Threshold: 4
     Schedule: n/a
     Interval: 1 day
 Optimization: 8
  Measure Log: n/a
X>
[..]
X> reallocate status -v /vol/full_aggr_test/lun01
Reallocation scans are on
/vol/full_aggr_test/lun01:
        State: Idle
        Flags: measure_only,repeat
    Threshold: 4
     Schedule: n/a
     Interval: 1 day
 Optimization: 8
  Measure Log: n/a
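The State line reports progress as raw block counts. Assuming these are 4 KiB WAFL blocks, the counters also confirm the LUN size, since 5242880 × 4 KiB is exactly 20 GiB – matching the df output at the top of the post:

```python
blocks_done, blocks_total = 1347456, 5242880  # from the first status output

progress_pct = 100.0 * blocks_done / blocks_total
lun_size_gib = blocks_total * 4096 / 2**30  # assuming 4 KiB WAFL blocks

print(round(progress_pct, 1), lun_size_gib)
```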

X> sysstat -x 1
 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
 53%     1     0     0     678     1     1  29428   1556     0     0     1   72%   9%  :   11%    573     0   311 21077     0     0
 34%     0     0     0     443     0     0  22028     32     0     0     1   78%   0%  -    5%    442     0  1068 20121     0     0
 40%     0     0     0     172     0     0  16360      0     0     0     1   77%   0%  -    4%    171     0   367 14450     0     0
 CTRL+C
X>

Later results indicate that sequential reads are indeed back to their top value (~42 MB/s), which was our starting point on a fresh LUN inside the FlexVol in part I…

root@Y:# grep Maximum orion*
orion_20110627_1208_summary.txt:Maximum Large MBPS=42.73 @ Small=0 and Large=9
orion_20110627_1208_summary.txt:Maximum Small IOPS=645 @ Small=25 and Large=0
root@Y:#
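Comparing the two Orion runs quantifies the effect of the reallocation: large sequential reads went from 17.07 MB/s to 42.73 MB/s, roughly a 2.5x improvement, while small random IOPS stayed about flat (683 vs 645):

```python
before_mbps, after_mbps = 17.07, 42.73  # Maximum Large MBPS before/after reallocate

speedup = after_mbps / before_mbps
print(round(speedup, 1))
```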

In the next part I’ll try to investigate various AIX JFS2/CIO behaviours and, to some degree, the performance characteristics of NetApp storage and its options (e.g. the read_realloc option). Stay tuned…

2 Responses to “WAFL performance VS sequential reads: part II, FC LUN defragmentation”

  1. [...] my series about LUN fragmentation in WAFL (part1, part2) I wanted to give a try to read_realloc option. Mostly the same storage system (still DataOnTap [...]

  2. Chris Madden says:

    If the argument to “reallocate measure” is a volume, the result is a cumulative result for all files/luns in the volume, whereas if the argument is a file/lun, then the result is just for that specific file/lun. For this reason, in your example both should have the same output, because there is only a single file/lun in the volume.