WAFL performance VS sequential reads: part I, FC LUN performance from AIX vs FlexVol utilization

Some time ago I started thinking about getting more serious about performance research on the AIX 6.1, PowerHA and Oracle stack on top of NetApp storage. One of the first things I wanted to measure was how NetApp's WAFL handles long reads as FlexVol space utilization grows. In theory, long sequential reads (typical for the Oracle data warehouses I am interested in) could be affected by the fragmentation introduced by WAFL (Write Anywhere File Layout).

Some specs first:

  • Data ONTAP 7.3.2; a single NetApp 3160 controller was tested (although it was part of an active/active cluster).
  • The test was done using Oracle's Orion 11.1.0.7 storage benchmarking tool on top of AIX (orion_aix_ppc64 -run advanced -num_disks 5 -cache_size 2048 -write 30 -duration 20 -type rand -matrix basic). As you can see, the read-to-write ratio was 70:30, but only the long reads are presented here; I was not interested in the performance of 8 kB reads/writes, just the 1 MB long reads.
  • The AIX LPAR was connected via virtual Fibre Channel (VFC) adapters to two separate VIOS; each VIOS was connected using two 8 Gbps FC links (but running in 4 Gbps mode because the Brocade SAN switches did not support 8 Gbps).
  • The NetApp controller had 4 Gbps FC ports.
  • AIX used its native 6.1 MPIO (round-robin) for the tested LUN (see the sketch after this list).
  • The hdisk queue depth for the LUN was left at the default value of 12 (as per the NetApp Host Attachment Kit).
  • The AIX JFS2 filesystem was mounted with Concurrent I/O (CIO) to prevent AIX from caching and doing read-ahead; the LPAR still had 3 GB of RAM allocated, but with CIO the VMM should not use it for file caching.
  • The NetApp controller had 4 processors, 8 GB of RAM and 2 GB of NVRAM (as indicated by sysconfig output); since it runs in a cluster, only 1 GB of NVRAM was effectively available.
  • The LUN was 20 GB, on top of a 50 GB FlexVol on a RAID-DP aggregate with 5x FC 15k RPM disks.
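
To make the host-side setup concrete, below is a minimal sketch of the AIX settings listed above. It is illustrative only: hdisk2 is an assumed device name, the commands merely restate the bullet points (round-robin MPIO, a queue depth of 12, a JFS2 mount with Concurrent I/O), and the logical volume and mount point come from the df output shown further down.

# hypothetical hdisk2 stands for the tested NetApp LUN
chdev -l hdisk2 -a algorithm=round_robin -P    # native AIX MPIO, round-robin (-P defers the change to the next reboot)
chdev -l hdisk2 -a queue_depth=12 -P           # the default mentioned above
lsattr -El hdisk2 | egrep 'algorithm|queue_depth'    # verify the attributes

# JFS2 mounted with Concurrent I/O to bypass AIX file caching and read-ahead
mount -o cio /dev/fslv00 /fullaggr

On the storage side, the FlexVol and its containing aggregate were set up as follows: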
X> vol options full_aggr_test
nosnap=off, nosnapdir=off, minra=off, no_atime_update=off, nvfail=off,
ignore_inconsistent=off, snapmirrored=off, create_ucode=on,
convert_ucode=off, maxdirsize=83804, schedsnapname=ordinal,
fs_size_fixed=off, compression=off, guarantee=volume, svo_enable=off,
svo_checksum=off, svo_allow_rman=off, svo_reject_errors=off,
no_i2p=off, fractional_reserve=100, extent=off, try_first=volume_grow,
read_realloc=off, snapshot_clone_dependency=off
X> df -Ag aggr_used
Aggregate                total       used      avail capacity
aggr_used               1102GB      668GB      433GB      61%
aggr_used/.snapshot         0GB        0GB        0GB     ---%
X>
X> snap list -A aggr_used
Aggregate aggr_used
working...

No snapshots exist.
X> snap sched -A aggr_used
Aggregate aggr_used: 0 0 0
X>

The WAFL aggregate was idle during the test and was configured and running as follows:

Aggregate aggr_used (online, raid_dp) (block checksums)
  Plex /aggr_used/plex0 (online, normal, active)
    RAID group /aggr_used/plex0/rg0 (normal)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   2d.18   2d    1   2   FC:B   -  FCAL 15000 418000/856064000  420156/860480768
      parity    1d.19   1d    1   3   FC:A   -  FCAL 15000 418000/856064000  420156/860480768
      data      2d.21   2d    1   5   FC:B   -  FCAL 15000 418000/856064000  420156/860480768
      data      1d.22   1d    1   6   FC:A   -  FCAL 15000 418000/856064000  420156/860480768
      data      2d.23   2d    1   7   FC:B   -  FCAL 15000 418000/856064000  420156/860480768

[..]

X> aggr status aggr_used -v
           Aggr State           Status            Options
       aggr_used online          raid_dp, aggr     nosnap=off, raidtype=raid_dp,
                                                  raidsize=16,
                                                  ignore_inconsistent=off,
                                                  snapmirrored=off,
                                                  resyncsnaptime=60,
                                                  fs_size_fixed=off,
                                                  snapshot_autodelete=on,
                                                  lost_write_protect=on
[..]

X> df -g full_aggr_test
Filesystem               total       used      avail capacity  Mounted on
/vol/full_aggr_test/       50GB       20GB       29GB      40%  /vol/full_aggr_test/
/vol/full_aggr_test/.snapshot        0GB        0GB        0GB     ---%  /vol/full_aggr_test/.snapshot
X>
X> snap list full_aggr_test
Volume full_aggr_test
working...

No snapshots exist.
X>

From the AIX point of view, the filesystem was configured as follows (note the big files created for Orion's use):

root@Y:# df -m .
Filesystem    MB blocks      Free %Used    Iused %Iused Mounted on
/dev/fslv00    19456.00   2099.70   90%        7     1% /fullaggr
root@Y:# du -sm *
10000.01        bigfile
7353.00 bigfile2
0.00    lost+found
0.00    orion.lun
root@Y:#
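
Orion reads the list of targets to exercise from a <testname>.lun file; with the default test name that is orion.lun, which matches the small file visible in the du output above. A purely illustrative sketch of how such a layout could be prepared (the dd sizes roughly follow the du output; the exact commands are an assumption, not something taken from the original setup):

# create the large target files on the CIO-mounted JFS2 filesystem
dd if=/dev/zero of=/fullaggr/bigfile bs=1048576 count=10000
dd if=/dev/zero of=/fullaggr/bigfile2 bs=1048576 count=7353

# orion.lun simply lists the files (or raw LUNs) to test, one path per line
cat > /fullaggr/orion.lun <<EOF
/fullaggr/bigfile
/fullaggr/bigfile2
EOF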

Results:

Methodology:
For each successive attempt a snapshot was created to grow the space used inside the FlexVol, until the volume was full (utilization going from 40% up to 100%). A single Orion execution was performed after each snapshot was created. The Y-axis represents the maximum bandwidth observed for sequential 1 MB reads (as reported by Orion). The Z-axis (depth), ranging from 1 to 10, represents the number of concurrent/parallel reads issued by Orion (to simulate, say, multiple full table scans hitting the same LUN at once).

As the graph shows, when FlexVol utilization is close to or at 100%, throughput drops by more than half (from 40-45 MB/s down to 10-15 MB/s). A sane maximum for FlexVol utilization therefore seems to be somewhere around 70% if fragmentation problems are to be avoided. The AIX system was mostly left at its default settings without any more advanced optimizations; that was done on purpose, with the exception of Concurrent I/O.
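
To make the procedure concrete, each measurement step can be reduced to the sketch below. The snapshot name is illustrative and the Orion command line is the one quoted in the specs; the point is that the snapshot pins the old blocks, so the 30% write component of the workload consumes fresh space inside the FlexVol on every run.

# on the NetApp controller: snapshot the volume before the next run
snap create full_aggr_test before_run_N
df -g full_aggr_test    # note the FlexVol utilization reached for this step

# on the AIX host: repeat the same Orion workload
./orion_aix_ppc64 -run advanced -num_disks 5 -cache_size 2048 -write 30 -duration 20 -type rand -matrix basic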

5 Responses to “WAFL performance VS sequential reads: part I, FC LUN performance from AIX vs FlexVol utilization”

  1. [...] part I – where I’ve been simulating a typical Oracle workload (generating a 70:30 read-to-write [...]

  2. [...] my series about LUN fragmentation in WAFL (part1, part2) I wanted to give the read_realloc option a try. Mostly the same storage system (still [...]

  3. Alex says:

    Hi, I need an answer to this question: if NetApp's WAFL is designed to write data to a new place whenever a block is overwritten (it does not overwrite the block holding the old data and always chooses a new block instead), what is the limit of this? How far can it keep choosing new blocks; what is the LIMIT?

  4. admin says:

    Alex, it appears to be limited only by free blocks (i.e. blocks that are not referenced/used by snapshots inside the FlexVol and/or unused space inside the volume). You could potentially learn a bit more about the internals by reading docs like this one http://www.eecs.harvard.edu/~pmacko/papers/backref-fast10.pdf or that one web.cs.wpi.edu/~claypool/courses/…/HLM02.ppt
