Archive for January, 2011

PowerHA failure scenario when dealing with SAN-booted LPARs – part II

Monday, January 24th, 2011

OK, this is part II of the post PowerHA failure scenario when dealing with SAN-booted LPARs – part I.

The first scenario we performed was to disable 100% of the MPIO storage paths to the active HACMP node by un-mapping the Virtual Fibre Channel adapters (vadapters) on both VIOS protecting the active node (LPAR). On both VIOS we executed the following command:

$ vfcmap -vadapter vfchostXYZ -fcp

where vfchostXYZ was the server-side (or rather VIOS-side) vadapter handling FC I/O traffic for this LPAR. The result? The LPAR with the active HACMP/PowerHA Resource Groups on it (jkwha001d) after some time evicted itself from the HACMP cluster, and the old passive node jkwha002d became the active one (Oracle started there). The error log on the jkwha001d LPAR shows the root cause:
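As a side note, finding which vfchost adapter serves a given LPAR can be scripted by filtering the VIOS NPIV mappings. A minimal sketch; the sample `lsmap -npiv` output below is illustrative (not captured from this cluster), so treat the column layout as an assumption:

```shell
# Pick the vfchost adapter serving a given client LPAR (here: jkwha001d).
# On a real VIOS you would pipe `lsmap -npiv -all` (run as padmin) into
# the awk filter instead of this illustrative sample.
lsmap_sample='Name          Physloc                            ClntID ClntName       ClntOS
------------- ---------------------------------- ------ -------------- -------
vfchost3      U9117.MMA.100F6A0-V1-C31                2 jkwha001d      AIX
vfchost4      U9117.MMA.100F6A0-V1-C32                3 jkwha002d      AIX'

# field 4 is the client LPAR name, field 1 the vfchost adapter
vfchost=$(printf '%s\n' "$lsmap_sample" | awk -v lpar=jkwha001d '$4 == lpar { print $1 }')
echo "$vfchost"
```

With the mapping known, the `vfcmap -vadapter ... -fcp` call above can target the right adapter.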

root@jkwha001d : /home/root :# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
AFA89905   0103071611 I O grpsvcs        Group Services daemon started
97419D60   0103071611 I O topsvcs        Topology Services daemon started
A6DF45AA   0103071111 I O RMCdaemon      The daemon is started.
67145A39   0103070911 U S SYSDUMP        SYSTEM DUMP
F48137AC   0103070911 U O minidump       COMPRESSED MINIMAL DUMP
DC73C03A   0103070911 T S fscsi0         SOFTWARE PROGRAM ERROR
9DBCFDEE   0103070911 T O errdemon       ERROR LOGGING TURNED ON
26623394   0103070511 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070511 T S fscsi1         SOFTWARE PROGRAM ERROR
26623394   0103070511 T H fscsi1         COMMUNICATION PROTOCOL ERROR
(..)
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
DE3B8540   0103070411 P H hdisk4         PATH HAS FAILED
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
DE3B8540   0103070411 P H hdisk8         PATH HAS FAILED
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
DE3B8540   0103070411 P H hdisk6         PATH HAS FAILED
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
DE3B8540   0103070411 P H hdisk3         PATH HAS FAILED
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
DE3B8540   0103070411 P H hdisk2         PATH HAS FAILED
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
DE3B8540   0103070411 P H hdisk0         PATH HAS FAILED
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DE3B8540   0103070411 P H hdisk5         PATH HAS FAILED
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
(..)
A39F8A49   1210235210 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
D5676F6F   1210235210 T H fscsi1         ATTACHED SCSI TARGET DEVICE ERROR
DE3B8540   1210235210 P H hdisk6         PATH HAS FAILED
D5676F6F   1210235210 T H fscsi0         ATTACHED SCSI TARGET DEVICE ERROR
DE3B8540   1210235210 P H hdisk1         PATH HAS FAILED
D5676F6F   1210235210 T H fscsi1         ATTACHED SCSI TARGET DEVICE ERROR
(..)

As you can see, AIX 6.1 generated a “SYSTEM DUMP”, which is actually a system panic indicator. Normally AIX saves the memory image of its kernel at runtime to the Logical Volumes configured with the “sysdump” type and then reboots, which allows investigating why the machine crashed. Here, however, you won’t find anything: the LPAR lost FC storage connectivity (including rootvg), so it couldn’t even save the dump. So where does the entry come from? Probably the state of the LPAR crash is recorded somewhere in the PowerVM firmware area; it’s one of the RAS features in AIX/POWER. So far, so good: HACMP/PowerHA was able to recover the services…
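When errpt floods like the above, grouping the entries by identifier and resource quickly shows which device is flapping. A small sketch fed with a captured excerpt; on a real AIX box you would pipe `errpt` itself into the awk filter:

```shell
# Group errpt output by IDENTIFIER and RESOURCE_NAME to spot flapping
# devices quickly. The sample is an excerpt from the log above; on AIX:
#   errpt | awk 'NR > 1 { c[$1 " " $5]++ } END { for (k in c) print c[k], k }' | sort -rn
errpt_sample='IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR
26623394   0103070411 T H fscsi1         COMMUNICATION PROTOCOL ERROR
DE3B8540   0103070411 P H hdisk4         PATH HAS FAILED
DC73C03A   0103070411 T S fscsi1         SOFTWARE PROGRAM ERROR'

printf '%s\n' "$errpt_sample" | awk 'NR > 1 { count[$1 " " $5]++ }
    END { for (k in count) print count[k], k }' | sort -rn
```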

OK, but we wanted real proof, so we performed a double-check. We suspected that the VIOS might have some magic way to communicate with the LPAR, and we wanted to exclude that factor, so we simulated disconnecting the FC at the storage level. The initial state was a stable HACMP cluster, with the RG active on jkwha002d (and jkwha001d being passive), and all MPIO paths to the Netapp cluster (storage1, storage2) reported as “Enabled” by “lspath”. An igroup in Netapp terms is an “initiator group” and is responsible for things like LUN masking. If you remove some FC WWPNs from the igroups protecting the LUNs, AIX ends up with hdisks pointing to non-existent SCSI targets, and the AIX SCSI stack gets a lot of errors (FC is just a wrapper around SCSI).

On the 2nd storage controller (storage2) the igroup jkwha002d_boot controlled access to the OS-level LUNs (rootvg, etc.):

storage2> lun show -g jkwha002d_boot
        /vol/jkwha002d_boot/boot/rootvg   40.0g (42953867264)   (r/w, online, mapped)
storage2>

Also on the 2nd storage controller there was an igroup responsible for the SAP and Oracle LUNs:

storage2> lun show -g jkwha002d_cluster
        /vol/sap_jw1_mirrlog/mirrlog/mirrlog01.lun      1g (1073741824)    (r/w, online, mapped)
        /vol/sap_jw1_oraarch/oraarch/oraarch01.lun     51g (54760833024)   (r/w, online, mapped)
        /vol/sap_jw1_sapdata/sapdata2/sapdata02.lun     15g (16106127360)   (r/w, online, mapped)
storage2>

The same igroup was present on the 1st storage system, controlling the remaining LUNs (yes, this is an active-active configuration, where reads/writes are managed by an AIX LVM VG built from LUNs on the two controllers):

storage1> lun show -g jkwha002d_cluster
        /vol/clus_001d_002d_hb/hb/hbvg     20m (20971520)      (r/w, online, mapped)
        /vol/sap_jw1_origlog/origlog/origlog01.lun      1g (1073741824)    (r/w, online, mapped)
        /vol/sap_jw1_sapbin/sapbin/sapbin01.lun    100g (107374182400)  (r/w, online, mapped)
        /vol/sap_jw1_sapdata/sapdata1/sapdata01.lun     15g (16106127360)   (r/w, online, mapped)
storage1>

Let’s do it:

storage1> igroup remove
usage:
igroup remove [ -f ] <initiator_group> <node> ...
  - removes node(s) from an initiator group. The node may also
    specified as its alias. See "fcp wwpn-alias" for help with aliases
    By default, if lun maps exist for the initiator group, the node is
    not removed from the group. The -f option can be used to force the
    removal.
For more information, try 'man na_igroup'
storage1>
storage1> igroup remove -f jkwha002d_cluster c0:50:76:00:61:20:00:de
storage1> igroup remove -f jkwha002d_cluster c0:50:76:00:61:20:00:e0
storage2> igroup remove -f jkwha002d_cluster c0:50:76:00:61:20:00:de
storage2> igroup remove -f jkwha002d_cluster c0:50:76:00:61:20:00:e0
storage2> igroup remove -f jkwha002d_boot c0:50:76:00:61:20:00:de
storage2> igroup remove -f jkwha002d_boot c0:50:76:00:61:20:00:e0

.. and HACMP failover won’t work. The active LPAR node (jkwha002d) is going to end up in a zombie state. If you had SSH sessions opened beforehand, everything would seem to be working – ls on /etc, some command utilities too – but only because those things are cached in AIX’s filesystem cache. Everything else is going to fail with I/O errors, and lspath will cry too (it won’t display all MPIO paths as failed, but that is a story for another post). Examples:

root@jkwha002d:# topas
bash: /usr/bin/topas: There is an input or output error.
root@jkwha002d:# nmon
bash: /usr/bin/nmon: There is an input or output error.
root@jkwha002d:# lspath
Failed  hdisk0 fscsi1
Failed  hdisk1 fscsi1
Enabled hdisk0 fscsi1
Failed  hdisk1 fscsi1
Failed  hdisk0 fscsi0
Failed  hdisk1 fscsi0
Failed  hdisk0 fscsi0
Enabled hdisk1 fscsi0
Failed  hdisk2 fscsi0
Failed  hdisk2 fscsi0
Failed  hdisk2 fscsi0
Failed  hdisk2 fscsi0
Enabled hdisk2 fscsi1
Failed  hdisk2 fscsi1
Failed  hdisk2 fscsi1
Failed  hdisk2 fscsi1
Failed  hdisk3 fscsi0
Failed  hdisk4 fscsi0
Failed  hdisk5 fscsi0
Failed  hdisk6 fscsi0
Failed  hdisk7 fscsi0
Failed  hdisk8 fscsi0
Failed  hdisk3 fscsi0
Failed  hdisk4 fscsi0
Failed  hdisk5 fscsi0
Failed  hdisk6 fscsi0
Failed  hdisk7 fscsi0
Failed  hdisk8 fscsi0
Failed  hdisk3 fscsi0
Failed  hdisk4 fscsi0
Failed  hdisk5 fscsi0
Failed  hdisk6 fscsi0
Failed  hdisk7 fscsi0
Failed  hdisk8 fscsi0
Failed  hdisk3 fscsi0
Failed  hdisk4 fscsi0
Failed  hdisk5 fscsi0
Failed  hdisk6 fscsi0
Failed  hdisk7 fscsi0
Failed  hdisk8 fscsi0
Failed  hdisk3 fscsi1
Failed  hdisk4 fscsi1
Enabled hdisk5 fscsi1
Enabled hdisk6 fscsi1
Failed  hdisk7 fscsi1
Failed  hdisk8 fscsi1
Enabled hdisk3 fscsi1
Enabled hdisk4 fscsi1
Failed  hdisk5 fscsi1
Failed  hdisk6 fscsi1
Failed  hdisk7 fscsi1
Failed  hdisk8 fscsi1
Failed  hdisk3 fscsi1
Failed  hdisk4 fscsi1
Failed  hdisk5 fscsi1
Failed  hdisk6 fscsi1
Enabled hdisk7 fscsi1
Failed  hdisk8 fscsi1
Failed  hdisk3 fscsi1
Failed  hdisk4 fscsi1
Failed  hdisk5 fscsi1
Failed  hdisk6 fscsi1
Failed  hdisk7 fscsi1
Enabled hdisk8 fscsi1
Missing hdisk9 fscsi0
Missing hdisk9 fscsi0
Missing hdisk9 fscsi1
Missing hdisk9 fscsi1
root@jkwha002d:# errpt | more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
E86653C3   0103074511 P H LVDD           I/O ERROR DETECTED BY LVM
A39F8A49   0103074511 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
E86653C3   0103074511 P H LVDD           I/O ERROR DETECTED BY LVM
A39F8A49   0103074511 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
E86653C3   0103074511 P H LVDD           I/O ERROR DETECTED BY LVM
A39F8A49   0103074511 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
B6267342   0103074511 P H hdisk0         DISK OPERATION ERROR
A39F8A49   0103074511 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
E86653C3   0103074511 P H LVDD           I/O ERROR DETECTED BY LVM
A39F8A49   0103074511 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
E86653C3   0103074511 P H LVDD           I/O ERROR DETECTED BY LVM
A39F8A49   0103074511 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
B6267342   0103074511 P H hdisk0         DISK OPERATION ERROR
A39F8A49   0103074511 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
EA88F829   0103074511 I O SYSJ2          USER DATA I/O ERROR
A39F8A49   0103074511 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
E86653C3   0103074511 P H LVDD           I/O ERROR DETECTED BY LVM
A39F8A49   0103074511 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
00B984B3   0103074511 U H hdisk0         UNDETERMINED ERROR
A39F8A49   0103074511 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
B6267342   0103074511 P H hdisk0         DISK OPERATION ERROR
A39F8A49   0103074511 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
E86653C3   0103074511 P H LVDD           I/O ERROR DETECTED BY LVM
A39F8A49   0103074511 T S syserrlg       ERROR LOGGING BUFFER OVERFLOW
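A quick way to summarize such flapping `lspath` output is to count the entries per state. A minimal sketch using a captured excerpt as input; on a live system you would pipe `lspath` straight into the same awk:

```shell
# Count MPIO path states (Failed / Enabled / Missing) from lspath-style
# output. Sample input is an excerpt of the capture above; on AIX:
#   lspath | awk '{ count[$1]++ } END { for (s in count) print s, count[s] }'
lspath_sample='Failed  hdisk0 fscsi1
Failed  hdisk1 fscsi1
Enabled hdisk0 fscsi1
Missing hdisk9 fscsi0'

printf '%s\n' "$lspath_sample" | awk '{ count[$1]++ } END { for (s in count) print s, count[s] }'
```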

Oracle is going to panic (which is good behaviour in my opinion):

root@jkwha002d:# tail alert_JW1.log
Mon Jan  3 07:39:21 2011
Errors in file /oracle/JW1/saptrace/background/jw1_lgwr_831500.trc:
ORA-00345: redo log write error block 7 count 1
ORA-00312: online log 13 thread 1: '/oracle/JW1/origlogA/log_g13m1.dbf'
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: -1
Additional information: 512
Mon Jan  3 07:39:23 2011
Errors in file /oracle/JW1/saptrace/background/jw1_ckpt_368764.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/oracle/JW1/origlogA/cntrl/cntrlJW1.dbf'
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 10
Additional information: 3
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/oracle/JW1/origlogB/cntrl/cntrlJW1.dbf'
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 10
Additional information: 3
Mon Jan  3 07:39:23 2011
Errors in file /oracle/JW1/saptrace/background/jw1_ckpt_368764.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/oracle/JW1/origlogA/cntrl/cntrlJW1.dbf'
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 10
Additional information: 3
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/oracle/JW1/origlogB/cntrl/cntrlJW1.dbf'
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 10
Additional information: 3
Mon Jan  3 07:39:23 2011
CKPT: terminating instance due to error 221
Instance terminated by CKPT, pid = 368764
root@jkwha002d:#

But what’s most interesting is that Oracle asks AIX to write this to the alert log file – and it will be readable by commands like “tail”, yet the “cat” command won’t work (you won’t be able to read the whole alert log file because you don’t have I/O!). What’s even more interesting is that you won’t see those messages after rebooting (once the kernel memory is gone!). If you don’t have I/O, how are you going to write/fsync this file?

Another thing is that the active HACMP node will stay active: it will keep handling the Resource Groups, etc. Failover won’t happen. A possible solution to this problem would be a kernel-based check verifying that at least /etc is accessible. Why kernel-based? Because you have to have some form of heartbeat pinned in memory (like an AIX kernel module, or an uncachable binary always present in RAM, running with real-time scheduling priority) that would continuously probe local I/O. If the probe failed several times in a row, it should reboot the node (and that would trigger Resource Group failover to the 2nd node in the cluster).
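The idea can be approximated even in user space. A minimal sketch, not a production watchdog: the probe path, retry count, and the reboot action are assumptions for illustration, and a real implementation would have to be pinned in RAM as argued above, otherwise the watchdog itself may be lost together with the I/O:

```shell
# Minimal I/O-heartbeat sketch: probe a file living on the SAN and act
# after N consecutive failures. Illustration only -- the probe path,
# retry count and reboot action are all assumptions.
PROBE_FILE=/etc/hosts      # assumption: something on rootvg
MAX_FAILURES=3
failures=0

probe_io() {
    # read through the filesystem; a plain cat might be satisfied from cache
    dd if="$PROBE_FILE" of=/dev/null bs=512 count=1 2>/dev/null
}

# bounded loop for the sketch; a real watchdog would run forever
for i in 1 2 3; do
    if probe_io; then
        failures=0
    else
        failures=$((failures + 1))
    fi
    if [ "$failures" -ge "$MAX_FAILURES" ]; then
        echo "I/O heartbeat lost, rebooting"   # here a real watchdog would halt/reboot the node
        break
    fi
    sleep 1
done
```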

Note: typical HACMP scripting – at least in theory – is not enough; even if it forced running /sbin/halt, how can you be sure that /sbin/halt and all the required libc contents are still in memory?

Perhaps I’ll write a part III of this article :)

-Jakub.

AIX, user gets “pwd: The file access permissions do not allow the specified action.”

Wednesday, January 12th, 2011

Just a short demonstration that file permissions matter on directories used as mount points under AIX 6.1. Let’s say we have a properly running system; the Oracle software owner, user “orand3”, can access everything (pwd and ls commands):

root@kopsxsap003d:# mount | grep 102_64
         /dev/nd3orabin   /oracle/ND3/102_64 jfs2   Dec 03 12:00 rw,log=/dev/loglv00
root@kopsxsap003d:# oslevel
6.1.0.0
root@kopsxsap003d:# ls -ald /oracle/ND3/102_64
drwxr-xr-x   74 orand3   dba            4096 Sep 28 06:44 /oracle/ND3/102_64
root@kopsxsap003d:# su - orand3
kopsxsap003d:orand3 1> cd /oracle/ND3/102_64
kopsxsap003d:orand3 2> pwd
/oracle/ND3/102_64
kopsxsap003d:orand3 3> ls | head
MOPatch
OPatch
OPatch_old
admin
assistants
bali
bin
ccr
ccr_stage
cdata
kopsxsap003d:orand3 4> logout
root@kopsxsap003d:#

Now we’ll demonstrate how the permissions of the mount point directory underneath affect users after mounting:

root@kopsxsap003d:# umount /oracle/ND3/102_64
root@kopsxsap003d:# ls -ald /oracle/ND3/102_64
drwxr-xr-x    2 root     system          256 Aug 23 11:16 /oracle/ND3/102_64
root@kopsxsap003d:#

As you can see, the ownership changed from orand3:dba to root:system after unmounting (because after unmounting, /oracle/ND3/102_64 is just a directory on the higher-level filesystem, like /oracle or /oracle/ND3):

root@kopsxsap003d:# mount /oracle/ND3/102_64
root@kopsxsap003d:# ls -ald /oracle/ND3/102_64
drwxr-xr-x   74 orand3   dba            4096 Sep 28 06:44 /oracle/ND3/102_64
root@kopsxsap003d:# umount /oracle/ND3/102_64
root@kopsxsap003d:# ls -ald /oracle/ND3/102_64
drwxr-xr-x    2 root     system          256 Aug 23 11:16 /oracle/ND3/102_64

OK, now let’s show how normal (i.e. non-root) users react to a mount point directory that does not allow them access:

root@kopsxsap003d:# chmod 700 /oracle/ND3/102_64
root@kopsxsap003d:# ls -ald /oracle/ND3/102_64
drwx------    2 root     system          256 Aug 23 11:16 /oracle/ND3/102_64
root@kopsxsap003d:# mount /oracle/ND3/102_64
root@kopsxsap003d:# ls -ald /oracle/ND3/102_64
drwxr-xr-x   74 orand3   dba            4096 Sep 28 06:44 /oracle/ND3/102_64

See? After mounting it shows 755 orand3:dba, but underneath it is still root:system with 700. So what does the AIX kernel do here? Let’s see…

root@kopsxsap003d:# su - orand3
kopsxsap003d:orand3 1> cd /oracle/ND3/102_64
kopsxsap003d:orand3 2> pwd
pwd: The file access permissions do not allow the specified action.
kopsxsap003d:orand3 3> ls | head
MOPatch
OPatch
OPatch_old
admin
assistants
bali
bin
ccr
ccr_stage
cdata
kopsxsap003d:orand3 4> logout
root@kopsxsap003d:# umount /oracle/ND3/102_64
root@kopsxsap003d:# ls -ald /oracle/ND3/102_64
drwx------    2 root     system          256 Aug 23 11:16 /oracle/ND3/102_64
root@kopsxsap003d:#

As demonstrated, the permissions underneath a mount point DO affect non-root users running the pwd command. What’s even more interesting is that the typical libc routines for opening and reading the directory work just fine; it is just the pwd command that fails.

Ok – let’s fix it…

root@kopsxsap003d:# chmod 755 /oracle/ND3/102_64
root@kopsxsap003d:# mount /oracle/ND3/102_64
root@kopsxsap003d:# su - orand3
kopsxsap003d:orand3 1> cd /oracle/ND3/102_64
kopsxsap003d:orand3 2> pwd
/oracle/ND3/102_64
kopsxsap003d:orand3 3>

So what is the pwd command really doing?

kopsxsap003d:orand3 1> type pwd
pwd is a shell builtin.
kopsxsap003d:orand3 2> echo $SHELL
/usr/bin/csh
kopsxsap003d:orand3 3> /usr/bin/pwd
/oracle/ND3
kopsxsap003d:orand3 4>

So you have an internal pwd command (a shell builtin) but also a real command in /usr/bin. Let’s see what’s happening there:

root@kopsxsap003d:# umount /oracle/ND3/102_64
root@kopsxsap003d:# chmod 700 /oracle/ND3/102_64
root@kopsxsap003d:# mount /oracle/ND3/102_64
root@kopsxsap003d:# su - orand3
kopsxsap003d:orand3 1> cd /oracle/ND3/102_64
kopsxsap003d:orand3 2> pwd
pwd: The file access permissions do not allow the specified action.
kopsxsap003d:orand3 3> /usr/bin/pwd
pwd: The file access permissions do not allow the specified action.
kopsxsap003d:orand3 4> truss /usr/bin/pwd
execve("/usr/bin/pwd", 0x2FF22984, 0x20012ED8)   argc: 1
sbrk(0x00000000)                                = 0x2000104C
vmgetinfo(0x2FF21370, 7, 16)                    = 0
sbrk(0x00000000)                                = 0x2000104C
sbrk(0x00000004)                                = 0x2000104C
(..)
statx("/", 0x2FF21178, 176, 020)                = 0
statx("./", 0x2FF21178, 176, 020)               = 0
statx("./../", 0x2FF21010, 128, 010)            Err#13 EACCES
access("/usr/lib/nls/msg/en_US/libc.cat", 0)    = 0
_getpid()                                       = 8847484
kopen("/usr/lib/nls/msg/en_US/libc.cat", O_RDONLY) = 3
(..)
pwdkwrite(2, " p w d", 3)                               = 3
: kwrite(2, " :  ", 2)                          = 2
The file access permissions do not allow the specified action.kwrite(2, " T h e   f i l e   a c c".., 62)       = 62

(..)
_exit(1)
kopsxsap003d:orand3 5>

So the truth is hidden in the EACCES return code from the statx() syscall.

kopsxsap003d:orand3 11> cd /oracle/ND3/102_64
kopsxsap003d:orand3 12> ls | head
MOPatch
OPatch
OPatch_old
admin
assistants
bali
bin
ccr
ccr_stage
cdata
kopsxsap003d:orand3 13> ls -ald /oracle/ND3/102_64
drwxr-xr-x   74 orand3   dba            4096 Sep 28 06:44 /oracle/ND3/102_64
kopsxsap003d:orand3 14> ls -al /oracle/ND3/102_64/|head
ls: 0653-345 /oracle/ND3/102_64/..: Permission denied.
total 280
drwxr-xr-x   74 orand3   dba            4096 Sep 28 06:44 .
drwxr-x---   35 orand3   dba            4096 Aug 23 12:45 .patch_storage
drwxr-xr-x    2 orand3   dba             256 May 10 2010  MOPatch
drwxr-xr-x    7 orand3   dba            4096 Nov 16 2009  OPatch
drwxr-x---    5 orand3   dba             256 Aug 23 12:27 OPatch_old
drwxr-xr-x    3 orand3   dba             256 Aug 23 13:36 admin
drwxr-x---    7 orand3   dba             256 Aug 23 12:18 assistants
drwxr-x---    3 orand3   dba             256 Aug 23 12:18 bali
drwxr-xr-x    2 orand3   dba           16384 Aug 23 12:45 bin
kopsxsap003d:orand3 15> ls -al .
ls: 0653-341 The fil does not exist.
drwxr-xr-x   74 orand3   dba            4096 Sep 28 06:44 .
kopsxsap003d:orand3 16> ls -al .
ls: 0653-345 ./..: Permission denied.
total 280
drwxr-xr-x   74 orand3   dba            4096 Sep 28 06:44 .
drwxr-x---   35 orand3   dba            4096 Aug 23 12:45 .patch_storage
drwxr-xr-x    2 orand3   dba             256 May 10 2010  MOPatch
drwxr-xr-x    7 orand3   dba            4096 Nov 16 2009  OPatch
drwxr-x---    5 orand3   dba             256 Aug 23 12:27 OPatch_old
drwxr-xr-x    3 orand3   dba             256 Aug 23 13:36 admin
drwxr-x---    7 orand3   dba             256 Aug 23 12:18 assistants
drwxr-x---    3 orand3   dba             256 Aug 23 12:18 bali
[..]
kopsxsap003d:orand3 19> ls -ald .
drwxr-xr-x   74 orand3   dba            4096 Sep 28 06:44 .
kopsxsap003d:orand3 20> ls -ald ./..
ls: 0653-345 ./..: Permission denied.
kopsxsap003d:orand3 21> ls -ald ./../..
ls: 0653-345 ./../..: Permission denied.
kopsxsap003d:orand3 22> ls -ald /oracle/ND3/102_64
drwxr-xr-x   74 orand3   dba            4096 Sep 28 06:44 /oracle/ND3/102_64
kopsxsap003d:orand3 23>

As you saw, if you plan for non-root users (e.g. Oracle in this case) to work properly on mounted filesystems under AIX, you need proper permissions on the mount point directory before mounting. There is no escape from this. It all comes down to the definition of the stat() family of system calls:

Description

       The stat subroutine obtains information about the file named by the Path parameter. Read, write, or execute
       permission for the named file is not required, but all directories listed in the path leading to the file must be
       searchable. The file information, which is a subset of the stat structure, is written to the area specified by the
       Buffer parameter.

“Searchable” for a directory in UNIX means “+x”, so the execute bit must be set on the directory for it to be searchable by system calls.
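The effect is easy to reproduce on any UNIX without mounting anything, using a scratch directory (the paths below are made up for the demo; run it as a non-root user, since root bypasses permission checks):

```shell
# Demonstrate "searchable" (+x) vs "readable" (+r) on a directory:
# with mode 644 you can still list the names inside (readdir), but any
# stat()/open() of an entry -- which pwd and `ls -l` rely on -- fails
# with EACCES. (Run as a non-root user: root bypasses permission checks.)
demo=$(mktemp -d)
mkdir "$demo/inner"
touch "$demo/inner/file"

chmod 644 "$demo/inner"          # readable but NOT searchable
ls "$demo/inner" >/dev/null 2>&1 && echo "readdir works"
cat "$demo/inner/file" 2>/dev/null || echo "stat/open fails: not searchable"

chmod 755 "$demo/inner"          # restore +x: searchable again
cat "$demo/inner/file" >/dev/null 2>&1 && echo "searchable again"
rm -rf "$demo"
```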

-J.

PowerHA failure scenario when dealing with SAN-booted LPARs – part I

Sunday, January 9th, 2011

Together with Jedrzej we’ve exposed a rather interesting weakness in the IBM PowerHA 5.5 solution (in the old days it was called HACMP). Normally you would assume that in case of a major cataclysm, such as a *complete* storage disappearance on the active node, PowerHA or AIX has an internal mechanism to prevent downtime by switching the services to the next active node (as defined in the PowerHA policies/configuration). This gets really interesting when we start talking about SAN-booted AIX LPARs. As everybody knows, any form of assumption is bad (this is starting to be my mantra), and as we show here, avoiding this pitfall requires SOME PLANNING AND CONFIGURATION if you don’t want to end up in a long downtime…

The environment consisted of:

  • Two LPARs (jkwha001d = active; jkwha002d = passive) running on separate Power frames.
  • Each LPAR on each frame was protected by a separate pair of VIOS (to provide redundancy for NPIV-enabled FC fabrics and to enable Shared Ethernet Failover).
  • AIX 6.1 TL3
  • Oracle 10gR2
  • PowerHA 5.5SP2
  • Netapp DataOntap 7.3 storage (Netapp cluster, 2 controllers) – yes, AIX LPARs are also SAN BOOT-ed from this storage
  • 1 network-based (IPv4) heartbeat
  • 1 disk-based (on FC LUN) heartbeat
  • Properly configured & running scripts for starting and stopping Oracle
  • No application monitors configured

We’ve performed two experiments:

  1. Simulating total storage loss on jkwha001d (the active one, hosting Oracle) by un-mapping the Virtual Fibre Channel (NPIV) host adapters (vadapters) on both VIOS protecting jkwha001d.
  2. Simulating total storage loss on jkwha002d (the active one, hosting Oracle – after failover) by forcing disconnection of jkwha002d’s Fibre Channel initiators (WWNs) on the Netapp cluster.

Both of which simulated the same thing: a virtual LPAR running without any storage at all. More on the results in the next post…

-Jakub.

I’m IBM Certified Advanced Technical Expert now…

Sunday, January 9th, 2011

This is going to be a rather short blog post. On the 17th of Dec last year (2010) I’ve passed all of the pre-requisites (5 other IBM certificates) for being an “IBM Certified Advanced Technical Expert – Power Systems with AIX” (the highest [master-level] credential for people working with & designing solutions with AIX, VIOS, PowerHA/HACMP, PowerVM, (D)LPARs, Linux on Power, etc.).

ibm_cate

The story is rather boring and there are only two lessons:
1) you won’t pass the required tests/certifications if you don’t know this stuff deeply (each technology, like PowerVM or PowerHA, at the expert level – doh! you want to be an ‘Advanced Technical Expert’!).
2) deep AIX knowledge alone is not enough. You need to get the big picture and be interested in the integration of everything that plays with AIX – applications, storage, high availability, disaster recovery, business requirements, and how everything interacts. You need a lot of experience (years of it).

After my whole journey I can say that AIX and its surrounding ecosystem on Power is the most rock-solid & mature IT solution that I’ve ever worked with. There is not a lot on the market that can be compared to it, and this is the main factor why I’ve decided to take this route. Now I’ll try to blog in the future (if time permits) about several interesting cases concerning Power Systems and AIX…

-Jakub.

Long remote Oracle/SQLNet connection times due to IPv6 being enabled on RHEL 5.x

Sunday, January 9th, 2011

On one of my RAC clusters in the lab I’ve noticed that it sometimes took up to 5-15 seconds to connect to the Oracle database instead of at most 1s. It was happening on some old VMs back from 2008. The root-cause analysis showed that each RAC node was doing a DNS search for its own name, which is something I wouldn’t normally expect. Paying attention to the details showed that Oracle RDBMS was performing a DNS lookup of its own server name, but not for the A DNS entry – for AAAA (IPv6). Yes, libc (the DNS resolver) was asking REMOTE DNS SERVERS for its own server name, but in IPv6 format (AAAA), because it couldn’t find the required info via /etc/hosts. This is going to happen even with NETWORKING_IPV6=OFF in /etc/sysconfig/network.

The solution was pretty easy after establishing the root cause. Just ensure that:

  • all RAC nodes are in /etc/hosts
  • that resolv.conf provides at least 2-3 DNS servers that are reachable within max 50ms (check each one with dig(1)); resolv.conf can have options to perform round-robin across those [my environment was affected by this]
  • you have disabled IPv6 via NETWORKING_IPV6 in /etc/sysconfig/network
  • you have aliased IPv6 to “off” in the kernel modules config to completely disable IPv6 kernel functionality, which in turn also disables any libc IPv6 action [my environment was affected by this; just add "alias net-pf-10 off" to modprobe.conf]
  • lsmod does NOT display ipv6 (the kernel module is not loaded)
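The last two checks can be scripted. A minimal sketch that verifies the modprobe alias in a given file; it is demonstrated against a temporary file here, and on a real RHEL 5.x box you would point it at /etc/modprobe.conf and pair it with `lsmod | grep ipv6`:

```shell
# Check whether IPv6 is aliased off in a modprobe-style config file.
# Demonstrated against a temporary file; on RHEL 5.x point it at
# /etc/modprobe.conf and also verify the module:  lsmod | grep -w ipv6
check_ipv6_aliased_off() {
    grep -q '^alias[[:space:]][[:space:]]*net-pf-10[[:space:]][[:space:]]*off' "$1"
}

conf=$(mktemp)
echo 'alias net-pf-10 off' > "$conf"
if check_ipv6_aliased_off "$conf"; then
    echo "ipv6 aliased off"
fi
rm -f "$conf"
```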

Of course, as you can see, I wouldn’t have been affected by this one if the DNS name servers hadn’t been misconfigured in the first place (my lab in 2008 looked different than it does now).

Index partition corruption detected via RMAN validate (11.1) and resolution

Sunday, January 9th, 2011

SELF-MEMO: how to fix corruption in index partitions (I was bringing my Extended RAC 11.1.0.6 configuration back to life):

RMAN> validate database check logical;

Starting validate at 07-JAN-11
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=120 instance=erac2 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00001 name=+DATA1/erac/datafile/system.259.654366597
input datafile file number=00004 name=+DATA1/erac/datafile/soeindex.dbf
input datafile file number=00003 name=+DATA1/erac/datafile/soedata.dbf
input datafile file number=00008 name=+DATA1/erac/datafile/undotbs5.270.655999987
input datafile file number=00002 name=+DATA1/erac/datafile/sysaux.260.654366615
input datafile file number=00007 name=+DATA1/erac/datafile/undotbs4.269.655999975
input datafile file number=00005 name=+DATA1/erac/datafile/users.264.654366663
channel ORA_DISK_1: validation complete, elapsed time: 00:05:46
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              20084        459520          13210477
  File Name: +DATA1/erac/datafile/system.259.654366597
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              431075
  Index      0              6321
  Other      0              2040

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    FAILED 0              16058        45584           13210492
  File Name: +DATA1/erac/datafile/sysaux.260.654366615
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              10669
  Index      2              7436
  Other      0              11421

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              39255        126976          12914293
  File Name: +DATA1/erac/datafile/soedata.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              85332
  Index      0              12
  Other      0              2377

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    OK     0              13378        276649          12928336
  File Name: +DATA1/erac/datafile/soeindex.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              258967
  Other      0              4304

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5    OK     0              44           640             13209647
  File Name: +DATA1/erac/datafile/users.264.654366663
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              532
  Index      0              1
  Other      0              63

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7    OK     0              10238        25600           13210492
  File Name: +DATA1/erac/datafile/undotbs4.269.655999975
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              15362

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
8    OK     0              120          89600           13210482
  File Name: +DATA1/erac/datafile/undotbs5.270.655999987
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              89480

validate found one or more corrupt blocks
See trace file /u01/app/oracle/diag/rdbms/erac/erac2/trace/erac2_ora_12052.trc for details
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
including current control file for validation
including current SPFILE in backup set
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Control File OK     0              1146
Finished validate at 07-JAN-11

RMAN>

Querying for exact information about the affected segments:

SQL> SELECT * FROM v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         2       7260          1           13210364 CORRUPT
         2       7308          1           13210260 CORRUPT

SQL>
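To translate those FILE#/BLOCK# pairs into actual segment names, the corruption view can be joined against dba_extents. A sketch (run as SYSDBA; this is the standard mapping query, not taken from my session above):

```sql
-- Map each corrupt block reported by RMAN to the segment that owns it
SELECT e.owner, e.segment_type, e.segment_name, e.partition_name,
       c.file#, c.block#, c.blocks
FROM   v$database_block_corruption c,
       dba_extents e
WHERE  e.file_id = c.file#
AND    c.block# BETWEEN e.block_id AND e.block_id + e.blocks - 1;
```

In this case the query pointed at index partitions of the AWR repository (WRH$_*) in SYSAUX, which is why a simple rebuild is enough.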


Fixing this case is simple: since only index partitions are affected, I just executed the SQL DDL statements below to rebuild them:

alter index WRH$_SERVICE_STAT_PK rebuild partition WRH$_SERVIC_427808814_0;
alter index WRH$_SERVICE_STAT_PK rebuild partition WRH$_SERVICE_STAT_MXDB_MXSN;
alter index WRH$_PARAMETER_PK rebuild partition WRH$_PARAME_427808814_0;
alter index WRH$_PARAMETER_PK rebuild partition WRH$_PARAMETER_MXDB_MXSN;

Then I executed the RMAN validation again to confirm that everything was clean:
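The exact invocation was elided from my notes, but the output below is what an 11g logical validation of the whole database produces, i.e. something along these lines:

```
RMAN> validate check logical database;
```

(CHECK LOGICAL makes RMAN test block contents, not just checksums, and repopulates v$database_block_corruption with the results.)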

(..)
channel ORA_DISK_1: validation complete, elapsed time: 00:05:36
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              20084        459520          13258305
  File Name: +DATA1/erac/datafile/system.259.654366597
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              431075
  Index      0              6321
  Other      0              2040

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              15982        45584           13258305
  File Name: +DATA1/erac/datafile/sysaux.260.654366615
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              10669
  Index      0              7505
  Other      0              11428

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              39255        126976          12914293
  File Name: +DATA1/erac/datafile/soedata.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              85332
  Index      0              12
  Other      0              2377

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    OK     0              13378        276649          12928336
  File Name: +DATA1/erac/datafile/soeindex.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              258967
  Other      0              4304

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5    OK     0              44           640             13209647
  File Name: +DATA1/erac/datafile/users.264.654366663
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              532
  Index      0              1
  Other      0              63

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7    OK     0              10238        25600           13258305
  File Name: +DATA1/erac/datafile/undotbs4.269.655999975
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              15362

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
8    OK     0              120          89600           13258101
  File Name: +DATA1/erac/datafile/undotbs5.270.655999987
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              89480

channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
including current control file for validation
including current SPFILE in backup set
channel ORA_DISK_1: validation complete, elapsed time: 00:00:00
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Control File OK     0              1146
Finished validate at 07-JAN-11

RMAN>

Oracle 11g ADRCI memo for autopurging

Sunday, January 9th, 2011

SELF-MEMO: how to configure ADRCI in 11.1/11.2 to purge logs and change the retention policy for removing unneeded log files. Two things worth remembering: the SHORTP_POLICY and LONGP_POLICY values are expressed in hours (360 hours = 15 days), and SET CONTROL works only with a single ADR home selected — running it with multiple homes active raises DIA-48448, hence the per-home loop below.

[oracle@rac1 ~]$ du -sm /u01/app/oracle/diag/
766     /u01/app/oracle/diag/
[oracle@rac1 ~]$ adrci

ADRCI: Release 11.1.0.6.0 - Beta on Fri Jan 7 12:28:45 2011

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

ADR base = "/u01/app/oracle"
adrci> set control (SHORTP_POLICY = 360)
DIA-48448: This command does not support multiple ADR homes

adrci> show homes
ADR Homes:
diag/tnslsnr/rac1/listener_rac1
diag/rdbms/erac/erac1
diag/asm/+asm/+ASM1
adrci> set home diag/tnslsnr/rac1/listener_rac1
adrci> set control (SHORTP_POLICY = 360)
adrci> set control (LONGP_POLICY = 360)
adrci> purge
adrci>
adrci> set home diag/rdbms/erac/erac1
adrci> set control (SHORTP_POLICY = 360)
adrci> set control (LONGP_POLICY = 360)
adrci> purge
adrci>
adrci> set home diag/asm/+asm/+ASM1
adrci> set control (SHORTP_POLICY = 360)
adrci> set control (LONGP_POLICY = 360)
adrci> purge
adrci>
adrci> CTRL+D
[oracle@rac1 ~]$
[oracle@rac1 ~]$
[oracle@rac1 ~]$ du -sm /u01/app/oracle/diag/
185     /u01/app/oracle/diag/
[oracle@rac1 ~]$
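The same per-home loop can be scripted non-interactively with adrci's exec parameter, which is handy for cron. A sketch, reusing the home paths from the session above:

```shell
#!/bin/sh
# Set a 360-hour (15-day) short and long retention policy, then purge,
# for each ADR home in turn (SET CONTROL needs exactly one home selected).
for home in diag/tnslsnr/rac1/listener_rac1 \
            diag/rdbms/erac/erac1 \
            diag/asm/+asm/+ASM1
do
  adrci exec="set home ${home}; set control (SHORTP_POLICY = 360); set control (LONGP_POLICY = 360); purge"
done
```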