Archive for the ‘Oracle ASM’ Category

ASM diskgroup with negative disk sizes

Tuesday, January 7th, 2014

Sometimes as a DBA you will see strange signs along the way, like this ASM +DATA diskgroup with no data and no files, yet something is taking up space (notice that Total_MB != Free_MB):

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   4095940  3271458                0         3271458              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576    102398   102304                0          102304              0             N  REDO/
MOUNTED  EXTERN  N         512   4096  1048576      1019      623                0             623              0             Y  SYSTEMDG/
ASMCMD> find +DATA *
ASMCMD> ls +DATA
ASMCMD> lsdsk
Path
/dev/oracleasm/disks/DATA_05
/dev/oracleasm/disks/DATA_06
/dev/oracleasm/disks/DATA_07
/dev/oracleasm/disks/DATA_08
/dev/oracleasm/disks/DATA_09
/dev/oracleasm/disks/DATA_10
/dev/oracleasm/disks/DATA_11
/dev/oracleasm/disks/DATA_12
/dev/oracleasm/disks/DATA_13
/dev/oracleasm/disks/DATA_14
/dev/oracleasm/disks/REDO_02
/dev/oracleasm/disks/SYSTEMDG_DISK01
ASMCMD>

Rebalance won’t help, the ASM logs are clean, and so you grow more and more confused. The situation is a little awkward, but one can find a negative (!) free capacity (close to -2^32) on some of the recently added ASM disks:

ASMCMD> lsdsk -k -G DATA
Total_MB      Free_MB   OS_MB  Name       Failgroup  Failgroup_Type  Library  Label  UDID  Product  Redund   Path
  409594       409037  409594  DATA_0000  DATA_0000  REGULAR         System                         UNKNOWN  /dev/oracleasm/disks/DATA_05
  409594       408885  409594  DATA_0001  DATA_0001  REGULAR         System                         UNKNOWN  /dev/oracleasm/disks/DATA_06
  409594       408895  409594  DATA_0002  DATA_0002  REGULAR         System                         UNKNOWN  /dev/oracleasm/disks/DATA_07
  409594       408680  409594  DATA_0003  DATA_0003  REGULAR         System                         UNKNOWN  /dev/oracleasm/disks/DATA_08
  409594       408987  409594  DATA_0004  DATA_0004  REGULAR         System                         UNKNOWN  /dev/oracleasm/disks/DATA_09
  409594       408931  409594  DATA_0005  DATA_0005  REGULAR         System                         UNKNOWN  /dev/oracleasm/disks/DATA_10
  409594       409079  409594  DATA_0006  DATA_0006  REGULAR         System                         UNKNOWN  /dev/oracleasm/disks/DATA_11
  409594       408964  409594  DATA_0007  DATA_0007  REGULAR         System                         UNKNOWN  /dev/oracleasm/disks/DATA_12
  409594  -4294555512  409594  DATA_0008  DATA_0008  REGULAR         System                         UNKNOWN  /dev/oracleasm/disks/DATA_13
  409594  -4294554748  409594  DATA_0009  DATA_0009  REGULAR         System                         UNKNOWN  /dev/oracleasm/disks/DATA_14
ASMCMD>
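A quick sanity check on those negative numbers: adding 2^32 to them brings them back to a plausible megabyte figure just above the 409594 MB disk size, which suggests 32-bit wraparound in the metadata. This is only a hypothesis about the corruption, not something confirmed by the ASM logs:

```shell
# Adding 2^32 (4294967296) to the corrupted Free_MB values yields
# figures slightly above the 409594 MB disk size, consistent with a
# 32-bit wraparound (a hypothesis only, not confirmed anywhere).
echo $(( -4294555512 + 4294967296 ))   # 411784
echo $(( -4294554748 + 4294967296 ))   # 412548
```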

To fix the problem, drop the affected ASM disks and remove their ASMLib labels. The critical step is to reinitialize the header on each individual LUN by overwriting at least the first 10MB with zeros (using dd from /dev/zero), and only then re-add the disks to the diskgroup (via ASMLib) with the normal ALTER DISKGROUP +DG ADD DISK syntax. Afterwards you will find normal utilization in the diskgroup:
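The zeroing step above can be sketched as a small helper. The disk names and device paths below are hypothetical, and the commands must only be run against disks already dropped from the diskgroup:

```shell
# Minimal sketch, assuming ASMLib and hypothetical device paths.
# Zero the first N MB of a LUN before re-adding it to ASM
# (zeroing only 1MB was not enough here; use at least 10MB).
zero_asm_header() {
  # $1 = block device (or file), $2 = megabytes to zero (default 16)
  dd if=/dev/zero of="$1" bs=1M count="${2:-16}" conv=notrunc 2>/dev/null
}

# Example sequence (hypothetical names, run as root):
#   oracleasm deletedisk DATA_13
#   zero_asm_header /dev/sdx
#   oracleasm createdisk DATA_13 /dev/sdx
#   ... then in ASM: ALTER DISKGROUP DATA ADD DISK ...;
```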

SQL> select name, state, type, total_mb, free_mb from v$asm_diskgroup;

NAME                           STATE       TYPE     TOTAL_MB    FREE_MB
------------------------------ ----------- ------ ---------- ----------
DATA                           MOUNTED     EXTERN    4095940    4090620
REDO                           MOUNTED     EXTERN     102398     102304
SYSTEMDG                       MOUNTED     EXTERN       1019        623

SQL>

Reason: the LUNs were probably not properly initialized (only the first 1MB had been overwritten), although no errors were reported during the first initialization. Another strange thing on a, fortunately, non-critical development system :)

Storage migration for ASM database deployed on Oracle Enterprise Linux in Oracle VM without downtime.

Thursday, May 21st, 2009

Suppose you wanted to migrate your database from storage array SAN1 to SAN2 without downtime. With Oracle databases using ASM this is possible. It was performed on the configuration described in more detail here. One note: a LUN can be visible through dom0 or directly to domU (by passing PCI hardware handling into the domU VM); this post explains only the first scenario, as it is the more common one. Brief steps include:

  1. Prepare new LUNs on storage array (not described here)
  2. Attach new LUNs to the Oracle VM (not described here; simulated by a simple zero-padded file created with the dd utility. I assume this was performed earlier [SCSI bus rescanning and so on] or that the file was created in /OVS).
  3. Modifying the VM’s Xen config file.
  4. Online attaching block device to VM.
  5. Preparing the new storage device from inside the target VM.
  6. Discovering new LUN in ASM.
  7. Actual rebalance process…
  8. Verification.

Step 3: Modify vm.cfg file for additional storage.

This is straightforward: just add one line. Do NOT restart the VM; there is no need.

[root@quadovm 01_prac1]# cat vm.cfg
bootloader = '/usr/bin/pygrub'
disk = ['file:/OVS/running_pool/01_prac1/system.img,hda,w',
'file:/OVS/running_pool/01_prac1/oracle_software.img,hdd,w',
'file:/OVS/running_pool/prac.storage.raw,hde,w!',
'file:/OVS/running_pool/prac.ocr,hdf,w!',
'file:/OVS/running_pool/prac.voting,hdg,w!',
'file:/OVS/running_pool/prac.storage2.raw,hdi,w!',   # <-- newly added line
]
memory = 1638
name = '01_prac1'
[..]
[root@quadovm 01_prac1]#

Step 4: Attach block device to the running VM.

[root@quadovm 01_prac1]# xm block-attach 01_prac1 file:///OVS/running_pool/prac.storage2.raw /dev/hdi 'w!'
[root@quadovm 01_prac1]#

Step 5: Prepare prac1 VM for new device.

The newly added storage should be detected automatically; this can be verified by checking the dmesg output:

[root@prac1 ~]# dmesg|grep hdi
hdi: unknown partition table
[root@prac1 ~]# ls -al /dev/hde /dev/hdi
brw-rw---- 1 root dba  33, 0 Jan 24 13:00 /dev/hde
brw-r----- 1 root disk 56, 0 Jan 24 12:59 /dev/hdi
[root@prac1 ~]#
[root@prac1 ~]# fdisk -l /dev/hd[ei] 2> /dev/null | grep GB
Disk /dev/hde: 15.7 GB, 15728640000 bytes
Disk /dev/hdi: 16.1 GB, 16106127360 bytes
[root@prac1 ~]#

As we can see, the new LUN is bigger (it should be bigger or equal in size; I haven’t checked what happens if you add a smaller one). Now we have to assign the correct permissions so that ASM and the database can use the new /dev/hdi device without problems. (This doesn’t cover modifying the udev rules in /etc/udev/, which is required for the devices to come up with the right permissions after a reboot; do your own homework :) )

[root@prac1 ~]# chgrp dba /dev/hdi
[root@prac1 ~]# chmod g+w /dev/hdi
[root@prac1 ~]#
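For the homework part, a udev rule along these lines would make the permissions survive a reboot. This is an illustrative sketch only: the filename and match keys are assumptions, and the exact syntax depends on your udev version and distribution:

```
# /etc/udev/rules.d/99-oracle-disks.rules -- illustrative sketch only;
# filename and match keys are assumptions, adjust for your distribution.
KERNEL=="hde", GROUP="dba", MODE="0660"
KERNEL=="hdi", GROUP="dba", MODE="0660"
```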

Step 6: Preparing ASM for new disk.

Verify the current diskgroups and set diskgroup DATA1’s rebalance power to zero.

SQL> col name format a20
SQL> SELECT name, type, total_mb, free_mb, required_mirror_free_mb req_mirr_free, usable_file_mb FROM V$ASM_DISKGROUP;

NAME                 TYPE     TOTAL_MB    FREE_MB REQ_MIRR_FREE USABLE_FILE_MB
-------------------- ------ ---------- ---------- ------------- --------------
DATA1                EXTERN      15000      14143             0          14143

SQL> ALTER DISKGROUP DATA1 REBALANCE POWER 0 WAIT;

Diskgroup altered.

SQL>

Next we have to force ASM to discover the new devices by modifying the asm_diskstring parameter. (I’m using an IFILE for ASM, so I have to edit the ASM pfile manually; if I don’t, ASM won’t remember the new setting after a restart.)

SQL> show parameter string

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/hde

SQL>
SQL> alter system set asm_diskstring='/dev/hde', '/dev/hdi' scope=memory;

System altered.

SQL> show parameter string

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/hde, /dev/hdi

SQL>
[oracle@prac1 11.1.0]$ vi /u01/app/oracle/product/11.1.0/db_1/dbs/init+ASM1.ora
#Modify asm_diskstring here too
asm_diskstring='/dev/hde','/dev/hdi'

Step 7: The main part: ASM rebalance

SQL> ALTER DISKGROUP DATA1 ADD DISK '/dev/hdi';

Diskgroup altered.

SQL> SELECT GROUP_NUMBER, OPERATION, STATE FROM V$ASM_OPERATION;

GROUP_NUMBER OPERA STATE
------------ ----- --------------------
           1 REBAL RUN

SQL> select name,path,state,failgroup from v$asm_disk;

NAME                 PATH            STATE                FAILGROUP
-------------------- --------------- -------------------- ----------
DATA1_0000           /dev/hde        NORMAL               DATA1_0000
DATA1_0001           /dev/hdi        NORMAL               DATA1_0001

SQL> ALTER DISKGROUP DATA1 DROP DISK DATA1_0000;

Diskgroup altered.

SQL> SELECT GROUP_NUMBER, OPERATION, STATE, EST_MINUTES FROM V$ASM_OPERATION;

GROUP_NUMBER OPERA STATE                EST_MINUTES
------------ ----- -------------------- -----------
           1 REBAL RUN                           32

SQL>

Typical snapshot of iostat right now (10 sec averages):

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
hde               0.00     0.00 340.80  1.10    14.20     0.00    85.08     1.43    4.18   0.35  12.04
hdi               0.00     0.00  0.40 357.40     0.01    14.28    81.77    11.52   32.22   2.40  86.04

From other normal SQL session:

SQL> insert into t2(id) values ('2');

1 row created.

SQL> commit;

Commit complete.

SQL>

Back to the ASM instance:

SQL> ALTER DISKGROUP DATA1 REBALANCE POWER 11;

Step 8: Verification.

We’ll just execute a big IO-intensive SQL statement to generate some load (thanks to Tanel for blogging this query):

SQL> create table t4 as select rownum r from
(select rownum r from dual connect by rownum <= 1000) a,
(select rownum r from dual connect by rownum <= 1000) b,
(select rownum r from dual connect by rownum <= 1000) c
where rownum <= 100000000;

From iostat we can see that only hdi is in use, which assures us that the database is really using the new device.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
33.33    0.00   26.47    5.88    0.00   34.31

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
hde               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hdi               0.00     0.00 107.84 351.96     1.60    12.35    62.14     4.96   10.30   0.91  41.96