Archive for the ‘Oracle’ Category

Index partition corruption detected via RMAN validate (11.1) and resolution

Sunday, January 9th, 2011

SELF-MEMO: How to fix corruption on indexed partitions (i was bringing my Extended RAC 11.1.0.6 configuration back to life):

RMAN> validate database check logical;

Starting validate at 07-JAN-11
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=120 instance=erac2 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00001 name=+DATA1/erac/datafile/system.259.654366597
input datafile file number=00004 name=+DATA1/erac/datafile/soeindex.dbf
input datafile file number=00003 name=+DATA1/erac/datafile/soedata.dbf
input datafile file number=00008 name=+DATA1/erac/datafile/undotbs5.270.655999987
input datafile file number=00002 name=+DATA1/erac/datafile/sysaux.260.654366615
input datafile file number=00007 name=+DATA1/erac/datafile/undotbs4.269.655999975
input datafile file number=00005 name=+DATA1/erac/datafile/users.264.654366663
channel ORA_DISK_1: validation complete, elapsed time: 00:05:46
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              20084        459520          13210477
  File Name: +DATA1/erac/datafile/system.259.654366597
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              431075
  Index      0              6321
  Other      0              2040

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    FAILED 0              16058        45584           13210492
  File Name: +DATA1/erac/datafile/sysaux.260.654366615
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              10669
  Index      2              7436
  Other      0              11421

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              39255        126976          12914293
  File Name: +DATA1/erac/datafile/soedata.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              85332
  Index      0              12
  Other      0              2377

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    OK     0              13378        276649          12928336
  File Name: +DATA1/erac/datafile/soeindex.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              258967
  Other      0              4304

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5    OK     0              44           640             13209647
  File Name: +DATA1/erac/datafile/users.264.654366663
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              532
  Index      0              1
  Other      0              63

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7    OK     0              10238        25600           13210492
  File Name: +DATA1/erac/datafile/undotbs4.269.655999975
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              15362

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
8    OK     0              120          89600           13210482
  File Name: +DATA1/erac/datafile/undotbs5.270.655999987
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              89480

validate found one or more corrupt blocks
See trace file /u01/app/oracle/diag/rdbms/erac/erac2/trace/erac2_ora_12052.trc for details
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
including current control file for validation
including current SPFILE in backup set
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Control File OK     0              1146
Finished validate at 07-JAN-11

RMAN>

Querying for exact info about affected segments:

SQL> SELECT * FROM v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         2       7260          1           13210364 CORRUPT
         2       7308          1           13210260 CORRUPT

SQL>

worth_blogging_corruption_rman_erac2

Fixing in this case is simple, just executed below SQL DML statements to rebuild those partitions:

alter index WRH$_SERVICE_STAT_PK rebuild partition WRH$_SERVIC_427808814_0;
alter index WRH$_SERVICE_STAT_PK rebuild partition WRH$_SERVICE_STAT_MXDB_MXSN;
alter index WRH$_PARAMETER_PK rebuild partition WRH$_PARAME_427808814_0;
alter index WRH$_PARAMETER_PK rebuild partition WRH$_PARAMETER_MXDB_MXSN;

Again I’ve executed RMAN validation to ensure everything is ok:

(..)
channel ORA_DISK_1: validation complete, elapsed time: 00:05:36
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              20084        459520          13258305
  File Name: +DATA1/erac/datafile/system.259.654366597
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              431075
  Index      0              6321
  Other      0              2040

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              15982        45584           13258305
  File Name: +DATA1/erac/datafile/sysaux.260.654366615
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              10669
  Index      0              7505
  Other      0              11428

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              39255        126976          12914293
  File Name: +DATA1/erac/datafile/soedata.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              85332
  Index      0              12
  Other      0              2377

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    OK     0              13378        276649          12928336
  File Name: +DATA1/erac/datafile/soeindex.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              258967
  Other      0              4304

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5    OK     0              44           640             13209647
  File Name: +DATA1/erac/datafile/users.264.654366663
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              532
  Index      0              1
  Other      0              63

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7    OK     0              10238        25600           13258305
  File Name: +DATA1/erac/datafile/undotbs4.269.655999975
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              15362

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
8    OK     0              120          89600           13258101
  File Name: +DATA1/erac/datafile/undotbs5.270.655999987
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              89480

channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
including current control file for validation
including current SPFILE in backup set
channel ORA_DISK_1: validation complete, elapsed time: 00:00:00
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Control File OK     0              1146
Finished validate at 07-JAN-11

RMAN>

Oracle 11g ADRCI memo for autopurging

Sunday, January 9th, 2011

SELF-MEMO: How to configure ADRCI in 11.1 or 11.2 to purge logs and change policy for removing unneeded log files.

[oracle@rac1 ~]$ du -sm /u01/app/oracle/diag/
766     /u01/app/oracle/diag/
[oracle@rac1 ~]$ adrci

ADRCI: Release 11.1.0.6.0 - Beta on Fri Jan 7 12:28:45 2011

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

ADR base = "/u01/app/oracle"
adrci> set control (SHORTP_POLICY = 360)
DIA-48448: This command does not support multiple ADR homes

adrci> show homes
ADR Homes:
diag/tnslsnr/rac1/listener_rac1
diag/rdbms/erac/erac1
diag/asm/+asm/+ASM1
adrci> set home diag/tnslsnr/rac1/listener_rac1
adrci> set control (SHORTP_POLICY = 360)
adrci> set control (LONGP_POLICY = 360)
adrci> purge
adrci>
adrci> set home diag/rdbms/erac/erac1
adrci> set control (SHORTP_POLICY = 360)
adrci> set control (LONGP_POLICY = 360)
adrci> purge
adrci>
adrci> set home diag/asm/+asm/+ASM1
adrci> set control (SHORTP_POLICY = 360)
adrci> set control (LONGP_POLICY = 360)
adrci> purge
adrci>
adrci> CNTRL+D
[oracle@rac1 ~]$
[oracle@rac1 ~]$
[oracle@rac1 ~]$ du -sm /u01/app/oracle/diag/
185     /u01/app/oracle/diag/
[oracle@rac1 ~]$

MAA challenge lab, part #4

Friday, April 30th, 2010

No posts on my blog for long time, need to change that. So i was trying to get MAA (Maximum Availability Architecture by Oracle) lab again in shape for writing Master of Science thesis…

Somewhere near Janurary/Februrary this year:

  • Primary VM RAC nodes prac1, prac2, prac3 are working again, but database db3 is not (unable to archive to db3dg on srac1,srac2). Main root cause was that experiments in April of 2009 with log_archive_min_succeed_dest=2 setting caused losing sync betweeen primary and standby
  • Standby VM RAC nodes srac1, srac2 VMs are up again, but without db3dg database and ASM instances

So some actions were performed (Feburary/March):

  • ASM +DATA1 recreated on both new srac1 & srac2 VMs (on dedicated VG/LV: vg01/db3dg on physical synapse server, on dedicated 15k RPM SCSI drive[4th in system] named sdb [sda is RAID5 for OS/XEN/VMs]).
  • Fixed some XEN configuration files for VMs with “w!” (disabled shared locks mode) & losetup bug in /etc/xen/scripts/block.
  • Standby rebuilded (from RMAN duplicate backup + FAL gap fetching), so finally i’ve working MAA again :)

Problematic thing is that after failover/switchover due to differences in primary (64-bit) and standby (32-bit) i have to invalidate & recompile all PL/SQL packages (very time consuming on old hardware! and seems that Broker is unable to handle that case):

So last week:

  • prac7, prac8 32-bit VMs were launched.
  • ‘ve switched to using FreeNX server from typical X11 SSH forwarding (much more interactive on slow OpenVPN!).
  • Also i’ve upgraded Grid Control (OMS) to 10.2.0.5 (next step will be to 11g), OMS database repository to 11.1.0.6 (fro 10.1.0.4).
  • Next i deployed 32-bit clusterware (11.1) on those notes, played a little bit with OCR corruptions [metalink note ID 399482.1] after hitting mysterious listener outages (OCR corruptions were not the case for it, it was permission issue on single directory – doh!)
  • Created clustered ASM, and created 32-bit RAC database named “db4.lab1″.
  • I’ve also deployed OEM/GC agents on prac7, prac8 (10.2.0.5 versions) directly from GC (awesome for mass deployments!).
  • Yesterday I’ve exported using expdp 1 schema from db3.lab1 database (64-bit RAC) and imported it into “db4.lab1″…

Planned soon:

  • Upgrading GC to 11g and upgrading agents to 11g too.
  • Building MAA for “db4.lab1″, plan is to create DataGuard for it on srac1, srac2 VMs (32-biit too, they already host DataGuard for “db3.lab1″). But this one is going to use DataGuard Broker to get FailStart Failovers working (2 node primary RAC with 2 node standby RAC)
  • Extending primary RAC to prac9 (to be created), so to have 3 node primary RAC for “db4.lab1″ protected by DataGuard broker with 2 node standby RAC
  • Fun with Snapshot standby and Real Application Testing
  • Detailed measurements of failover times with SYNC DataGuard and Broker – orajdbcstat (my little utility for testing JDBC/FCF failover times, perhaps i’ll uplift it for UCP from 11.2g)

Storage migration for ASM database deployed on Oracle Enterprise Linux in Oracle VM without downtime.

Thursday, May 21st, 2009

Suppose you wanted to migrate your database from storage array SAN1 to SAN2 without downtime. With Oracle databases using ASM this is possible. It was performed on configuration better described here. One note: the LUN can be visible through dom0 or directly by domU (by passing PCI hardware handling into our domU VM) — this posts explains only first scenario, as this is more common scenario  Brief steps include:

  1. Prepare new LUNs on storage array (not described here)
  2. Attach new LUNs to the Oracle VM (not described here, simulated here by using simple zero-padded file created using dd utility; I expect this was performed earlier[scsi bus rescanning and so on] or file created in /OVS).
  3. Modifing VM’s XEN config file.
  4. Online attaching block device to VM.
  5. Preparing new storage device from inside of the target VM.
  6. Discovering new LUN in ASM.
  7. Actual rebalance process…
  8. Verification.

Step 3: Modify vm.cfg file for additional storage.

This is straighforward, just add one line. Do NOT restart the VM. There is no need.

[root@quadovm 01_prac1]# cat vm.cfg
bootloader = '/usr/bin/pygrub'
disk = ['file:/OVS/running_pool/01_prac1/system.img,hda,w',
'file:/OVS/running_pool/01_prac1/oracle_software.img,hdd,w',
'file:/OVS/running_pool/prac.storage.raw,hde,w!',
'file:/OVS/running_pool/prac.ocr,hdf,w!',
'file:/OVS/running_pool/prac.voting,hdg,w!'
<strong>'file:/OVS/running_pool/prac.storage2.raw,hdi,w!',</strong>
]
memory = 1638
name = '01_prac1'
[..]
[root@quadovm 01_prac1]#

Step 4: Attach block device to the running VM.

[root@quadovm 01_prac1]# xm block-attach 01_prac1 file:///OVS/running_pool/prac.storage2.raw /dev/hdi 'w!'
[root@quadovm 01_prac1]#

Step 5: Prepare prac1 VM for new device.

New added storage should be automatically detected, this can be verified by checking dmesg output:

[root@prac1 ~]# dmesg|grep hdi
hdi: unknown partition table
[root@prac1 ~]# ls -al /dev/hde /dev/hdi
brw-rw---- 1 root dba  33, 0 Jan 24 13:00 /dev/hde
brw-r----- 1 root disk 56, 0 Jan 24 12:59 /dev/hdi
[root@prac1 ~]#
[root@prac1 ~]# fdisk -l /dev/hd[ei] 2> /dev/null | grep GB
Disk /dev/hde: 15.7 GB, 15728640000 bytes
Disk /dev/hdi: 16.1 GB, 16106127360 bytes
[root@prac1 ~]#

As we can see new LUN is bigger (it should be bigger or equal, but I haven’t checked what happens if you add a smaller one). Now we have to assign correct permissions so that ASM/Database can use new /dev/hdi device without problems (this doesn’t include modifing udev rules in /etc/udev/, and it is required to make new devices come with right permissions after reboot — do your own home work:) ) :

[root@prac1 ~]# chgrp dba /dev/hdi
[root@prac1 ~]# chmod g+w /dev/hdi
[root@prac1 ~]#

Step 6: Preparing ASM for new disk.

Verification of current diskgroups and changing diskgroup DATA1 ASM_POWER_BALANCE to zero.

SQL> col name format a20
SQL> SELECT name, type, total_mb, free_mb, required_mirror_free_mb req_mirr_free, usable_file_mb FROM V$ASM_DISKGROUP;

NAME                 TYPE     TOTAL_MB    FREE_MB REQ_MIRR_FREE USABLE_FILE_MB
-------------------- ------ ---------- ---------- ------------- --------------
DATA1                EXTERN      15000      14143             0          14143

SQL> ALTER DISKGROUP DATA1 REBALANCE POWER 0 WAIT;

Diskgroup altered.

SQL>

Next we have to force ASM to discover new devices by modifing asm_diskstring parameter (I’m using IFILE for ASM, so I’ve to manually edit pfile for ASM. If I don’t, ASM won’t remember new settings after restarting).

SQL> show parameter string

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/hde

SQL>
SQL> alter system set asm_diskstring='/dev/hde', '/dev/hdi' scope=memory;

System altered.

SQL> show parameter string

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/hde, /dev/hdi

SQL>
[oracle@prac1 11.1.0]$ vi /u01/app/oracle/product/11.1.0/db_1/dbs/init+ASM1.ora
#Modify asm_diskstring here too
asm_diskstring='/dev/hde','/dev/hdi'

Step 7: The main part: ASM rebalance

SQL> ALTER DISKGROUP DATA1 ADD DISK '/dev/hdi';

Diskgroup altered.

SQL> SELECT GROUP_NUMBER, OPERATION, STATE FROM V$ASM_OPERATION;

GROUP_NUMBER OPERA STATE
------------ ----- --------------------
1 REBAL RUN

SQL> select name,path,state,failgroup from v$asm_disk;

NAME                 PATH            STATE                FAILGROUP
-------------------- --------------- -------------------- ----------
DATA1_0000     /dev/hde       NORMAL               DATA1_0000
DATA1_0001     /dev/hdi        NORMAL               DATA1_0001

SQL> ALTER DISKGROUP DATA1 DROP DISK DATA1_0000;

Diskgroup altered.

SQL> SELECT GROUP_NUMBER, OPERATION, STATE, EST_MINUTES FROM V$ASM_OPERATION;

GROUP_NUMBER  OPERA   STATE    EST_MINUTES
---------------------  ---------  ----------- -----------
1             REBAL    RUN        32 

SQL>

Typical snapshot of iostat right now (10 sec averages):

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
hde               0.00     0.00 340.80  1.10    14.20     0.00    85.08     1.43    4.18   0.35  12.04
hdi               0.00     0.00  0.40 357.40     0.01    14.28    81.77    11.52   32.22   2.40  86.04

From other normal SQL session:

SQL> insert into t2(id) values ('2');

1 row created.

SQL> commit;

Commit complete.

SQL>

Back to the ASM instance:

SQL> ALTER DISKGROUP DATA1 REBALANCE POWER 11;

Step 8: Verification.

We’ll just execute some big heavy-intensive SQL statement to generate some IO (thnx to Tanel for blogging this query):

SQL> create table t4 as select rownum r from
(select rownum r from dual connect by rownum <= 1000) a,
(select rownum r from dual connect by rownum <= 1000) b,
(select rownum r from dual connect by rownum <= 1000) c
where rownum <= 100000000;

From iostat we can monitor that only hdi is used, which assures us that database is really using hdi.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
33.33    0.00   26.47    5.88    0.00   34.31

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
hde               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hdi               0.00     0.00 107.84 351.96     1.60    12.35    62.14     4.96   10.30   0.91  41.96

Graphing Oracle’s Shared Server configuration (V$QUEUE)

Thursday, May 21st, 2009

Tuning Oracle’s Shared Server(MTS) is not so straightforward task, but I’ve developed simple perl script named plotvqueue.pl that can draw over time the following info:
1) absolute COMMON waits – how many waits there were in the COMMON (VC) queue in interval time (1min)
2) no# of COMMON queued – number of items in the COMMON queue (serviced by shared servers)
3) absolute DISPATCHERS waits – how many waits there were for all the DISPATCHERs queues in interval time (1min)
4) no# of DISPATCHERS queued – number of items in the DISPATCHERs queues.

This can be used as a hint for raising dispatchers and/or shared_servers settings.

for detailed discussion be sure to read Metalink Doc.ID 1005259.6 (“Shared Server (MTS) Diagnosis”), especially get familiar with the following fragment about architecture:

sharedserverarchitecture

“The stages that Shared Server goes through are quite simple. After the client sends the connection request to the Listener, it will either redirect or hand off (called warm hand-off) the connection to the Dispatcher (the Dispatcher does not necessarily need to be on the same host as the Listener). Once the client has connected to a Dispatcher it stays connected to that Dispatcher. Before the client completes the database log in, the Dispatcher associates a Virtual Circuit (VC) for that database session. There exists exactly one row in the VC view (V$CIRCUIT) for each client connection. This view also shows the current status of the client’s VC. Once the VC has been associated with the database session, the client will complete the database logon by passing the username and password to the Dispatcher. This request, as part of the VC for that new session, will be placed in the Common Queue where the first available Shared Server will complete the logon. Once each phase of the logon has completed, the Shared Server will pass the VC back to the Dispatcher, which then passes the response back to the Client (this actually takes several round trips to the client, in just the same manner as if it was a Dedicated connection).

Once the logon has completed, the client starts a normal conversation with the database. When the client makes a request to the database, it is the Dispatcher that picks up this request from the operating system’s network protocol. The Dispatcher then determines which client session the request came from (remember that a Dispatcher can be configured for Connection Pooling and Multiplexing: see the Net Administration Guide for more information on those configurations), tags that sessions’ VC that there is a new message (there is also a pointer to that session buffer in the VC) and places the VC in the Common Queue. The CQ is serviced by the Shared Servers on a first-in-first-out basis. The next available Shared Server pulls the VC from the CQ and processes the request. Part of the VC structure is the identity of the Dispatcher that created it (and which client is connected to it). When the Shared Server is finished processing the request, it writes the output to the session buffer, changes the VC’s ownership back to the Dispatcher that created it, places the VC into that Dispatcher’s queue, and posts the Dispatcher it has something in its queue. The Dispatcher then checks its queue and sends what is in the session buffer through the operating system network protocol back to the Client.”

The process of drawing can be divided in 4 stages:
1) installation of this script and it’s requirements (DBD, DBI, instantclient, etc.)
2) creation of sampling tables (plotvqueue_*) — see script comments
3) real collecting the data, running PL/SQL statement, e.g. from screen(1) to sample 24h run of the database with 1 min interval
4) running this script to analyse data
Optional steps (to catch-up performance differences after changes):
5) online changing database parameters (max_shared_servers, etc.)
6) same as step #3
7) same as step #4

The following screenshots comes from real production system before and after tuning (raising max_shared_servers, shared_servers and dispatchers).
commonqueue_may12_13

dispqueue_may12_13

SHA1, SHA256, SHA512 in Oracle for free without using DBMS_CRYPTO

Thursday, May 21st, 2009

SHA1, SHA256, SHA512 in Oracle for free without using DBMS_CRYPTO! (yay! without Enterprise Edition!*) powered by GNU CRYPTO project

For detailed list of algorithms please consider this link. (much more than DBMS_CRYPTO in 11g, which requires you to buy Enterprise Edition).

[oracle@xeno src]$ ls -l
total 764
-rw-rw-r-- 1 vnull vnull    458 Mar  1 05:53 CommonHash.java
-rw-r--r-- 1 vnull vnull 598036 Mar  1 04:47 gnu-crypto.jar
-rw-r--r-- 1 vnull vnull  96430 Mar  1 04:47 javax-crypto.jar
-rw-r--r-- 1 vnull vnull  16969 Mar  1 04:47 javax-security.jar
-rw-rw-r-- 1 vnull vnull    214 Mar  1 05:27 Main.java
-rw-rw-r-- 1 vnull vnull    145 Mar  1 05:27 SHA1.java
-rw-rw-r-- 1 vnull vnull    152 Mar  1 05:18 SHA256.java
-rw-rw-r-- 1 vnull vnull    152 Mar  1 05:18 SHA512.java
[oracle@xeno src]$
[oracle@xeno src]$ loadjava -u vnull/*** -v -resolve *.java *.jar
arguments: '-u' 'vnull/***' '-v' '-resolve' 'CommonHash.java' 'Main.java' 'SHA1.java' 'SHA256.java' 'SHA512.java' 'gnu-crypto.jar' 'javax-crypto.jar' 'javax-security.jar'
[..]
Classes Loaded: 560
Resources Loaded: 3
Sources Loaded: 0
Published Interfaces: 0
Classes generated: 0
Classes skipped: 1
Synonyms Created: 0
Errors: 0
[oracle@xeno src]$

Now as SYSDBA:

SQL> GRANT CONNECT,RESOURCE,EXECUTE ON DBMS_CRYPTO, CREATE PROCEDURE TO vnull;
SQL> conn vnull/***
Connected.
SQL> CREATE OR REPLACE FUNCTION gnuhash_sha256 (string IN VARCHAR2) RETURN VARCHAR2 AS  LANGUAGE JAVA NAME 'SHA256.calcHash(java.lang.String) return java.lang.String';
  2  /

Function created.

SQL> CREATE OR REPLACE FUNCTION gnuhash_sha512 (string IN VARCHAR2) RETURN VARCHAR2 AS LANGUAGE JAVA NAME 'SHA512.calcHash(java.lang.String) return java.lang.String';
  2  /

Function created.

SQL> CREATE OR REPLACE FUNCTION gnuhash_sha1 (string IN VARCHAR2) RETURN VARCHAR2 AS LANGUAGE JAVA NAME 'SHA1.calcHash(java.lang.String) return java.lang.String';
  2  /

Function created.

SQL> select gnuhash_sha1('1234') from dual;

GNUHASH_SHA1('1234')
--------------------------------------------------------------------------------
7110eda4d09e062aa5e4a390b0a572ac0d2c0220

SQL> select gnuhash_sha256('1234') from dual;

GNUHASH_SHA256('1234')
--------------------------------------------------------------------------------
03ac674216f3e15c761ee1a5e255f067953623c8b388b4459e13f978d7c846f4

SQL> select gnuhash_sha512('1234') from dual;

GNUHASH_SHA512('1234')
--------------------------------------------------------------------------------
d404559f602eab6fd602ac7680dacbfaadd13630335e951f097af3900e9de176b6db28512f2e000b9d04fba5133e8b1c6e8df59db3a8ab9d60be4b97cc9e81db

SQL>

Verify results using OpenSSL :

[vnull@xeno ~]$ echo -n "1234" | openssl dgst -sha1
7110eda4d09e062aa5e4a390b0a572ac0d2c0220
[vnull@xeno ~]$ echo -n "1234" | openssl dgst -sha256
03ac674216f3e15c761ee1a5e255f067953623c8b388b4459e13f978d7c846f4
[vnull@xeno ~]$ echo -n "1234" | openssl dgst -sha512
d404559f602eab6fd602ac7680dacbfaadd13630335e951f097af3900e9de176b6db28512f2e000b9d04fba5133e8b1c6e8df59db3a8ab9d60be4b97cc9e81db
[vnull@xeno ~]$

A little bonus, performance verification: DBMS_CRYPTO from 11.1.0.6 versus GNU.CRYPTO.HASH Java library running in JVM in Oracle (oracle_sha1 vs gnuhash_sha1, Oracle does not support SHA-2 standard yet, only SHA1=160 bits).

SQL> CREATE OR REPLACE FUNCTION oracle_sha1 (string IN VARCHAR2) RETURN VARCHAR2 AS BEGIN
  2  RETURN sys.dbms_crypto.hash(UTL_I18N.STRING_TO_RAW ('1234','AL32UTF8'),
  3  sys.dbms_crypto.hash_sh1);
  4  END;
  5  /

Function created.

SQL> select oracle_sha1('1234') from dual;

ORACLE_SHA1('1234')
--------------------------------------------------------------------------------
7110EDA4D09E062AA5E4A390B0A572AC0D2C0220

SQL>

From this quick & dirty test you can see there is only 4% performance difference between native DBMS_CRYPTO and GNU_HASH…

benchmark

* = UPDATE(12/11/2013): please verify with your Oracle Sales representative. Depending on who you ask and how you ask you may get different answer. DBMS_CRYPTO doesn’t seem to be licensed as ASO anymore, same for SSL encryption for RAC databases even for SE, so in those scenarios if you dont want to have encrypted data on the write you may have less complicated alternatives, YMMV.

MAA challenge lab, part #3

Wednesday, April 15th, 2009

11/04/2009:
1) Finally got some time for cleaning up Grid Control (dropping ora2 and ora3). Secured all agents (on VMs: gc, prac1, prac2). I’ve also cleaned up XEN dom0 (from quadvm). These VMs are not needed anymore. db3.lab (RAC on prac1, prac2) is in GC. Installed 10.2.0.5 32-bit agent on srac1 (single node standby).
2) Testing application of single-node RAC standby for differences in Standby Redo Logs processing (verifcation performed by using read-only mode).
3) LNS (ASYNC=buffers_number in LOG_ARCHIVE_DEST_2 parameter) performance fun.
Prepared srac2 for future RAC extension (to two nodes: srac1, srac2). Also installed GC agent on srac2 (10.2.0.5).
4) prac3: cloning and adding it into the Clusterware prac_cluster (addNode.sh from prac2). Deploying GC 10.2.0.5 agent on this node (prac1 and prac2 are both 10.2.0.4, in future I’ll try to upgrade it via GC). Later manually creating +ASM3 and db33 instances (redo, undo, srvctl, etc.). It means that I have 3 node primary RAC :)
5) srac2: Plan is to add it to the srac_cluster and make it 2 node standby RAC. +ASM2 was running, but more work is needed (mainly registrations in CRS/OCR).
6) Flash Recovery Area on standby ASM’s diskgroup +DATA1 was exhausted (thus MRP0 died) so I performed full RMAN backup with archivelogs to QUADVM dom0′s NFS and afterwards I’ve deleted archivelogs to reclaim some space. On SRAC standby I’ve changed archivelog deletion policy (in RMAN) and then restarted MRP0.
Unfortunatley I’ve lost my RAID5 array on synapse (dom0 hosting srac_cluster: srac1, srac2; it’s and old LH 6000R HP server) — 2 drives have failed, so my standby RAC is doomed until I’ll rebuild synapse on new SCSI drives (to be ordered) :(
UPDATE: I’ve verified backups of my srac1 and srac2 VMs but the backups for ASM diskgroup +DATA1 failed.  Also my OCR and voting disks are lost. It will be real fun & challenge to recover this standby RAC enviorniment (this will be also pretty like restoring non DataGuarded RAC enviorniment after site crash). I belive I won’t have to rebuild my standby from primary, because I’ve backuped this standby earlier. OCR hopefully can be restored from Clusterware auto-backup location.

15/03/2009:
Finally two node RAC {prac1,prac2} is being protected by DataGuard single-node standby RAC {srac1}.

21/02/2009:
XEN shared storage and oracle software storage on synapse for srac1 and configuring it as /u01 on srac1
Clusterware, ASM +DATA1, database (RAC) installation on srac1 (x86_32).

Oracle RAC main components… (quick self memo)

Wednesday, April 15th, 2009

I always forget how to resolve those shortcuts and which one does what, especially CSS versus CRS :)

CSS – Cluster Synchronization Services – Component responsible for controlling which nodes are members of the clusters. When node joins or leaves the cluster, CSS notifies the other nodes of the change in configuration. If this process fails, then the cluster will be restarted. Under Linux, CSS is implemented by the ocssd daemon, which runs as the root user.

CRS – Cluster Ready Services – Manages resources (databases, instances, services, listeners, VIPs, application processes, GSD, ONS) Configuration is stored in OCR. When status of resource changes CRS emits an event. CRS will try to restart resource up to 5 times before giving up. Under Linux, CRS is implemented as the crsd process, which runs with root privileges. In the event of failure this process restarts automaticlly.

EVM - Event Manager – Implemented as the evmd daemon (also runs as root). Oracle Clusterware also communicates with the ONS, which is a publish & subsribe service that communicates FAN events to clients.

DBMS_XPLAN.DISPLAY_CURSOR() is only for queries…

Monday, March 9th, 2009

It seems not so well know fact that the DBMS_XPLAN.DISPLAY_CURSOR() function won’t show bind variables for DML-type queries. It is documented that V$SQL_BIND_CAPTURE will capture bind variables only for SQL *queries* when bind variables are in HAVING or WHERE clauses (!). Demonstration:

SQL> drop table t;

Table dropped.

SQL> create table t (id number(10));

Table created.

SQL> insert into t values ('1');

1 row created.

SQL> commit;

Commit complete.

SQL> variable x number;
SQL> exec : x := 5;

PL/SQL procedure successfully completed.

SQL> select /* TESTMARK1 */ count(1) from t where id=:x;

  COUNT(1)
----------
         0

SQL>
SQL> select sql_id,sql_fulltext from v$sqlarea where sql_fulltext like 'select /* TESTMARK1%';

SQL_ID        SQL_FULLTEXT
------------- ------------------------------------------------------------
arjxpk7ghccxn select /* TESTMARK1 */ count(1) from t where id=:x

SQL> select * from table(dbms_xplan.display_cursor('&amp;amp;amp;amp;amp;amp;sql_id', 0, 'basic +PEEKED_BINDS'));
Enter value for sql_id: arjxpk7ghccxn
old   1: select * from table(dbms_xplan.display_cursor('&amp;amp;amp;amp;amp;amp;sql_id', 0, 'basic +PEEKED_BINDS'))
new   1: select * from table(dbms_xplan.display_cursor('arjxpk7ghccxn', 0, 'basic +PEEKED_BINDS'))

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select /* TESTMARK1 */ count(1) from t where id=:x

Plan hash value: 2966233522

-----------------------------------
| Id  | Operation          | Name |
-----------------------------------
|   0 | SELECT STATEMENT   |      |
|   1 |  SORT AGGREGATE    |      |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|   2 |   TABLE ACCESS FULL| T    |
-----------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - :X (NUMBER): 5

19 rows selected.

SQL>
SQL>
SQL> select sql_id,sql_fulltext from v$sqlarea where sql_fulltext like 'insert /* TESTMARK2%';

SQL_ID        SQL_FULLTEXT
------------- ------------------------------------------------------------
13duk396qjv59 insert /* TESTMARK2 */ into t values (:z)

SQL> select * from table(dbms_xplan.display_cursor('&amp;amp;amp;amp;amp;amp;sql_id', 0, 'ADVANCED'));
Enter value for sql_id: 13duk396qjv59
old   1: select * from table(dbms_xplan.display_cursor('&amp;amp;amp;amp;amp;amp;sql_id', 0, 'ADVANCED'))
new   1: select * from table(dbms_xplan.display_cursor('13duk396qjv59', 0, 'ADVANCED'))

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  13duk396qjv59, child number 0
-------------------------------------
insert /* TESTMARK2 */ into t values (:z)

-------------------------------------------------
| Id  | Operation                | Name | Cost  |
-------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |
|   1 |  LOAD TABLE CONVENTIONAL |      |       |
-------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - INS$1

Note
-----
   - cpu costing is off (consider enabling it)

21 rows selected.

SQL>
SQL> select NAME, DATATYPE_STRING, WAS_CAPTURED from v$sql_bind_capture where sql_id='13duk396qjv59';

NAME       DATATYPE_STRING                                              WAS
---------- ------------------------------------------------------------ ---
:Z         NUMBER                                                       NO

SQL>

This is consistent when using new DBMS_SQLTUNE_SQLTUNE.EXTRACT_BINDS() function.

SQL> variable x number;
SQL> exec : x := 9999999;

PL/SQL procedure successfully completed.

SQL> select * from dual where 1=:x;

no rows selected

SQL> select dbms_sqltune.extract_binds(bind_data) bind from v$sql where sql_text like 'select * from dual where 1=:x';

BIND(NAME, POSITION, DUP_POSITION, DATATYPE, DATATYPE_STRING, CHARACTER_SID, PRE
--------------------------------------------------------------------------------
SQL_BIND_SET(SQL_BIND(NULL, 1, NULL, 2, 'NUMBER', NULL, NULL, NULL, 22, '28-FEB-
09', 'NULL', ANYDATA()))

SQL>
SQL> select dbms_sqltune.extract_binds(bind_data) bind from v$sql where sql_id='36kuvjkmjw13u';

BIND(NAME, POSITION, DUP_POSITION, DATATYPE, DATATYPE_STRING, CHARACTER_SID, PRE
--------------------------------------------------------------------------------
SQL_BIND_SET()

SQL>

Workaround is to setup 10046(level 12) tracing in desired session(s) for capturing binds also for DMLs.

New very professional blog from ex-Pythian DBA team leader. Worth reading!

Tuesday, February 3rd, 2009

Earlier Alex Fatkulin blogged (& worked) at Pythian, but now his new posts are appearing on blogspot. If you are about internals of Oracle it’s a must read!

MAA challenge lab, part #2

Tuesday, February 3rd, 2009

25/01/2009: Cloning prac2 and adding it to Clusterware cluster (with prac1). Debugging blogged earlier Clusterware bug.

26/01/2009: Reading about migration of single instance ASM to full clustered ASM/RAC. Experiments with NOLOGGING and RMAN recovery on xeno workstation (db TEST).

27/01/2009: I’ve managed to migrate to full working RAC for db3.lab1 {nodes prac1 and prac2} with ASM storage (ASM migration done using DBCA; RAC migration performed by using rconfig). Deployed GC agent on prac2.

29/01/2009: Experiments with DBMS_REPAIR, NOLOGGING and RMAN recovery on xeno workstation (db TEST).

Oracle Clusterware 11g addNode.sh ONS bug (with GDB fun & solution)

Monday, January 26th, 2009

Yesterday I’ve added new node for one of Clusterware installations. I’ve decided to use addNode.sh script. Unfortunatley after adding new node (prac2) to cluster, kernel reported every few seconds segmentation fault of ONS daemon. Here’s how one can use some in depth knowledge to solve such problems (OS: Oracle Enterprise Linux 5.1, Clusterware: 11.1.0.6, platform: x86_64).

Some messages after Clusterware shutdown — most important are first two lines here.

[root@prac2 conf]# tail /var/log/messages
Jan 25 20:42:14 prac2 kernel: ons[1923]: segfault at 0000000000000060 rip 000000000040cb56 rsp 0000000043204028 error 4
Jan 25 20:42:14 prac2 kernel: ons[1927]: segfault at 0000000000000060 rip 000000000040cb56 rsp 0000000043204028 error 4
Jan 25 20:42:14 prac2 logger: Oracle clsomon shutdown successful.
Jan 25 20:42:14 prac2 logger: Oracle CSS family monitor shutting down gracefully.
Jan 25 20:42:15 prac2 logger: Oracle CSSD graceful shutdown
Jan 25 20:42:15 prac2 logger: Oprocd received graceful shutdown request. Shutting down.
[root@prac2 conf]#

Next, we are going to unlimit core-dump files (by default they are disabled). Later we create special directory for central dumping cores, and also configure kernel to use it.
[root@prac2 ~]# ulimit -c unlimited
[root@prac2 ~]# ulimit -c
unlimited
[root@prac2 ~]# mkdir /tmp/cores
[root@prac2 ~]# chmod 1777 /tmp/cores
[root@prac2 ~]# sysctl kernel.core_pattern='/tmp/cores/core.%e.%p'
kernel.core_pattern = /tmp/cores/core.%e.%p
[root@prac2 ~]#

After bringing up Clusterware (crsctl start crs) we are going to get some cores:
[root@prac2 cores]# ls
core.ons.4990 core.ons.5038 core.ons.5078 core.ons.5123 core.ons.5163 core.ons.5203
core.ons.5004 core.ons.5045 core.ons.5090 core.ons.5135 core.ons.5175 core.ons.5210
core.ons.5011 core.ons.5052 core.ons.5097 core.ons.5142 core.ons.5182 core.ons.5217
core.ons.5018 core.ons.5064 core.ons.5104 core.ons.5149 core.ons.5189 core.ons.5224
core.ons.5031 core.ons.5071 core.ons.5116 core.ons.5156 core.ons.5196
[root@prac2 cores]# /u01/app/crs/bin/crsctl stop crs
Stopping resources.
This could take several minutes.
Successfully stopped Oracle Clusterware resources
Stopping Cluster Synchronization Services.
Shutting down the Cluster Synchronization Services daemon.
Shutdown request successfully issued.
[root@prac2 cores]#

We must find, if that ONS daemon dumps core each time at the same address (rip = instruction pointer), we’ll take core from PID 4990:
[root@prac2 cores]# grep 'ons\[4990\]' /var/log/messages
Jan 25 20:07:12 prac2 kernel: ons[4990]: segfault at 0000000000000060 rip 000000000040cb56 rsp 0000000043204028 error 4
Jan 25 20:16:49 prac2 kernel: ons[4990]: segfault at 0000000000000060 rip 000000000040cb56 rsp 0000000043204028 error 4
Jan 25 20:34:13 prac2 kernel: ons[4990]: segfault at 0000000000000060 rip 000000000040cb56 rsp 0000000043204028 error 4
[root@prac2 cores]#

OK, so we’ve got bad guy at 0x000000000040cb56. Now we have to know why it happens (we want to find the problem, only if we know exact reason, we’are going to fix it — that’s my personal philosophy). Some output has been trimmed for clarity.
[root@prac2 cores]# file core.ons.4990
core.ons.4990: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'ons'
[root@prac2 cores]# gdb /u01/app/crs/opmn/bin/ons core.ons.4990
GNU gdb Red Hat Linux (6.5-25.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
[..]

Core was generated by `/u01/app/oracle/product/11.1.0/crs/opmn/bin/ons -d'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000040cb56 in opmnHttpFormatConnect ()
(gdb) thread apply all bt

Thread 7 (process 4990):
#0 0x0000003ea6c0a496 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000041aa07 in main ()

Thread 6 (process 4998):
#0 0x0000003ea6c0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000418666 in opmnWorkerEntry ()
#2 0x0000003ea6c062e7 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003ea60ce3bd in clone () from /lib64/libc.so.6

Thread 5 (process 4999):
#0 0x0000003ea6c0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000418666 in opmnWorkerEntry ()
#2 0x0000003ea6c062e7 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003ea60ce3bd in clone () from /lib64/libc.so.6

Thread 4 (process 5000):
#0 0x0000003ea6c0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000418666 in opmnWorkerEntry ()
#2 0x0000003ea6c062e7 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003ea60ce3bd in clone () from /lib64/libc.so.6

---Type to continue, or q to quit---
Thread 3 (process 5001):
#0 0x0000003ea60c7922 in select () from /lib64/libc.so.6
#1 0x0000000000411399 in opmnListenerEntry ()
#2 0x0000003ea6c062e7 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003ea60ce3bd in clone () from /lib64/libc.so.6

Thread 2 (process 5003):
#0 0x0000003ea6095451 in nanosleep () from /lib64/libc.so.6
#1 0x0000003ea6095274 in sleep () from /lib64/libc.so.6
#2 0x0000000000413fb6 in opmnPerformance ()
#3 0x0000003ea6c062e7 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003ea60ce3bd in clone () from /lib64/libc.so.6

Thread 1 (process 5002):
#0 0x000000000040cb56 in opmnHttpFormatConnect ()
#1 0x0000000000408d19 in connectionActive ()
#2 0x0000000000407774 in opmnConnectionEntry ()
#3 0x0000003ea6c062e7 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003ea60ce3bd in clone () from /lib64/libc.so.6

(gdb)
(gdb) disassemble
Dump of assembler code for function opmnHttpFormatConnect:
[..]
0x000000000040cb4b : mov 2121046(%rip),%r11 # 0x6128a8
0x000000000040cb52 : pushq 0x60(%r11)
0x000000000040cb56 : movzwl 0x60(%rbp),%r9d
[..]
0x000000000040cba6 : lea 826995(%rip),%rsi # 0x4d6a20 <__STRING.28>
0x000000000040cbad : mov 0x78(%rsp),%rdi
0x000000000040cbb2 : callq 0x4073d0
[..]
0x000000000040cbf5 : retq
[..]
(gdb)

From the output we can conclude that this function is not calling other functions/syscalls except sprintf. The instruction that SIGSEGVs program movzwl is from familiy of “extended move”. It’s just optimised copy instruction. Something is wrong – probabbly we are loading from bad frame base pointer here? Bad memory pointer? NULL? Also format string for sprintf (for more info: man sprintf) is as follows (loaded by lea instruction):

(gdb) x/s 0x4d6a20
0x4d6a20 <__STRING.28>: "POST /connect HTTP/1.1\r\nVersion: 3\r\nContent-Length: 0\r\nONStarget: %u,%hu\r\nONSsource: %u,%hu,%hu,%hu\r\nONSinfo: %s!%s!%u!%u!%hu!%hu!%hu\r\nhostName: %s\r\nclusterId: %s\r\nclusterName: %s\r\ninstanceId: %s\r\nins"...
(gdb)

As Oracle’s documentation specified, ONS – by default – takes parameters from OCR. We can dump OCR for further analysis:
[root@prac2 cores]# /u01/app/crs/bin/ocrdump
[root@prac2 cores]#

This generates human readable file named OCRDUMPFILE. After viewing it (greping for ONS string) it appears that configuration for prac2 node is not there (and it should be). This can be corrected on prac1 node (where Clusterware is up):

[root@prac1 tmp]# /u01/app/crs/bin/racgons add_config prac2:6251
[root@prac1 tmp]# /u01/app/crs/bin/ocrdump
[root@prac1 tmp]# grep ONS OCRDUMPFILE
[DATABASE.ASM.prac1.+asm1.START_OPTIONS]
[DATABASE.ONS_HOSTS]
[DATABASE.ONS_HOSTS.prac1]
[DATABASE.ONS_HOSTS.prac1.PORT]
[DATABASE.ONS_HOSTS.prac2]
[DATABASE.ONS_HOSTS.prac2.PORT]
ORATEXT : CRS application for ONS on node
ORATEXT : CRS application for ONS on node
[root@prac1 tmp]#

Now we corrected it. Let’s start Clusterware on prac2 and verify how that format function performs. This somehow tricky, because main ONS daemon is child of parent ONS watchdog (21165), that function is being called only while initialising, so we’re going to kill child and follow code-path to the child.. :

[oracle@prac2 tmp]$ ps uaxw|grep opm
oracle 21165 0.0 0.0 15928 392 ? Ss 21:55 0:00 /u01/app/oracle/product/11.1.0/crs/opmn/bin/ons -d
oracle 21166 2.1 0.6 146920 10984 ? Sl 21:55 0:00 /u01/app/oracle/product/11.1.0/crs/opmn/bin/ons -d
oracle 21281 0.0 0.0 61112 700 pts/0 R+ 21:55 0:00 grep opm
[oracle@prac2 tmp]$ gdb -p 21165 # parent
GNU gdb Red Hat Linux (6.5-25.el5rh)
[..]
(gdb) set follow-fork-mode child
(gdb) b *opmnHttpFormatConnect
Breakpoint 1 at 0x40c9f2
(gdb) b *0x000000000040cbf5 # remember that retq address...?
Breakpoint 2 at 0x40cbf5
(gdb) cont
Continuing.

Now to the the other session, where we kill child (parent will restart it):
[oracle@prac2 tmp]$ kill -TERM 21166

On the GDB session we'll get this:
[New process 22004]
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[New LWP 22015]
[Switching to LWP 22015]

Breakpoint 1, 0x000000000040c9f2 in opmnHttpFormatConnect ()
(gdb) cont
Continuing.

Breakpoint 2, 0x000000000040cbf5 in opmnHttpFormatConnect ()

(gdb) info registers rax
rax 0x1ab38960 447973728
(gdb) x/s 0x1ab38960
0x1ab38960: "POST /connect HTTP/1.1\r\nVersion: 3\r\nContent-Length: 0\r\nONStarget: 174260502,6251\r\nONSsource: 174260501,6251,6150,0\r\nONSinfo: !!174260501!0!6251!0!6150\r\nhostName: prac1\r\nclusterId: databaseClusterId\r\nc"...
(gdb) quit
The program is running. Exit anyway? (y or n) y
[oracle@prac1 prac1]$

Summary: at the breakpoint #2 we are at retq (return from subroutine) and we get in rax (64-bit) register address of memory which contains formated string. This is HTTP request that is sent from child ONS program. We’ve verified that it works correctly :)

Quick memo for myself for starting up GridControl

Saturday, January 24th, 2009

Starting GC (as oracle software owner):

$OMA_HOME/bin/emctl start agent
$OMS_HOME/bin/emctl start oms
$OMS_HOME/opmn/bin/opmnctl startall

.. and stopping:

$OMS_HOME/opmn/bin/opmnctl stopall
$OMA_HOME/bin/emctl stop agent
$OMS_HOME/bin/emctl stop oms

$ORACLE_HOME/bin/sqlplus /nolog <<EOF1
connect / as sysdba
shutdown immediate
EOF1

Brief description of work on my MAA challenge lab… #1

Saturday, January 24th, 2009

This picture below (click to enlarge) shows what I’m planning to deploy in my home lab in order to prepare better for OCP certification. It can be summarized as full Maximum Availbility Architecture implementation… Grid Control is being used to increase productivity, but I don’t want to integrate Oracle VM into the stack, just systems and databases:

MAA challenge lab

17/01/2008: Installation and fight with Grid Control on VM gc. Preparing VM linux template named 01_prac1 from which other machines are going to be cloned (simple as recursive “cp” in dom0).

18/01/2008: Installation & fight with Grid Control after I’ve dropped “emrep” database (main GC repository database). This happened while I was playing with cloned database “misc1″ from “emrep”. I didn’t read message while running “DROP DATABASE ..”  from RMAN and I’ve sent both to /dev/null, because the DBID was the same for the orginal one and the “misc1″ clone.  The primary reason was that I wanted misc1 cloned from emrep but it failed :) ). Did I say that I’ve also deleted backups? ;) After new, sucessfull fresh installation of GC, I’ve taken full backup (from XEN dom0) of 01_gc VM for future purposes. I’m starting to regret that I’ve haven’t used LVM in dom0 for snapshot/fast-backup purposes…

20/01/2008: Setting up 09_ora1 VirtualMachine from VM template 01_prac1. Installing single 11g database named “db1.lab1″ with dedicated tablespace & user for sysbench version 0.4.8 (0.4.10 doesn’t work with Oracle).

23/01/2008: Cloning ora2 from ora1. Changing hostnames, IPs (trick: the same ethernet MACs but on different XEN bridges, changes performed from console:)). Uninstalling Agent10g, vanishing db on ora2. Setting up DNS server on quadvm (in XEN dom0) for whole lab. Installing GC agent on ora2 – something is wrong… GC console doesn’t catch up new targets (primary I’m looking for “Host” component). Agent is itself discovered by GC…. starting from the beginning (rm -rf 08_ora2 :o ) and so on…
Finally got ora3 up running instead of ora2. Then it turned out that Laurent Schneider has blogged post about metalink note in which the procedure of agent removal is described. So finally I’ve got ora2 back in GC (with gc, ora1 and ora3).

Next step was setting up host prac1 as for single instance non-RAC ASM database “db3.lab1″. First  clusterware has been installed. I wanted to have it 32-bit version, because my future standby RAC hardware is only 32-bit capable but it appears that I would have to install 32-bit userland RPMs, so I decided to try in the long term 64-bit primary RAC with 32-bit standby RAC… Also Grid Control agent was installed on prac1.

24/01/2008: Raised RAM for 01_prac1 to 1.6GB from 1.2GB (due to frequent swapping occuring, I want my 11g memory_target for that fresh db3.lab1 database to be left at 450MB). Succesfull migration of ASM storage /dev/hde to /dev/hdi (simulating storage migration – thanks to ASM it is straightforward. I’m going to blog about it in my next post).

Oracle Database Vault, not so 0-day anymore, privilege escalation using ptrace(2) from UNIX account

Tuesday, November 18th, 2008

It seems, that there are many misunderstandings surrounding Database Vault (Oracle product for protecting sensitive data from company employees – such like *credit card* records and other very sensitve financial data). Oracle’s marketing tried to always claim that is product is able to protect data from administrators(!), which of course is not true. Let’s take the following excerpt from Database Vault whitepapper:

“Privileged users can be prevented from access application data and separation-of-duty can be enforced across existing database administrators without a costly and time consuming least privilege exercise.”

Of course you could assume (as I did) here that DV protects against SYSDBA role too. That’s why this ora_dv_mem_off.c was spawn. After contacting Oracle secalert (greetz to them for discussions) in Februrary/March this year, it is clear that without in-depth reading of “Appendix C” from official DV documentation you won’t get full picture of the solution. DV was not to designed to protect from OS side – that’s the main technical point here – any database is still open for attack from OS side even with DV. And SYSDBA seems to be disabled from the OS side: that’s correct, you won’t be able to perform “sqlplus / as sysdba” even as Oracle software owner, you can as: “/ as sysoper” at most. In order to perform administrative tasks you require downtime (to relink). So any SYSDBA logged on UNX software owner account could defeat DV (and gain access to sensitive data) but this would be easily spotted. Here’s another solution, disabling DV on runtime. So enjoy, for free this time!

And oh, there is theoretical possibility that in future Oracle DV would run under several different OS-user/uids processes, and thus would be able to protect from SYSDBA’s too, but this would need MAJOR rearchitecture.

QuickNote to the buisness: NO, you are still not able to prevent watching cash flows by Database Admins ;)

Typical escalation (allowing to login in as SYSDBA and allowing to create users – thus excluding Security Admin from the job;) ):

[oracle@xeno ora_dv_mem_off]$ !gcc
gcc -Wall ora_dv_mem_off.c -o ora_dv_mem_off -lbfd -liberty
ora_dv_mem_off.c: In function ‘locate_dv_func’:
ora_dv_mem_off.c:92: warning: initialization discards qualifiers from pointer
target type
ora_dv_mem_off.c:93: warning: initialization makes pointer from integer
without a cast

[oracle@xeno ora_dv_mem_off]$ ./ora_dv_mem_off
[17035] starting to trace sqlplus process (17036)
[***] NOW TYPE IN SQLPLUS: conn / as sysdba
[17035] execve() syscall in 17036

SQL*Plus: Release 10.2.0.3.0 – Production on Wed Feb 27 18:56:55 2008

Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.

SQL> conn / as sysdba
[17035] clone() syscall in 17036, tracing orapid=17037
[17035] execve() syscall in 17037,
[17035] symbol “kzvtins” at 0xb185820
[***] sucessfuly validated function, DatabaseVault=1
[***] attempting to rewrite memory at 0xb185824
Connected.
SQL> create user god identified by abc;

User created.

SQL> grant dba,dv_admin,dv_owner,connect,resource to god;

Grant succeeded.

SQL>