Archive for December, 2007

Automatic failover with Oracle DataGuard (Fast-Start Failover)

Wednesday, December 5th, 2007

This post will demonstrate beautiful software… Oracle DataGuard :)

Quick intro for non-Oracle people out there… Oracle DataGuard is High Availability solution for Oracle Database. For thousands pages of documentation, concept guides, troubleshooting, HOWTOs about it please visit docs.oracle.com ;)

Some facts:
1) Max Availability mode (requirrement of Fail-Start Failover)
2) Flashback for database is on (also req.)
3) Physical standby
4) All configured from CLI (sqlplus and dgmgrl; no OEM)
5) Oracle version: 10gR2 EE(10.2.0.1)
6) All done on single host(2 instances + 1 observer + 1 listener)

Ok, let’s start observer (element which tests instances and decides when to failover to secondary database):

[oracle@xeno ~]$ export ORACLE_SID=xeno1
[oracle@xeno ~]$ dgmgrl
DGMGRL for Linux: Version 10.2.0.1.0 - Production
Copyright (c) 2000, 2005, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/abc
Connected.
DGMGRL> show configuration verbose;
Configuration
Name: DGxeno
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
xeno1 - Primary database
xeno3 - Physical standby database
- Fast-Start Failover target
Fast-Start Failover
Threshold: 30 seconds
Observer: xeno.localdomain
Current status for "DGxeno":
SUCCESS
DGMGRL> START OBSERVER
<-- session hangs...

From another terminal we will insert some data:

[oracle@xeno ~]$ echo $ORACLE_SID
xeno1
[oracle@xeno ~]$ sqlplus vnull/abc
SQL*Plus: Release 10.2.0.1.0 - Production on Tue Dec 4 18:48:39 2007
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning and Data Mining options
SQL> insert into dgtest values ('30000000');
1 row created.
SQL> commit;
Commit complete.
SQL>

Great, now we are going to prepare small holocaust for our xeno1 database… we just simply instant SIGKILL all processes of oracle xeno1:

[root@xeno ~]$ for P in `ps auxw | awk '/ora_[0-9a-z]+_xeno1/ { print $2 }' `; do kill -9 $P; done

Simple and efficient! ;)

Screenshot from DGMGRL observer:
DGMGRL observer failover

Now we can see from alert_xeno3.log the following:

[..]
<-- simulated failure of xeno1!
Tue Dec 4 19:01:19 2007
RFS[14]: Possible network disconnect with primary database
Tue Dec 4 19:01:19 2007
RFS[15]: Possible network disconnect with primary database
Tue Dec 4 19:01:19 2007
RFS[13]: Possible network disconnect with primary database
<-- failover starts!
Tue Dec 4 19:02:45 2007
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE
Tue Dec 4 19:02:45 2007
Terminal Recovery: Stopping real time apply
Tue Dec 4 19:02:45 2007
MRP0: Background Media Recovery cancelled with status 16037
Tue Dec 4 19:02:45 2007
Errors in file /u01/app/oracle/admin/xeno3/bdump/xeno3_mrp0_7206.trc:
ORA-16037: user requested cancel of managed recovery operation
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 2316040
Tue Dec 4 19:02:45 2007
Errors in file /u01/app/oracle/admin/xeno3/bdump/xeno3_mrp0_7206.trc:
ORA-16037: user requested cancel of managed recovery operation
Tue Dec 4 19:02:45 2007
MRP0: Background Media Recovery process shutdown (xeno3)
Tue Dec 4 19:02:46 2007
Terminal Recovery: Stopped real time apply
Tue Dec 4 19:02:46 2007
Attempt to do a Terminal Recovery (xeno3)
Tue Dec 4 19:02:46 2007
Media Recovery Start: Managed Standby Recovery (xeno3)
Managed Standby Recovery not using Real Time Apply
Terminal Recovery timestamp is '12/04/2007 19:02:46'
Terminal Recovery: applying standby redo logs.
Terminal Recovery: thread 1 seq# 95 redo required
Terminal Recovery: /u04/oracle/xeno3/standby_redo01.log
Identified End-Of-Redo for thread 1 sequence 95
Tue Dec 4 19:02:46 2007
Incomplete recovery applied all redo ever generated.
Recovery completed through change 2316041
Tue Dec 4 19:02:46 2007
Media Recovery Complete (xeno3)
Terminal Recovery: successful completion
Begin: Standby Redo Logfile archival
End: Standby Redo Logfile archival
Resetting standby activation ID 3036789101 (0xb501b96d)
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE
Tue Dec 4 19:02:51 2007
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN
Tue Dec 4 19:02:51 2007
ALTER DATABASE SWITCHOVER TO PRIMARY (xeno3)
If media recovery active, switchover will wait 900 seconds
Standby terminal recovery start SCN: 2316040
SwitchOver after complete recovery through change 2316041
Online log /u04/oracle/xeno3/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u04/oracle/xeno3/redo02.log: Thread 1 Group 2 was previously cleared
Standby became primary SCN: 2316039
Converting standby mount to primary mount.
Tue Dec 4 19:02:51 2007
Switchover: Complete - Database mounted as primary (xeno3)
Completed: ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN
Tue Dec 4 19:02:51 2007
ARC6: STARTING ARCH PROCESSES
Tue Dec 4 19:02:51 2007
ALTER SYSTEM SET standby_archive_dest='' SCOPE=BOTH SID='xeno3';
Tue Dec 4 19:02:51 2007
ARC7: Becoming the 'no SRL' ARCH
Tue Dec 4 19:02:51 2007
ALTER SYSTEM SET log_archive_dest_1='location="/u04/oracle/xeno3/arch"','valid_for=(ONLINE_LOGFILE,ALL_ROLES)' SCOPE=BOTH SID='xeno3';
Tue Dec 4 19:02:51 2007
ALTER SYSTEM SET log_archive_dest_state_1='ENABLE' SCOPE=BOTH SID='xeno3';
Tue Dec 4 19:02:51 2007
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY
Tue Dec 4 19:02:51 2007
Completed: ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY
Tue Dec 4 19:02:51 2007
ALTER DATABASE OPEN
Tue Dec 4 19:02:51 2007
Assigning activation ID 3036795877 (0xb501d3e5)
LGWR: Primary database is in MAXIMUM AVAILABILITY mode
Tue Dec 4 19:02:51 2007
Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
Tue Dec 4 19:02:51 2007
ARCa: Archival started
ARC6: STARTING ARCH PROCESSES COMPLETE
ARCa started with pid=13, OS id=7969
LNSb started with pid=29, OS id=7971
Error 12514 received logging on to the standby
Tue Dec 4 19:02:58 2007
Errors in file /u01/app/oracle/admin/xeno3/bdump/xeno3_lgwr_6565.trc:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Tue Dec 4 19:02:58 2007
LGWR: Error 12514 verifying archivelog destination LOG_ARCHIVE_DEST_2
Tue Dec 4 19:02:58 2007
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Continuing...
Thread 1 advanced to log sequence 97
LGWR: Waiting for ORLs to be archived...
LGWR: ORLs successfully archived
Thread 1 opened at log sequence 97
Current log# 2 seq# 97 mem# 0: /u04/oracle/xeno3/redo02.log
Successful open of redo thread 1
Tue Dec 4 19:03:01 2007
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Dec 4 19:03:01 2007
SMON: enabling cache recovery
Tue Dec 4 19:03:02 2007
Successfully onlined Undo Tablespace 1.
Dictionary check beginning
Dictionary check complete
Tue Dec 4 19:03:02 2007
SMON: enabling tx recovery
Tue Dec 4 19:03:02 2007
Database Characterset is WE8ISO8859P1
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=29, OS id=7973
Tue Dec 4 19:03:03 2007
LOGSTDBY: Validating controlfile with logical metadata
Tue Dec 4 19:03:03 2007
LOGSTDBY: Validation complete
Tue Dec 4 19:03:04 2007
Completed: ALTER DATABASE OPEN
[..]


[oracle@xeno ~]$ export ORACLE_SID=xeno3
[oracle@xeno ~]$ dgmgrl
DGMGRL for Linux: Version 10.2.0.1.0 - Production
Copyright (c) 2000, 2005, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/abc
Connected.
DGMGRL> show configuration;
Configuration
Name: DGxeno
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
xeno1 - Physical standby database (disabled)
- Fast-Start Failover target
xeno3 - Primary database
Current status for "DGxeno":
Warning: ORA-16608: one or more databases have warnings
DGMGRL> show database xeno3;
Database
Name: xeno3
Role: PRIMARY
Enabled: YES
Intended State: ONLINE
Instance(s):
xeno3
Current status for "xeno3":
Warning: ORA-16817: unsynchronized Fast-Start Failover configuration
DGMGRL> show database xeno1;
Database
Name: xeno1
Role: PHYSICAL STANDBY
Enabled: NO
Intended State: ONLINE
Instance(s):
xeno1
Current status for "xeno1":
Error: ORA-16661: the standby database needs to be reinstated
DGMGRL>

We should check now our data:

[oracle@xeno ~]$ export ORACLE_SID=xeno3
[oracle@xeno ~]$ sqlplus vnull/abc
SQL*Plus: Release 10.2.0.1.0 - Production on Tue Dec 4 19:16:17 2007
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning and Data Mining options
SQL> select * from dgtest;
ID
----------
[..]
777
30000000 <--- our data
[..]

DataGuard saved the day :)