Tuesday, February 17, 2009

Manually switch over to the standby database

Manually switch over of dataguard configuration without dataguard broker

1. Verify the Physical Standby Database Is Performing Properly
Step 1 Identify the existing archived redo log files on strandby

SQL>SELECT SEQUENCE#, FIRST_TIME, NEXT_TIME FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;

Step 2 Force a log switch to archive the current online redo log file on primary
SQL>ALTER SYSTEM SWITCH LOGFILE;

Step 3 Verify the new redo data was archived on the standby database.
SQL>SELECT SEQUENCE#, FIRST_TIME, NEXT_TIME FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;

Step 4 Verify that received redo has been applied on standby
SQL>SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;

The value of the APPLIED column for the most recently received log file will be
either IN-MEMORY or YES if that log file has been applied.

2. Pre-Switchover Checks
Verify Managed Recovery is Running (non-broker) on the standby
SQL>select process from v$managed_standby where process like 'MRP%';

Cancel apply delay for the target standby using SQL
SQL>select delay_mins from v$managed_standby where process = 'MRP0';

if delay_mins >0 then
SQL>RECOVER MANAGED STANDBY DATABASE CANCEL
SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY USING CURRENT LOGFILE DISCONNECT FROM SESSION;

Verify there are no large GAPS.
On primary
SQL>SELECT THREAD#, SEQUENCE# FROM V$THREAD;
On the standby the following query should be no more than 1-2 less than the primary query result
SQL>SELECT THREAD#, MAX(SEQUENCE#)
FROM V$ARCHIVED_LOG val, V$DATABASE vdb
WHERE APPLIED = 'YES' AND val.RESETLOGS_CHANGE#=vdb.RESETLOGS_CHANGE#GROUP BY THREAD#;
If there is a gap .. fix the gap

Verify Primary and Standby TEMP Files Match
SQL>select tmp.name FileName, bytes, ts.name Tablespace from v$tempfile tmp, v$tablespace ts where tmp.ts#=ts.ts#;
If there is no match then you will correct the temp tbs problem after switchover

Verify that there is no issue with V$LOG_HISTORY on the Standby (bug 6010833)
1. Get the Primary RESETLOGS_CHANGE#. On the primary execute:
SQL>select RESETLOGS_CHANGE# from V$DATABASE;

2. On the standby get the maximum sequence numbers for each thread from V$LOG_HISTORY:
SQL>select thread#, max(sequence#) from V$LOG_HISTORY where resetlogs_change#=1 group by thread#;

3. Get current sequence numbers on standby:
SQL>select thread#, max(sequence#) from v$archived_log alog, v$database db where alog.resetlogs_change#=db.resetlogs_change# group by thread#;

4. The last sequence# for each thread# from V$LOG_HISTORY should be close (the difference in log sequences < 3) to the last sequence# for each thread# from V$ARCHIVED_LOG.

5. If there is an issue with V$LOG_HISTORY then recreate the standby controlfile

Verify no old partial Standby Redo Logs on the Standby (bug 7159505)
1. Identify any active standby redo logs (SRL’s)
SQL>SELECT GROUP#, THREAD#, SEQUENCE# FROM V$STANDBY_LOG WHERE STATUS = 'ACTIVE' ORDER BY THREAD#,SEQUENCE#;

2.Identify maximum applied sequence number(s).
SQL> select thread#, max(sequence#)
from V$LOG_HISTORY
where resetlogs_change#=<>
group by thread#;

3.If there are any active SRL's that have a thread#/sequence# less than the thread#/sequence# returned from the V$LOG_HISTORY (meaning the recovery has progressed beyond the active SRL) query then clear them.
SQL>RECOVER MANAGED STANDBY DATABASE CANCEL
SQL>ALTER DATABASE CLEAR LOGFILE GROUP ;

3 Switchover
Clear Potential Blocking Parameters & Jobs
1.Capture current job state on the primary
SQL>select * from dba_jobs_running;
SQL>select owner,job_name, start_date, end_date, enabled from dba_scheduler_jobs where enabled='TRUE' and owner <> 'SYS';
SQL>show parameter job_queue_processes -- and capture the value

Block further job submission
SQL>alter system set job_queue_processes=0 scope=both;
SQL>execute dbms_scheduler.disable(job_name);

Disable any cron jobs that may interfere , rman backups
Shutdown all mid-tiers (e.g db console)
$ emctl stop dbconsole

Monitor Switchover
Turn on Data Guard tracing on primary and standby
Tracing is turned on to have diagnostic information available in case any issues arise.
SQL>show parameter log_archive_trace -- capture the value
Set Data Guard trace level to 8191
SQL>alter system set log_archive_trace=8191;
Trace output will appear under the destination pointed to by the database parameter BACKGROUND_DUMP_DEST with “mrp” in the file name.

Monitor alert log for both primary and standby

Finally Switchover
Verify that the primary database can be switched to the standby role
SQL>SELECT SWITCHOVER_STATUS FROM V$DATABASE;
A value of TO STANDBY or SESSIONS ACTIVE (requires the WITH SESSION SHUTDOWN clause on the switchover command) indicates that the primary database can be switched to the standby role

Switchover the primary to a standby database
SQL>ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN;
in primary alert log you would see

Switchover: Complete - Database shutdown required (sfs_stby1)
Completed: ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN

Verify that the standby database can be switched to the primary role
SQL>SELECT SWITCHOVER_STATUS FROM V$DATABASE;

Check if the standby has ever been open read-only
SQL> SELECT VALUE FROM V$DATAGUARD_STATS WHERE NAME='standby has been open';

If the standby was open read-only then restart the standby
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP MOUNT

Switchover the standby database to a primary
SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;

Open the new primary database:
SQL> ALTER DATABASE OPEN;
Correct any tempfile mismatch

Restart the new standby
On the the new standby database (old production database), bring it to the mount state and start managed recovery.
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT;
SQL> RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;

Post-Switchover Steps
For each database:
SQL>alter system set log_archive_trace=;
SQL>alter system set job_queue_processes= scope=both;
SQL>execute dbms_scheduler.enable();
Enable any cron jobs that were diabled in 3.1

Check that they are syncronized
Perfom log switch on new primary
Create cron job to backup new primary !!!


Reference
Metalink Note id : 751600.1

8 comments:

  1. Hi Agis,

    Thanks for this good document. I followed these steps, but I am getting error for redologs. I dont have redologs for standby database. Also it is looking for the path same as Primary database file system, which is different in standby database. SO what do you suggest for this. Do you want me to create redologs for standby database with same path as primary and then drop and recreate redologs in proper place? or is there any other way to do this?

    Thanks for your help.

    ReplyDelete
  2. Hi Prabhat,

    This document assumes that you have the same physical structure on the standby server as the primary.
    How did you created the standby ?
    Please post also the ora error code.

    ReplyDelete
  3. Check also this link in order to create the standby database.

    http://agstamy.blogspot.com/2008/10/oracle-10g-manually-create-physical.html

    ReplyDelete
  4. Hi Agis,

    Actually I followed below link for the physical stand by database.

    http://dbataj.blogspot.com/2007/03/standby-database.html

    Regarding ora error, i will let you know tomorrow as I don't have access to my server at this time.

    It will be very great full if you provide us the steps if the file structure is different.

    Thanks for you time.

    ReplyDelete
  5. Hi Prabhat,

    If you follow my link you will see
    DB_FILE_NAME_CONVERT,LOG_FILE_NAME_CONVERT on the standby pfile.

    This parameters tells that the standby server has different structure.
    Now concerning your redologs you must use :
    LOG_FILE_NAME_CONVERT='[NEW_PATH]',[OLD_PATH]

    Lets assume that on the primary server
    you have 2 redos on /u01/oradata/redo
    and 2 redos on /u02/oradata/redo

    Now on standby you have /u03,/u04

    You must use:
    LOG_FILE_NAME_CONVERT='/u03/oradata/redo'
    ,'/u01/oradata/redo',
    '/u04/oradata/redo'
    ,'/u02/oradata/redo'

    Put these parameters on your standby pfile and mount your standby.

    ReplyDelete
  6. Hi Agis,

    I have restored standby database from the primary which is in maximum availability. The retore has been successful and there are no gaps too.. But my problem is
    the protection_level is showing as resynchronization insted of max availablity. Any ideas?? My primary database is RAC cluster but one node is shutdown and only one node is active. and standby is a single instance.

    Many thanks..

    ReplyDelete
  7. I found something which seems anormal and I noticed that since I have a script which check all the consecutive archivelog for each thread (first_change#, last_change#) and its finds SCN gap if there is a hole.

    Before putting my RAC 2 nodes database 10.2.0.4 with dataguard, I had no gap.

    I think a do properly the switchover (using LGWR ASYNC), i don't find any error on both alert_log. The switchover seems to run well.

    But when i ran my "gap script", i found the following oddity :

    THREAD# SEQUENCE# FIRST_CHANGE# NEXT_CHANGE#
    ---------- ---------- ------------- ------------
    2 118 5966403 5966562
    2 119 5966562 5966577
    2 120 5966577 5970293
    2 121 5970293 5970310
    2 122 5977084 5977094

    There is an SCN gap between Sequence 121 and 122.

    Is this normal ? I don't think so..

    One other problem seems to happen when i try to do an incomplete recover to SCN 5970309 =>OK but until SCN 5970310 => RMAN error

    ORA-00283: recovery session canceled due to errors
    RMAN-11003: failure during parse/execution of SQL statement: alter database recover logfile

    Am I the only one ?

    This is really strange ?

    Thanks for any inputs or helps.

    ReplyDelete
  8. Hi JC,

    Can you post all the RMAN error output (I need the Ora errors after the RMAN-11003).

    Also can you try your restore with UNTIL SEQUENCE 122 and post the whole error output if any occurs.

    Agis

    ReplyDelete