本文共 12034 字,大约阅读时间需要 40 分钟。
由于希望测试一个DG问题,所以在Vmware上搭建测试环境,primary和standby放在同一台主机上。
搭建过程就不赘述,但是在最后,遇到了一些问题,这里记录一下,以备后续查阅。----搭建过程中的一些命令-----
1 2 3 4 5 6 7 | 1 . backup database format '/tmp/bk_%U' ; 2 . backup current controlfile for standby format '/tmp/stdbyctl.bkp' ; 3 . catalog start with '/tmp/' ; 4 . set controlfile autobackup format for device type disk to '/tmp/%F' ; restore standby controlfile from '/tmp/stdbyctl.bkp' ; 5 . RMAN> CONNECT TARGET SYS/oracle@db; RMAN> CONNECT AUXILIARY SYS/oracle@stddb; |
6. 最会std的pfile文件内容
1 2 3 4 5 6 7 8 9 10 11 12 13 | db_file_name_convert= '+data/db/' , '+reco/stddb/' log_file_name_convert= '+RECO/db/archivelog/' , '+RECO/stddb/archivelog_std/' log_archive_format=%t_%s_%r.arc standby_file_management=auto compatible= '11.2.0.0.0' control_files= '+RECO/stddb/controlfile/current.260.834947597' <<<<<<在恢复control之前,先设定为control_files= '+RECO' 恢复之后,根据实际位置更改 db_block_size= 8192 processes= 150 remote_login_passwordfile= 'EXCLUSIVE' undo_tablespace= 'UNDOTBS1' undo_management=auto log_archive_config= 'dg_config=(primdb,stbdb)' log_archive_dest_1= 'location=+RECO/stddb/archivelog_std/ valid_for=(all_logfiles,all_roles) db_unique_name=stbdb' |
问题一:
在设置listener的过程中,由于设定静态监听,在standby启动之后,同时又会注册一个动态监听,目前有两个监听,其中动态监听状态为blocked。
遇到的问题就是,在duplicate连接的时候,就会发生ORA-12528错误,而不能正常连接到standby。 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [oracle@OEL ~]$ lsnrctl status LSNRCTL for Linux: Version 11.2 . 0.1 . 0 - Production on 30 -DEC- 2013 10 : 46 : 03 Copyright (c) 1991 , 2009 , Oracle. All rights reserved. Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT= 1521 )) STATUS of the LISTENER ------------------------ Alias LISTENER Version TNSLSNR for Linux: Version 11.2 . 0.1 . 0 - Production Start Date 27 -NOV- 2013 12 : 40 : 26 Uptime 32 days 22 hr. 5 min. 37 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /u01/app/ 11.2 . 0 /grid/network/admin/listener.ora Listener Log File /u01/app/oracle/diag/tnslsnr/OEL/listener/alert/log.xml Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=OEL.localdomain)(PORT= 1521 ))) Services Summary... Service "+ASM" has 1 instance(s). Instance "+ASM" , status READY, has 1 handler(s) for this service... Service "PRIMDB" has 1 instance(s). Instance "db" , status READY, has 1 handler(s) for this service... Service "dbXDB" has 1 instance(s). Instance "db" , status READY, has 1 handler(s) for this service... Service "stbdb" has 1 instance(s). Instance "stbdb" , status UNKNOWN, has 1 handler(s) for this service... <<<<<<<<<<<<<<<<<<<<<< Service "stddb" has 1 instance(s). Instance "stddb" , status READY, has 1 handler(s) for this service... <<<<<<<<<<<<<<<<<<<<<< The command completed successfully ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
通过设置tnsnames.ora,可以解决这个问题。
1 2 3 4 5 6 7 8 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ stddb = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = OEL.localdomain)(PORT = 1521 )) ) (CONNECT_DATA = (SERVICE_NAME = stddb)(UR=A)) <--------------------In order to avoid error ORA- 12528 ) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
问题二:
在执行下面命令之后,直接primary就crash掉,经过查询alert发现,是redo broken导致的
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;查询standby alert,发现如下信息:
(在ALTER DATABASE RECOVER...开始之后,第一件事情就是clear redo log,这样就直接导致primary crash) 1 2 3 4 5 6 7 8 9 10 11 12 13 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # tail -f alert_stddb.log Clearing online redo logfile 1 complete Clearing online redo logfile 2 +DATA/db/redo02.log Clearing online log 2 of thread 1 sequence number 5 Clearing online redo logfile 2 complete Clearing online redo logfile 3 +DATA/db/redo03.log Clearing online log 3 of thread 1 sequence number 3 Tue Dec 24 14 : 32 : 49 2013 Clearing online redo logfile 3 complete Tue Dec 24 14 : 32 : 49 2013 Media Recovery Waiting for thread 1 sequence 4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
在重建几次之后,问题依然。
由于datafile和archivelog都使用convert参数转换了,但是redo是无法转换的,如何解决这个问题呢? 经过分析,发现在duplicate的最后,有一些警告信息: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 | ~~~~~~~~~duplicate the standby databsae~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RMAN> CONNECT TARGET SYS/oracle@db; RMAN> CONNECT AUXILIARY SYS/oracle@stddb; connected to auxiliary database: DB (not mounted) RMAN> duplicate target database for standby; Starting Duplicate Db at 24 -DEC- 13 allocated channel: ORA_AUX_DISK_1 channel ORA_AUX_DISK_1: SID= 13 device type=DISK contents of Memory Script: { restore clone standby controlfile; } executing Memory Script Starting restore at 24 -DEC- 13 using channel ORA_AUX_DISK_1 channel ORA_AUX_DISK_1: starting datafile backup set restore channel ORA_AUX_DISK_1: restoring control file channel ORA_AUX_DISK_1: reading from backup piece /tmp/stdbyctl.bkp channel ORA_AUX_DISK_1: piece handle=/tmp/stdbyctl.bkp tag=TAG20131224T133449 channel ORA_AUX_DISK_1: restored backup piece 1 channel ORA_AUX_DISK_1: restore complete, elapsed time: 00 : 00 : 08 output file name=+RECO/stddb/controlfile/current. 259.835018533 Finished restore at 24 -DEC- 13 contents of Memory Script: { sql clone 'alter database mount standby database' ; } executing Memory Script sql statement: alter database mount standby database contents of Memory Script: { set newname for tempfile 1 to "+RECO/stddb/temp01.dbf" ; switch clone tempfile all; set newname for datafile 1 to "+RECO/stddb/system01.dbf" ; set newname for datafile 2 to "+RECO/stddb/sysaux01.dbf" ; set newname for datafile 3 to "+RECO/stddb/undotbs01.dbf" ; set newname for datafile 4 to "+RECO/stddb/users01.dbf" ; restore clone database ; } executing Memory Script executing command: SET NEWNAME renamed tempfile 1 to +RECO/stddb/temp01.dbf in control file executing command: SET NEWNAME executing command: SET NEWNAME executing command: SET NEWNAME executing command: SET NEWNAME Starting restore at 24 -DEC- 13 using channel ORA_AUX_DISK_1 channel ORA_AUX_DISK_1: starting datafile backup set restore channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set channel ORA_AUX_DISK_1: restoring datafile 00001 to +RECO/stddb/system01.dbf channel ORA_AUX_DISK_1: restoring datafile 00002 to +RECO/stddb/sysaux01.dbf channel ORA_AUX_DISK_1: restoring datafile 00003 to +RECO/stddb/undotbs01.dbf channel ORA_AUX_DISK_1: restoring datafile 00004 to +RECO/stddb/users01.dbf channel ORA_AUX_DISK_1: reading from backup piece /tmp/bk_01osanei_1_1 channel ORA_AUX_DISK_1: piece handle=/tmp/bk_01osanei_1_1 tag=TAG20131224T132953 channel ORA_AUX_DISK_1: restored backup piece 1 channel ORA_AUX_DISK_1: restore complete, elapsed time: 00 : 04 : 27 Finished restore at 24 -DEC- 13 contents of Memory Script: { switch clone datafile all; } executing Memory Script datafile 1 switched to datafile copy input datafile copy RECID= 1 STAMP= 835019033 file name=+RECO/stddb/system01.dbf datafile 2 switched to datafile copy input datafile copy RECID= 2 STAMP= 835019033 file name=+RECO/stddb/sysaux01.dbf datafile 3 switched to datafile copy input datafile copy RECID= 3 STAMP= 835019033 file name=+RECO/stddb/undotbs01.dbf datafile 4 switched to datafile copy input datafile copy RECID= 4 STAMP= 835019033 file name=+RECO/stddb/users01.dbf ORACLE error from auxiliary database: ORA- 01511 : error in renaming log/data files <<<<<<<<<<<<<<<<<<<<<<<< ORA- 01275 : Operation RENAME is not allowed if standby file management is automatic. <<<<<<<<<<<<<<<<<<<<<<<< RMAN- 05535 : WARNING: All redo log files were not defined properly. ORACLE error from auxiliary database: ORA- 01511 : error in renaming log/data files ORA- 01275 : Operation RENAME is not allowed if standby file management is automatic. RMAN- 05535 : WARNING: All redo log files were not defined properly. ORACLE error from auxiliary database: ORA- 01511 : error in renaming log/data files ORA- 01275 : Operation RENAME is not allowed if standby file management is automatic. RMAN- 05535 : WARNING: All redo log files were not defined properly. Finished Duplicate Db at 24 -DEC- 13 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
后手动修改当standby_file_management为MANUAL模式,可以duplicate并且rename成功,并rename正broken0~2.
(这个是我在pfile中有这个standby_file_management=auto) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | checkpoint is 977553 last deallocation scn is 959057 Undo Optimization current scn is 974128 Tue Dec 24 15 : 16 : 57 2013 Switch of datafile 1 complete to datafile copy checkpoint is 977553 Switch of datafile 2 complete to datafile copy checkpoint is 977553 Switch of datafile 3 complete to datafile copy checkpoint is 977553 Switch of datafile 4 complete to datafile copy checkpoint is 977553 alter database rename file '+DATA/db/redo01.log' to 'broken0' Completed: alter database rename file '+DATA/db/redo01.log' to 'broken0' <<<<<<<< alter database rename file '+DATA/db/redo02.log' to 'broken1' Completed: alter database rename file '+DATA/db/redo02.log' to 'broken1' alter database rename file '+DATA/db/redo03.log' to 'broken2' Completed: alter database rename file '+DATA/db/redo03.log' to 'broken2' RFS connections are allowed |
在执行ALTER DATABASE RECOVER...的时候,依然需要去清空redo,只不过,应经将redo rename了,所以不会影响主库的redo log了
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | ~~~~~~~~~~~~~~~~~~~~~ Tue Dec 24 15 : 45 : 37 2013 ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION Attempt to start background Managed Standby Recovery process (stddb) Tue Dec 24 15 : 45 : 37 2013 MRP0 started with pid= 21 , OS id= 21343 MRP0: Background Managed Standby Recovery process started (stddb) started logmerger process Tue Dec 24 15 : 45 : 43 2013 Managed Standby Recovery not using Real Time Apply Parallel Media Recovery started with 2 slaves Waiting for all non-current ORLs to be archived... All non-current ORLs have been archived. Errors in file /u01/app/oracle/product/ 11.2 . 0 /dbhome_1/log/diag/rdbms/stddb/stddb/ trace /stddb_mrp0_21343.trc: ORA- 00313 : open failed for members of log group 1 of thread 1 ORA- 00312 : online log 1 thread 1 : '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/broken0' <<<<<<<<<<<<<<<< ORA- 27037 : unable to obtain file status Linux-x86_64 Error: 2 : No such file or directory Additional information: 3 ...... Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION Clearing online redo logfile 1 complete Errors in file /u01/app/oracle/product/ 11.2 . 0 /dbhome_1/log/diag/rdbms/stddb/stddb/ trace /stddb_mrp0_21343.trc: ORA- 00313 : open failed for members of log group 2 of thread 1 ORA- 00312 : online log 2 thread 1 : '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/broken1' ORA- 27037 : unable to obtain file status Linux-x86_64 Error: 2 : No such file or directory Additional information: 3 ...... Tue Dec 24 15 : 45 : 48 2013 Clearing online redo logfile 2 complete Errors in file /u01/app/oracle/product/ 11.2 . 0 /dbhome_1/log/diag/rdbms/stddb/stddb/ trace /stddb_mrp0_21343.trc: ORA- 00313 : open failed for members of log group 3 of thread 1 ORA- 00312 : online log 3 thread 1 : '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/broken2' ORA- 27037 : unable to obtain file status Linux-x86_64 Error: 2 : No such file or directory Additional information: 3 ...... Clearing online redo logfile 3 complete Media Recovery Waiting for thread 1 sequence 4 |
总结:由于之前的DG搭建,都是在两个不同的主机上实施,也没有关心过这个问题,因为standby清空primary redo的动作一定是失败的。但是现在是在一个主机上,standby能发现原primary的redo,并且由于rename失败,所以就直接clear primary redo导致primary 无法启动。
而root cause就是我在编辑pfile的时候,多写了standby_file_management=auto,其实不写的话,默认值是manual的,就不会出现这个问题。