当前位置导航:炫浪网>>网络学院>>编程开发>>Oracle教程

一次standby open read only 故障的处理过程

昨天一备份数据库,在open  read only的过程中出现故障,从而导致在standby上备份失败。alert log 中信息如下:

 

Errors in file /opt/oracle/admin/crmhz/udump/crmhz_ora_13914.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-16000: database open for read-only access

Error 604 happened during db open, shutting down database

USER: terminating instance due to error 604

Instance terminated by USER, pid = 13914

ORA-1092 signalled during: alter database open read only...

Tue Aug 10 09:31:51 2004

 

再查看trace file信息

 

[root@crmsb bdump]# more?? /opt/oracle/admin/crmhz/udump/crmhz_ora_13914.trc

/opt/oracle/admin/crmhz/udump/crmhz_ora_13914.trc

Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production

With the Partitioning, OLAP and Oracle Data Mining options

JServer Release 9.2.0.1.0 - Production

ORACLE_HOME = /opt/oracle/product/9.2.0

System name:??? Linux

Node name:????? crmsb

Release:??????? 2.4.9-e.3smp

Version:??????? #1 SMP Fri May 3 16:48:54 EDT 2002

Machine:??????? i686

Instance name: crmhz

Redo thread mounted by this instance: 1

Oracle process number: 11

Unix process pid: 13914, image: oracle@crmsb (TNS V1-V3)

 

*** SESSION ID:(10.3) 2004-08-10 09:30:37.408

*** 2004-08-10 09:30:37.408

Managed Recovery: Cancel posted.

ORA-00604: error occurred at recursive SQL level 1

ORA-16000: database open for read-only access

 

都没有提供什么有价值的信息,很奇怪会出现604错误

 

于是决定trace? open过程

 

startup  nomount

 

alter database mount standby? database;

 

select  sid from v$mystat where rownum = 1;

 

select  sid,serial# from v$session where sid = ?;

 

exec  dbms_system.set_ev(sid,serial#,10046,12,'');

 

alter database  open  read  only;

 

这是数据库crash ,找到trace file,发现末尾部分为

 

select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0),2))from objauth$ where obj#=:1 group by grantee#,privilege#,nvl(col#,0) order by grantee#

END OF STMT

PARSE #13:c=0,e=485,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=0,tim=1066522815818430

BINDS #13:

?bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0

?? bfp=40647620 bln=22 avl=02 flg=05

?? value=72

EXEC #13:c=0,e=518,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=1066522815819173

FETCH #13:c=0,e=58,p=0,cr=2,cu=0,mis=0,r=0,dep=2,og=4,tim=1066522815819296

=====================

PARSING IN CURSOR #12 len=34 dep=1 uid=0 oct=7 lid=0 tim=1066522815819480 hv=3008185193 ad='56fcbaac'

delete from idl_ub1$ where obj#=:1

END OF STMT

PARSE #12:c=10000,e=14804,p=6,cr=79,cu=0,mis=1,r=0,dep=1,og=0,tim=1066522815819472

BINDS #12:

?bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0

?? bfp=40645a48 bln=22 avl=06 flg=05

?? value=4294951150

WAIT #12: nam='db file sequential read' ela= 41 p1=1 p2=22839 p3=1

WAIT #12: nam='db file sequential read' ela= 27 p1=1 p2=9 p3=1

EXEC #12:c=0,e=884,p=2,cr=4,cu=2,mis=0,r=0,dep=1,og=4,tim=1066522815820884

ERROR #12:err=16000 tim=1837753940

ORA-00604: error occurred at recursive SQL level 1

ORA-16000: database open for read-only access

EXEC #1:c=370000,e=375493,p=304,cr=5505,cu=4,mis=0,r=0,dep=0,og=4,tim=1066522815826303

ERROR #1:err=1092 tim=1837753940

 

很奇怪,正常standby数据库open read  only的时候是没有delete的,在这里居然出现dml。于是一开始我尝试去研究idl_ub1$是个什么表,结果发现是存储 pl/sql代码的,很奇怪,这个数据库为什么会出现个状况,继续在 $ORACLE_HOME/rdbms/admin 下 grep -ir idl_ub1$ * ,结果找到和一个x$ 表挂上关系做连接了,未果,又通过 delete from idl_ub1$ where obj#=:1 中绑定变量value=4294951150 去结合搜索出来的view查找,也没什么进展。实际上,根据 WAIT #12: nam='db file sequential read' ela= 27 p1=1 p2=9 p3=1 可以知道这个delete是访问了file# =1 block#=9 的数据块,很显然,这是系统回滚段的块头,根据这里我们可以知道,这是一个回滚操作,也就是说,在主数据库中由于DDL 的失败,使得系统回滚段中记录了该操作的回退信息,但是数据库又还没有回滚完成,到了standby中就出现这个状况了。

 

为了尝试解决这个问题,首先继续应用几个日志,再open read only,结果失败,也就是说该失败的ddl在主数据库一直没有回滚,而数据库中已经查找不到任何异常进程或者事务、锁,只能认为是系统出现异常。于是,今天在征得相关负责人的同意的情况下,我们决定在下班后将主数据库重新启动一次看看,这样强迫系统回滚失败的ddl。下班重新启动数据库,切换日志并归档,在standby上应用日志到重起数据库之后产生的归档日志,再open  read only 成功,问题得到解决。

相关内容
赞助商链接