RAC Instance getting stuck at LCK0 [message #166626] Fri, 07 April 2006 05:17
asangapradeep
Messages: 128
Registered: October 2005
Location: UK
Senior Member
Hi,
I have a two-node RAC. Everything installed and worked fine until recently, when I noticed that one node does not respond to srvctl commands and its status is not reflected on the other node either. For example, even when I shut down the instance using sqlplus, srvctl status run from the other node still reports both instances as running.
srvctl commands also hang when applied to the first node.
I couldn’t get an ocrdump from this node either, but the second node responds without any problem. I suspect there is some trouble reading the OCR.
Below is the alert log output from when I tried to start this instance manually.

Quote:


Starting up ORACLE RDBMS Version: 10.2.0.1.0.
System parameters with non-default values:
processes = 500
sessions = 555
__shared_pool_size = 369098752
__large_pool_size = 16777216
__java_pool_size = 16777216
__streams_pool_size = 0
spfile = +DATA/livedb/spfilelivedb.ora
sga_target = 1224736768
control_files = +DATA/livedb/controlfile/current.260.584103303,
+FLASH_RECOVERY/livedb/controlfile/current.256.584103303
db_block_size = 8192
__db_cache_size = 805306368
compatible = 10.2.0.1.0
log_archive_format = %t_%s_%r.dbf
db_file_multiblock_read_count= 8
cluster_database = TRUE
cluster_database_instances= 2
db_create_file_dest = +DATA
db_recovery_file_dest = +FLASH_RECOVERY
db_recovery_file_dest_size= 128765132800
thread = 2
instance_number = 2
undo_management = AUTO
undo_tablespace = UNDOTBS1
db_block_checking = TRUE
remote_login_passwordfile= EXCLUSIVE
db_domain = goway.com
dispatchers = (PROTOCOL=TCP) (SERVICE=livedbXDB)
remote_listener = LISTENERS_LIVEDB
job_queue_processes = 10
background_dump_dest = /opt/oracle/admin/livedb/bdump
user_dump_dest = /opt/oracle/admin/livedb/udump
core_dump_dest = /opt/oracle/admin/livedb/cdump
audit_file_dest = /opt/oracle/admin/livedb/adump
db_name = livedb
open_cursors = 300
pga_aggregate_target = 402653184
Cluster communication is configured to use the following interface(s) for this instance
167.125.130.52
Fri Apr 7 05:51:41 2006
cluster interconnect IPC version:Oracle UDP/IP
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=17706
DIAG started with pid=3, OS id=17708
PSP0 started with pid=4, OS id=17710
LMON started with pid=5, OS id=17712
LMD0 started with pid=6, OS id=17714
LMS0 started with pid=7, OS id=17716
LMS1 started with pid=8, OS id=17720
MMAN started with pid=9, OS id=17724
DBW0 started with pid=10, OS id=17746
LGWR started with pid=11, OS id=17753
CKPT started with pid=12, OS id=17755
SMON started with pid=13, OS id=17757
RECO started with pid=14, OS id=17759
CJQ0 started with pid=15, OS id=17761
MMON started with pid=16, OS id=17763
Fri Apr 7 05:51:42 2006
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=17, OS id=17765
Fri Apr 7 05:51:42 2006
starting up 1 shared server(s) ...
Fri Apr 7 05:51:42 2006
lmon registered with NM - instance id 2 (internal mem no 1)
Fri Apr 7 05:51:42 2006
Reconfiguration started (old inc 0, new inc 60)
pseudo shared rm latch used
List of nodes:
0 1
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid = 1 according to instance 0
Fri Apr 7 05:51:43 2006
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Fri Apr 7 05:51:43 2006
LMS 0: 0 GCS shadows cancelled, 0 closed
Fri Apr 7 05:51:43 2006
LMS 1: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Fri Apr 7 05:51:44 2006
LMS 0: 0 GCS shadows traversed, 0 replayed
Fri Apr 7 05:51:44 2006
LMS 1: 0 GCS shadows traversed, 0 replayed
Fri Apr 7 05:51:44 2006
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=20, OS id=17822



It seems to get stuck at “LCK0 started with pid=20, OS id=17822”.
I don’t know how to interpret this. I assume LCK0 is some kind of lock process. Earlier startups in the alert log show that after this point the disk groups are mounted (the data files are on ASM) and the database is opened. How do I resolve this? Any help with this problem is most welcome.
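
For anyone comparing a good startup against a stuck one, here is a rough Python sketch that reports the last startup milestone found in an alert log after the most recent startup banner. The milestone strings are copied from the alert log excerpts quoted in this thread; their ordering is only what those excerpts show, not an official Oracle startup sequence, and `last_milestone` is a made-up helper name.

```python
# Milestone strings taken from the alert log excerpts in this thread (assumption:
# other versions or configurations may word these lines differently).
MILESTONES = [
    "Starting up ORACLE RDBMS",
    "Reconfiguration complete",
    "LCK0 started",
    "alter database mount",
    "Completed: alter database mount",
    "alter database open",
    "Completed: alter database open",
]


def last_milestone(alert_log_text):
    """Return the last milestone seen after the most recent startup banner, or None."""
    lines = alert_log_text.splitlines()

    # Locate the most recent startup banner; fall back to the top of the text.
    start = 0
    for i, line in enumerate(lines):
        if line.strip().startswith(MILESTONES[0]):
            start = i

    reached = None
    for line in lines[start:]:
        text = line.strip().lower()
        for milestone in MILESTONES:
            if text.startswith(milestone.lower()):
                reached = milestone
    return reached


if __name__ == "__main__":
    stuck = (
        "Starting up ORACLE RDBMS Version: 10.2.0.1.0.\n"
        "Reconfiguration complete\n"
        "LCK0 started with pid=20, OS id=17822\n"
    )
    print(last_milestone(stuck))
```

Run against the excerpt above, it would stop at the LCK0 line, which matches what I see by eye: nothing about mounting or opening follows.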
Re: RAC Instance getting stuck at LCK0 [message #166628 is a reply to message #166626] Fri, 07 April 2006 05:37
asangapradeep
To add to the above:

While the instance was stuck at this stage, I checked its status and it showed “STARTED”.
I then ran alter database mount and alter database open, which mounted and opened the database successfully. The alert log output is given below. If I can do this manually, why does it not happen automatically through the RAC processes?

Quote:



LCK0 started with pid=20, OS id=17822
Fri Apr 7 06:16:10 2006
alter database mount
Fri Apr 7 06:16:10 2006
Starting background process ASMB
ASMB started with pid=23, OS id=22407
Starting background process RBAL
RBAL started with pid=24, OS id=22411
Fri Apr 7 06:16:18 2006
SUCCESS: diskgroup DATA was mounted
SUCCESS: diskgroup FLASH_RECOVERY was mounted
Fri Apr 7 06:16:22 2006
Setting recovery target incarnation to 2
Fri Apr 7 06:16:22 2006
Successful mount of redo thread 2, with mount id 2541666801
Fri Apr 7 06:16:23 2006
Allocated 15937344 bytes in shared pool for flashback generation buffer
Starting background process RVWR
RVWR started with pid=26, OS id=22720
Fri Apr 7 06:16:23 2006
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Completed: alter database mount
Fri Apr 7 06:16:33 2006
alter database open
Picked broadcast on commit scheme to generate SCNs
Fri Apr 7 06:16:33 2006
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=27, OS id=22977
Fri Apr 7 06:16:33 2006
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=28, OS id=22979
Fri Apr 7 06:16:33 2006
Thread 2 opened at log sequence 165
Current log# 3 seq# 165 mem# 0: +DATA/livedb/onlinelog/group_3.265.584103433
Current log# 3 seq# 165 mem# 1: +FLASH_RECOVERY/livedb/onlinelog/group_3.259.584103433
Successful open of redo thread 2
Fri Apr 7 06:16:33 2006
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Apr 7 06:16:33 2006
ARC1: STARTING ARCH PROCESSES
Fri Apr 7 06:16:33 2006
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri Apr 7 06:16:33 2006
SMON: enabling cache recovery
Fri Apr 7 06:16:33 2006
ARC2: Archival started
ARC1: STARTING ARCH PROCESSES COMPLETE
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=29, OS id=22981
Fri Apr 7 06:16:33 2006
db_recovery_file_dest_size of 122800 MB is 1.87% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Fri Apr 7 06:16:34 2006
Successfully onlined Undo Tablespace 1.
Fri Apr 7 06:16:34 2006
SMON: enabling tx recovery
Fri Apr 7 06:16:34 2006
Database Characterset is AL32UTF8
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=30, OS id=23060
Fri Apr 7 06:16:39 2006
Completed: alter database open
Fri Apr 7 06:22:33 2006
Shutting down archive processes
Fri Apr 7 06:22:38 2006
ARCH shutting down
ARC2: Archival stopped
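
To put a number on how long the instance sat idle after LCK0 started, a small Python sketch like the one below can find the largest jump between consecutive timestamp lines in an alert log. It assumes timestamps look like the ones quoted in this thread (for example "Fri Apr 7 06:16:10 2006") and that the day and month names are in English; `parse_stamp` and `largest_gap` are hypothetical helper names. Note that the gap also includes however long I waited before typing the manual commands, so it is an upper bound rather than a measured hang time.

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted alert log, e.g. "Fri Apr 7 06:16:10 2006".
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_stamp(line):
    """Return a datetime if the whole line is an alert log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), STAMP_FORMAT)
    except ValueError:
        return None


def largest_gap(alert_log_text):
    """Return (seconds, earlier, later) for the biggest jump between consecutive stamps."""
    stamps = [s for s in map(parse_stamp, alert_log_text.splitlines()) if s is not None]
    best = None
    for earlier, later in zip(stamps, stamps[1:]):
        gap = (later - earlier).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, earlier, later)
    return best  # None when fewer than two timestamps are present


if __name__ == "__main__":
    excerpt = (
        "Fri Apr 7 05:51:44 2006\n"
        "Reconfiguration complete\n"
        "LCK0 started with pid=20, OS id=17822\n"
        "Fri Apr 7 06:16:10 2006\n"
        "alter database mount\n"
    )
    print(largest_gap(excerpt))
```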


Re: RAC Instance getting stuck at LCK0 [message #166643 is a reply to message #166628] Fri, 07 April 2006 06:26
Mahesh Rajendran
Messages: 10707
Registered: March 2002
Location: oracleDocoVille
Senior Member
Account Moderator
Is GSD started on both nodes?
Re: RAC Instance getting stuck at LCK0 [message #166647 is a reply to message #166643] Fri, 07 April 2006 06:36
asangapradeep
Hi Mahesh,

Yes, GSD is up on both nodes.

Quote:


[oracle@tbxdb1 crsd]$ srvctl status nodeapps -n tbxdb1
VIP is running on node: tbxdb1
GSD is running on node: tbxdb1
Listener is running on node: tbxdb1
ONS daemon is running on node: tbxdb1
[oracle@tbxdb1 crsd]$ srvctl status nodeapps -n tbxdb2
VIP is running on node: tbxdb2
GSD is running on node: tbxdb2
Listener is running on node: tbxdb2
ONS daemon is running on node: tbxdb2
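
If it helps to check this across nodes in one go, the output above can be parsed with a short Python sketch. The "is running on node:" wording is taken from the output quoted above; the "is not running" wording is my assumption about what a stopped application would look like, and `nodeapps_status` is a made-up helper name, not part of srvctl.

```python
import re

# Matches lines such as "GSD is running on node: tbxdb1".
# The "not running" alternative is an assumption, not something shown in this thread.
STATUS_LINE = re.compile(
    r"^(?P<app>.+?) is (?P<state>running|not running) on node: (?P<node>\S+)$"
)


def nodeapps_status(output):
    """Map node name -> {application name: True if reported as running}."""
    status = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            apps = status.setdefault(match["node"], {})
            apps[match["app"]] = match["state"] == "running"
    return status


if __name__ == "__main__":
    sample = (
        "VIP is running on node: tbxdb1\n"
        "GSD is running on node: tbxdb1\n"
        "Listener is running on node: tbxdb1\n"
        "ONS daemon is running on node: tbxdb1\n"
    )
    for node, apps in nodeapps_status(sample).items():
        print(node, "all up" if all(apps.values()) else "something is down")
```

Prompt lines and anything else that does not match the pattern are simply skipped.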



Re: RAC Instance getting stuck at LCK0 [message #166659 is a reply to message #166647] Fri, 07 April 2006 07:25
Mahesh Rajendran
I have no idea. All ***seems*** to be good. No errors are reported.
If this is reproducible, I would open a TAR with Oracle Support.
Regards.