Sunday, July 3, 2011

Oracle 10gR2 CRS diagwait to avoid OPROCD node evictions

If your CRS is 10gR2 (and up), it is a must to apply the following on each node of your cluster.

  1. Execute as root
    #crsctl stop crs
    #<CRS_HOME>/bin/oprocd stop
  2. Ensure that the Clusterware stack is down on all nodes by executing:
    #ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
    This should return no processes. If there are clusterware processes running and you proceed to the next step, you will corrupt your OCR. Do not continue until the clusterware processes are down on all the nodes of the cluster.
  3. From one node of the cluster, change the value of the "diagwait" parameter to 13 seconds by issuing the command as root:
    #crsctl set css diagwait 13 -force
  4. Check that diagwait was set successfully by executing the following command. The command should return 13. If diagwait is not set, the following message will be returned: "Configuration parameter diagwait is not defined"
    #crsctl get css diagwait
  5. Restart the Oracle Clusterware on all the nodes by executing:
    #crsctl start crs
  6. Validate that the node is running by executing:
    #crsctl check crs
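
Step 2 is the one that protects your OCR, so it can be worth scripting. Below is a minimal sketch of such a check, not part of the Oracle note: the function names `clusterware_procs` and `stack_is_down` are made up for illustration. It filters a `ps -ef` listing read from stdin, and writes the last pattern as `[o]procd` so that the filter's own command line is not counted as a match (with the plain egrep from step 2, the egrep process itself may show up in the output).

```shell
#!/bin/sh
# Hypothetical helper: reads a `ps -ef` listing on stdin and keeps only
# the lines that look like Clusterware daemons.
# `[o]procd` still matches "oprocd", but the pattern text itself does not
# contain "oprocd", so this grep never matches its own ps entry.
clusterware_procs() {
    grep -E 'crsd\.bin|ocssd\.bin|evmd\.bin|[o]procd'
}

# Hypothetical helper: returns 0 when the listing on stdin contains no
# Clusterware daemons (safe to continue), 1 otherwise (do NOT continue).
stack_is_down() {
    if clusterware_procs >/dev/null; then
        return 1
    fi
    return 0
}

# Usage on a live node, as root:
#   if ps -ef | stack_is_down; then
#       echo "clusterware stack is down on $(hostname)"
#   else
#       echo "clusterware processes still running, stop here" >&2
#   fi
```

Run it on every node of the cluster before step 3; a single node with a running daemon is enough reason to stop.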

The above procedure sets the diagwait parameter to 13. According to Oracle, this parameter gives an additional 10 seconds to flush the necessary CRS diagnostics to disk during a node eviction. In my experience it also resolves node evictions under heavy load (especially on AIX), because it changes the OPROCD default values that make it sensitive to scheduling latencies. In most circumstances the OPROCD default values can be overly sensitive, especially on a heavily loaded cluster, and Oracle states in note 567730.1: "To overcome these scheduling latencies, Oracle recommends that you set the Oracle Clusterware parameter DIAGWAIT to the value 13."
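
Since everything hinges on the value really being 13, the check from step 4 can also be wrapped in a small function. This is only a sketch under the assumptions stated in the steps above: that `crsctl get css diagwait` prints the bare value when it is set, and the "not defined" message otherwise. The name `diagwait_is_set` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: takes the text printed by `crsctl get css diagwait`
# as $1 and an optional expected value as $2 (defaults to 13).
# Returns 0 only if the text, with whitespace stripped, equals the expected value.
diagwait_is_set() {
    output=$1
    expected=${2:-13}
    # Remove spaces and newlines so " 13\n" compares equal to "13".
    value=$(printf '%s' "$output" | tr -d '[:space:]')
    [ "$value" = "$expected" ]
}

# Usage on a live node, as root:
#   if diagwait_is_set "$(crsctl get css diagwait)"; then
#       echo "diagwait is 13"
#   else
#       echo "diagwait is not set to 13" >&2
#   fi
```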

Personally, I set this parameter in every Oracle Clusterware 10gR2 (and up) installation.
It does not apply to 11g, since the logic has changed.


Oracle Metalink Notes: 559365.1, 567730.1
