Troubleshooting OEM



Troubleshooting Oracle Enterprise Manager 10g




Symptoms 
The target database instance and listener are up. The management agent for the target database is up and running. After several minutes, Grid Control Console shows the following error for the target database: 
The instance has been started in no-mount state. 
It also shows: 
Database Instance: Status Open 
Listener: Status Up 



"The database target is currently unavailable. The state of the components are listed below." 
Agent Connection to Instance: Status Unavailable Details 
I checked the status of the dbsnmp database user and it was NOT locked.




Solution 
Log in to the node that hosts the target database. Set your shell environment to the management agent home, then run "emctl clearstate agent" followed by "emctl upload agent":

[oracle@vmlinux1 ~]$ emctl clearstate agent 
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0. 
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved. 
EMD clearstate completed successfully 

[oracle@vmlinux1 ~]$ emctl upload agent 
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0. 
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved. 
--------------------------------------------------------------- 
EMD upload completed successfully 
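
The two commands above can be wrapped in a small shell function so that the result of the upload is checked automatically. This is only a sketch and not part of the original procedure: the function names upload_succeeded and reset_agent_state are made up for illustration, and it assumes that the agent's emctl is the first one on the PATH and that a successful upload prints the same "EMD upload completed successfully" line shown in the session above.

```shell
#!/bin/sh
# Return 0 if the text on stdin contains the success line that
# "emctl upload agent" printed in the session above.
upload_succeeded() {
    grep -q "EMD upload completed successfully"
}

# Clear the agent state, force an upload, and report the outcome.
# Assumption: emctl from the agent home is first on the PATH.
reset_agent_state() {
    emctl clearstate agent || return 1
    if emctl upload agent 2>&1 | upload_succeeded; then
        echo "upload OK"
    else
        echo "upload FAILED" >&2
        return 1
    fi
}
```

After sourcing the file on the target node, calling reset_agent_state runs both steps and returns a non-zero status if the success line is not seen.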



EMAgent is Thrashing. Exiting watchdog 
Symptoms 
Oracle Grid Control has been working for several months with no issue (GC 10.2.0.5). No changes and/or patches have been applied to the environment. On a recent reboot of the OEM Grid Control management server, the Oracle agent is throwing the following message when trying to start: 

[oracle@oemprod ~]$ . oraenv 
ORACLE_SID = [oms10g] ? agent10g 
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/agent10g is /u01/app/oracle 

[oracle@oemprod ~]$ emctl start agent 
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0. 
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved. 
Starting agent ................................. failed. 
EMAgent is Thrashing. Exiting watchdog 
Consult the log files in: /u01/app/oracle/product/agent10g/sysman/log 

Solution 
Reviewing the emagent.trc and emdctl.trc files did not provide any useful trace information to diagnose the problem in this case. I did find several notes on Metalink regarding this error; however, none of the solutions applied to my configuration.
I then decided to check the dbsnmp login credentials to the OEM Grid Control Repository. The database password in this case was valid; however, it was about to expire:

[oracle@oemprod log]$ sqlplus dbsnmp/<dbsnmp_password> 

SQL*Plus: Release 11.1.0.7.0 - Production on Mon May 10 11:46:58 2010 

Copyright (c) 1982, 2008, Oracle. All rights reserved. 

ERROR: 
ORA-28002: the password will expire within 7 days 


Connected to: 
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production 
With the Partitioning, Oracle Label Security, OLAP, Data Mining, 
Oracle Database Vault and Real Application Testing options 

SQL> 

The solution in this case was to change the password for the dbsnmp user account. Note that I used the same password value when changing the dbsnmp password to avoid any further complications: 

SQL> alter user dbsnmp identified by <same_dbsnmp_password>; 

User altered. 

After updating the dbsnmp password, the agent was able to start successfully: 

[oracle@oemprod ~]$ . oraenv 
ORACLE_SID = [emrep] ? agent10g 
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/agent10g is /u01/app/oracle 

[oracle@oemprod ~]$ emctl start agent 
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0. 
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved. 
Starting agent .......... started. 
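
Because the trace files gave no hint here, it can help to test the dbsnmp login routinely and flag password warnings before the agent is affected. The sketch below classifies the text printed by a SQL*Plus login attempt. The function name classify_login_output is hypothetical; ORA-28002 is the warning seen in the session above, while ORA-28001 (password has expired) and ORA-28000 (account is locked) are included on the assumption that they are the other password-related codes worth catching.

```shell
#!/bin/sh
# Read SQL*Plus login output on stdin and print a one-word verdict.
classify_login_output() {
    output=$(cat)
    case "$output" in
        *ORA-28000*) echo "locked" ;;       # account is locked
        *ORA-28001*) echo "expired" ;;      # password has expired
        *ORA-28002*) echo "expiring" ;;     # password will expire soon
        *ORA-*)      echo "other-error" ;;  # any other Oracle error
        *)           echo "ok" ;;
    esac
}
```

One possible use is to pipe an "exit" command into sqlplus -L with the dbsnmp credentials and pass the output to classify_login_output; a verdict of "expiring" matches the situation described in this section.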



EMD upload error: uploadXMLFiles skipped :: OMS version not checked yet.. 
Symptoms 
After upgrading or recovering the OMS and Agent to version 10.2.0.5, it is possible to receive the following error in <AGENT_HOME>/sysman/log/emagent.trc:

2010-05-10 12:28:56,007 Thread-149101456 ERROR pingManager: Did not receive valid 
response to ping "ERROR-Agent is blocked. Blocked reason is: Agent is out-of-sync 
with repository. This most likely means that the agent was reinstalled or recovered. 
Please contact an EM administrator to unblock the agent by performing an agent 
resync from the console. Please contact EM adminstrator to unblock the agent" 

'emctl upload agent' fails with: 

[oracle@oemprod ~]$ emctl upload agent 
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0. 
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved. 
--------------------------------------------------------------- 
EMD upload error: uploadXMLFiles skipped :: OMS version not checked yet.. 

Cause 
Agent Re-synchronization is a new feature in 10.2.0.5 Grid Control that checks whether an Agent that had been uploading to the OMS has since been re-installed or restored from a backup. If so, the OMS blocks further updates from this Agent until the information about the Agent and all its targets is synchronized between the Repository and the Agent.
Factors which can cause the Agent to go out-of-sync with the Repository: 

  • Agent is re-installed or restored from a backup, on the same port number as before.
  • The <AGENT_HOME>/sysman/emd/agntstmp.txt is manually deleted for some reason.
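
Both the symptom and the second cause listed above can be checked from the shell on the agent host. The following is a minimal sketch: the function names agent_is_blocked and stamp_file_present are invented for illustration, the search string is taken from the trace excerpt shown earlier, and the agent home directory is passed in as an argument because its location differs between installations.

```shell
#!/bin/sh
# Return 0 if the given trace file contains the "Agent is blocked" message.
# $1: path to emagent.trc
agent_is_blocked() {
    grep -q "Agent is blocked" "$1"
}

# Return 0 if agntstmp.txt exists under the given agent home.
# $1: agent home directory
stamp_file_present() {
    [ -f "$1/sysman/emd/agntstmp.txt" ]
}
```

For example, agent_is_blocked could be pointed at sysman/log/emagent.trc under the agent home, and stamp_file_present at the agent home itself; a blocked agent combined with a missing stamp file would point to the second cause.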

Solution 
To resolve the issue, perform the following steps from the OEM Grid Control console:

  1. Log in to the 10.2.0.5 Grid Control console as the sysman user.
  2. Navigate to Setup > Agents > [click the problematic Agent name].
  3. On the Agent home page, click the "Resynchronize" button at the top right of the page. Choose the 'Unblock Agent' option and click [Continue].


Ensure that the Agent is up and running when attempting the Re-synchronization. 

Once the re-synchronization has completed successfully, the Agent should be able to communicate with the OMS:

[oracle@oemprod ~]$ emctl upload agent 
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0. 
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved. 
--------------------------------------------------------------- 
EMD upload completed successfully 

Re-synchronization Failure 
It is possible for the re-synchronization operation to complete with the following error:

Agent Operation completed with errors. For those targets that could not be saved, please go to the target's monitoring configuration page to save them. All other targets have been saved successfully. Agent has not been unblocked. 
  1. Error saving target database1.domain:oracle_database - Skipping target {database1.domain, oracle_database}: Missing properties - UserName, password
  2. Error saving target database2.domain:oracle_database - Skipping target {database2.domain, oracle_database}: Missing properties - UserName, password
  3. ...
OR 
Error saving target EnterpriseManager0.omsmachine.domain_Web 
Cache:oracle_webcache - Skipping target 
{EnterpriseManager0.omsmachine.domain_Web Cache, oracle_webcache}: 
Missing properties - authpwd, authuser 

The above indicates that the Repository was unable to completely save the details of all the targets that the Agent currently has. In this case, the Agent was re-installed and has newly discovered all the Database and iAS targets, which are not yet completely configured (i.e., the monitoring password is missing).
To resolve the above: 

  1. Navigate to the Agent home page in the Grid Console (Setup > Agents). Choose one database at a time and click on the [Configure] button.
  2. In the "Monitoring Configuration" page, enter the password for the monitoring user (the dbsnmp user) and save.
  3. Perform the same for any other target that is shown in the result from the Agent re-synchronization.
  4. Once all the targets are configured in this manner, go to Setup > Agents > [choose this Agent name] and click the [Unblock] button.




If your Enterprise Manager grid environment is making use of firewalls, ensure you specify the appropriate ports. 
The following figure provides a topology of an Enterprise Manager grid environment that is using a firewall, and also illustrates the appropriate ports that you must specify: 



The conventions used in the preceding illustration are as follows: 
Conventions
|| Convention || Description ||
||  || Is the entity that is making the call. ||
||  || Enterprise Manager will default to the first available port within an Enterprise Manager set range. ||
||  || Enterprise Manager will default to the first available port. ||
||  || Are the Database listener ports. ||


Note: 

  • The direction of each arrow indicates the direction of communication on that port.
  • The notation "1159, 4898-4989" means that 1159 is the default port. If this port is not available, the Management Service will search the specified range.
  • To clone between two target hosts separated by a firewall, the agents will need to communicate to each other on the agent ports. The initiating agent will make the call.

Port Description 
These are the ports that need to be open for bi-directional data communication between the OEM Grid Control management server and the target agent:

OEM Management Server / Agent
|| Port(s) || Required || Description ||
|| 1521 || Yes || TNS Listener ||
|| 3872 || Yes || OMS / Agent Data Communications ||
|| 1830-1849 ||  ||  ||
|| 1159 || Yes || Enterprise Manager Central Console Secure Port ||
|| 4898-4989 ||  || Console and OMS / Agent Data Communications ||
|| 4889 || Yes || OMS / Agent Data Communications ||
|| 4890-4897 ||  ||  ||
|| 7200-7210 ||  || Oracle HTTP Server Diagnostic port ||
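
Once the firewall rules are in place, the ports listed above can be probed from the shell. The sketch below is an illustration only and makes several assumptions: the function names expand_ports and probe_port are hypothetical, the probe relies on bash's /dev/tcp pseudo-device and the coreutils timeout command being available, and a port is reported as open only if something is actually listening on it at the far end.

```shell
#!/bin/sh
# Expand a port spec such as "3872" or "1830-1849" into one port per line.
expand_ports() {
    spec=$1
    case "$spec" in
        *-*)
            first=${spec%-*}   # text before the dash
            last=${spec#*-}    # text after the dash
            ;;
        *)
            first=$spec
            last=$spec
            ;;
    esac
    port=$first
    while [ "$port" -le "$last" ]; do
        echo "$port"
        port=$((port + 1))
    done
}

# Try to open a TCP connection to host:port, giving up after 3 seconds.
# Assumption: bash and the coreutils "timeout" command are installed.
probe_port() {
    host=$1
    port=$2
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "$host:$port open"
    else
        echo "$host:$port closed or filtered"
    fi
}
```

A loop such as "for p in $(expand_ports 4890-4897); do probe_port oemprod $p; done" would then test a whole range against the management server (the host name here is the one used in the sessions above). A "closed or filtered" result does not by itself prove that the firewall is at fault, since a port with no listener gives the same answer.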



The ports that should be open on a VPN tunnel to the OEM management console: 
OEM Management Server / Agent
|| Port(s) || Required || Description ||
|| 4889 || Yes || Enterprise Manager Central Console Port ||
|| 4444 ||  || Oracle HTTP Server Listen (SSL) port ||
|| 1156-1158 ||  || Application Server Control port, Oracle Management Agent Port, Enterprise Manager Central Console Secure Port ||
|| 1159 || Yes ||  ||
|| 7200 ||  || Oracle HTTP Server Diagnostic port ||
|| 7777 ||  || Oracle HTTP Server port ||
|| 7778 ||  || Oracle HTTP Server Listen port ||
|| 8250 ||  || Web Cache HTTP Listen (SSL) port ||
|| 9400-9410 ||  || Oracle Web Cache Administration, Console, and Diagnostics ||



If you're like me, you may opt to keep the CRS cluster name set to its default value of "crs". Even when configuring a second or third Oracle RAC cluster within an organization, we often simply leave the cluster name of the new cluster set to its default value of crs. Although I believe it is bad practice to configure multiple Oracle RAC clusters with the same CRS cluster name, it doesn't necessarily cause any conflicts, given that these clusters don't interact with each other. The clustered databases work independently of each other without incident. That is, until you register multiple clustered databases with the same CRS cluster name in Oracle Enterprise Manager Grid Control!
An issue can arise when multiple Oracle RAC clusters with the same CRS cluster name are registered as targets in EM Grid Control. While this does not cause a problem with the clusters themselves, it does cause EM Grid Control to treat both clusters as the same cluster.
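
Before registering clusters as targets, it may be worth checking for name clashes. The sketch below assumes you have already collected one line per cluster in the form "label cluster_name", for example by running the Clusterware utility cemutlo -n on one node of each cluster (assumed here to print the CRS cluster name). The function name duplicate_cluster_names and the labels in the example are made up for illustration.

```shell
#!/bin/sh
# Read "label cluster_name" lines on stdin (one line per cluster) and
# print every cluster name that appears more than once.
duplicate_cluster_names() {
    awk '{ count[$2]++ }
         END { for (name in count) if (count[name] > 1) print name }' | sort
}
```

Fed with the lines "racA crs", "racB crs" and "racC prodcrs", the function prints crs, which identifies the clusters that EM Grid Control would confuse with each other.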
