Troubleshooting OEM
Troubleshooting Oracle Enterprise Manager 10g
Symptoms
The target database instance and listener are up. The management agent for the target database is up and running. After several minutes, Grid Control Console shows the following error for the target database:
The instance has been started in no-mount state.
It also shows:
Database Instance: Status Open
Listener: Status Up
"The database target is currently unavailable. The state of the components are listed below."
Agent Connection to Instance: Status Unavailable Details
I checked the status of the dbsnmp database user, and the account was NOT locked.
Solution
Log in to the node that hosts the target database. Switch your SHELL environment to the management agent and run "emctl clearstate agent":
[oracle@vmlinux1 ~]$ emctl clearstate agent
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
EMD clearstate completed successfully
[oracle@vmlinux1 ~]$ emctl upload agent
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload completed successfully
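The two commands above can be wrapped in a small helper so that the upload is only attempted when the clearstate succeeds. This is a minimal sketch, not part of the original fix: the function name and the EMCTL override variable are my own, and it assumes emctl reports failure through a non-zero exit status.

```shell
#!/bin/sh
# EMCTL is a hypothetical override hook; by default the emctl on PATH is used
# (the shell environment must already point at the management agent home).
EMCTL="${EMCTL:-emctl}"

# Clear the agent state, then force an upload, stopping at the first failure.
# Assumes emctl returns a non-zero exit status when a step fails.
reset_agent_state() {
    "$EMCTL" clearstate agent || { echo "clearstate failed" >&2; return 1; }
    "$EMCTL" upload agent || { echo "upload failed" >&2; return 1; }
    echo "agent state cleared and upload requested"
}
```

Call reset_agent_state on the node that hosts the target database, as the same OS user that owns the agent.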
EMAgent is Thrashing. Exiting watchdog
Symptoms
Oracle Grid Control (GC 10.2.0.5) had been working for several months with no issues, and no changes or patches had been applied to the environment. After a recent reboot of the OEM Grid Control management server, the Oracle agent failed to start with the following message:
[oracle@oemprod ~]$ . oraenv
ORACLE_SID = [oms10g] ? agent10g
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/agent10g is /u01/app/oracle
[oracle@oemprod ~]$ emctl start agent
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
Starting agent ................................. failed.
EMAgent is Thrashing. Exiting watchdog
Consult the log files in: /u01/app/oracle/product/agent10g/sysman/log
Solution
Reviewing the emagent.trc and emdctl.trc files did not provide any useful trace information to diagnose the problem in this case. I did find several notes on Metalink regarding this error, but none of the solutions applied to my configuration.
I then decided to check the dbsnmp login credentials for the OEM Grid Control Repository. The database password in this case was valid, but it was about to expire:
[oracle@oemprod log]$ sqlplus dbsnmp/<dbsnmp_password>
SQL*Plus: Release 11.1.0.7.0 - Production on Mon May 10 11:46:58 2010
Copyright (c) 1982, 2008, Oracle. All rights reserved.
ERROR:
ORA-28002: the password will expire within 7 days
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
SQL>
The solution in this case was to change the password for the dbsnmp user account. Note that I used the same password value when changing the dbsnmp password to avoid any further complications:
SQL> alter user dbsnmp identified by <same_dbsnmp_password>;
User altered.
After updating the dbsnmp password, the agent was able to start successfully:
[oracle@oemprod ~]$ . oraenv
ORACLE_SID = [emrep] ? agent10g
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/agent10g is /u01/app/oracle
[oracle@oemprod ~]$ emctl start agent
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
Starting agent .......... started.
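To spot an approaching expiry before the agent starts failing, the account status of dbsnmp can be queried from the data dictionary. The sketch below only prints the query; the function name is my own, and the "/ as sysdba" connection shown in the comment is an assumption about how you connect to the repository database.

```shell
#!/bin/sh
# Prints a query that reports the DBSNMP account status, expiry date and profile.
# DBA_USERS is a standard dictionary view; reading it requires DBA privileges.
dbsnmp_expiry_sql() {
    cat <<'EOF'
SELECT username, account_status, expiry_date, profile
  FROM dba_users
 WHERE username = 'DBSNMP';
EOF
}

# Example (the connect string is an assumption, adjust it for your site):
#   dbsnmp_expiry_sql | sqlplus -s "/ as sysdba"
```

An ACCOUNT_STATUS other than OPEN, or an EXPIRY_DATE only a few days away, suggests the password should be reset before the next agent restart.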
EMD upload error: uploadXMLFiles skipped :: OMS version not checked yet..
Symptoms
After upgrading or recovering the OMS and Agent to version 10.2.0.5, it is possible to receive the following error in <AGENT_HOME>/sysman/log/emagent.trc:
2010-05-10 12:28:56,007 Thread-149101456 ERROR pingManager: Did not receive valid
response to ping "ERROR-Agent is blocked. Blocked reason is: Agent is out-of-sync
with repository. This most likely means that the agent was reinstalled or recovered.
Please contact an EM administrator to unblock the agent by performing an agent
resync from the console. Please contact EM adminstrator to unblock the agent"
'emctl upload agent' fails with:
[oracle@oemprod ~]$ emctl upload agent
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload error: uploadXMLFiles skipped :: OMS version not checked yet..
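A quick way to confirm that the upload error is caused by a blocked agent is to search the agent trace file for the ping error shown above. This is a minimal sketch; the function name is my own, and AGENT_HOME as an environment variable is an assumption standing in for the <AGENT_HOME> placeholder.

```shell
#!/bin/sh
# Returns 0 if the given agent trace file contains the "Agent is blocked" ping error.
agent_is_blocked() {
    trace_file="$1"
    grep -q "Agent is blocked" "$trace_file"
}

# Example (AGENT_HOME is assumed to point at the agent installation):
#   agent_is_blocked "$AGENT_HOME/sysman/log/emagent.trc" && echo "agent resync needed"
```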
Cause
Agent Re-synchronization is a new feature in 10.2.0.5 Grid Control that verifies whether an Agent that has previously been uploading to the OMS has been re-installed or restored from a backup. If so, the OMS blocks further uploads from this Agent until the information about the Agent and all its targets is synchronized between the Repository and the Agent.
Factors which can cause the Agent to go out-of-sync with the Repository:
- Agent is re-installed or restored from a backup, on the same port number as before.
- The <AGENT_HOME>/sysman/emd/agntstmp.txt is manually deleted for some reason.
Solution
To resolve the issue, perform the following steps from the OEM Grid Control console:
- Log in to the 10.2.0.5 Grid Console as the sysman user.
- Navigate to Setup > Agents > [Click on problematic Agent name].
- In the Agent home page, click the "Resynchronize" button at the top right of the page. Choose the 'Unblock Agent' option and click [Continue].
Ensure that the Agent is up and running when attempting the Re-synchronization.
Once the re-synchronization has completed successfully, the Agent should be able to communicate with the OMS:
[oracle@oemprod ~]$ emctl upload agent
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload completed successfully
Re-synchronization Failure
It is possible for the re-synchronization operation to complete with the following error:
Agent Operation completed with errors. For those targets that could not be saved, please go to the target's monitoring configuration page to save them. All other targets have been saved successfully. Agent has not been unblocked.
- Error saving target database1.domain:oracle_database - Skipping target {database1.domain, oracle_database}: Missing properties - UserName, password
- Error saving target database2.domain:oracle_database - Skipping target {database2.domain, oracle_database}: Missing properties - UserName, password
- ...
OR
Error saving target EnterpriseManager0.omsmachine.domain_Web Cache:oracle_webcache - Skipping target {EnterpriseManager0.omsmachine.domain_Web Cache, oracle_webcache}: Missing properties - authpwd, authuser
The above indicates that the Repository has been unable to completely save the details of all the targets that the Agent currently has. In this case, the Agent was re-installed and has newly discovered all the Database targets and iAS targets, which are not yet completely configured (i.e., the monitoring password is missing).
To resolve the above:
- Navigate to the Agent home page in the Grid Console (Setup > Agents). Choose one database at a time and click on the [Configure] button.
- In the "Monitoring Configuration" page, enter the password for the monitoring user (the dbsnmp user) and save.
- Perform the same for any other target that is shown in the result from the Agent re-synchronization.
- Once all the targets are configured in this manner, go to Setup > Agents > [choose this Agent name] and click on the [Unblock] button.
If your Enterprise Manager grid environment is making use of firewalls, ensure you specify the appropriate ports.
The following figure provides a topology of an Enterprise Manager grid environment that is using a firewall, and also illustrates the appropriate ports that you must specify:
The conventions used in the preceding illustration are as follows:
Conventions
Convention | Description
C | Is the entity that is making the call.
* | Enterprise Manager will default to the first available port within an Enterprise Manager set range.
 | Enterprise Manager will default to the first available port.
* | Are the Database listener ports.
Note:
- The direction of the arrows specify the direction of ports.
- Port 1159, 4898-4989 specify that 1159 is the default. If this port is not available, the management Service will search in the range that is specified.
- To clone between two target hosts separated by a firewall, the agents will need to communicate to each other on the agent ports. The initiating agent will make the call.
Port Description
OEM Management Server / Agent
Port(s) Required | Description
1521 | TNS Listener
3872 | OMS / Agent Data Communications
1830-1849 |
1159 | Enterprise Manager Central Console Secure Port
4898-4989 | Console and OMS / Agent Data Communications
4889 | OMS / Agent Data Communications
4890-4897 |
7200-7210 | Oracle HTTP Server Diagnostic port
The ports that should be open on a VPN tunnel to the OEM management console:
OEM Management Server / Agent
Port(s) Required | Description
4889 | Enterprise Manager Central Console Port
4444 | Oracle HTTP Server Listen (SSL) port
1156-1158 | Application Server Control port, Oracle Management Agent Port, Enterprise Manager Central Console Secure Port
1159 |
7200 | Oracle HTTP Server Diagnostic port
7777 | Oracle HTTP Server port
7778 | Oracle HTTP Server Listen port
8250 | Web Cache HTTP Listen (SSL) port
9400-9410 | Oracle Web Cache Administration, Console, and Diagnostics
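To verify from one side of the firewall that the listed ports are actually reachable, the port lists and ranges in the tables can be expanded and probed one by one. This is a rough sketch rather than an Oracle-supplied tool: the function names are my own, and probe_ports assumes an nc (netcat) build that supports the -z and -w options.

```shell
#!/bin/sh
# Expands arguments such as "1159 4898-4989" into one port number per line.
expand_ports() {
    for spec in "$@"; do
        case "$spec" in
            *-*) seq "${spec%-*}" "${spec#*-}" ;;
            *)   echo "$spec" ;;
        esac
    done
}

# Tries a TCP connection to each port on the given host.
# Assumption: nc is installed and accepts -z (scan only) and -w (timeout).
probe_ports() {
    host="$1"; shift
    expand_ports "$@" | while read -r port; do
        if nc -z -w 3 "$host" "$port" 2>/dev/null; then
            echo "$port open"
        else
            echo "$port unreachable"
        fi
    done
}

# Example (oemprod is the management server host used earlier in this article):
#   probe_ports oemprod 1159 4889 3872
```

Keep in mind that a port only shows as open when a process is listening on it, so most ports inside a fallback range will report as unreachable even when the firewall allows them.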
If you're like me, you may opt to keep the CRS cluster name set to its default value of "crs". Even when configuring a second or third Oracle RAC cluster within an organization, it is common to simply leave the cluster name for the new clusters set to the default value of crs. Although I believe it is bad practice to configure multiple Oracle RAC clusters with the same CRS cluster name, doing so doesn't necessarily cause any conflicts, given that these clusters don't interact with each other. The clustered databases work independently of each other without incident. That is, until you register multiple clustered databases with the same CRS cluster name in Oracle Enterprise Manager Grid Control!
An issue can arise when multiple Oracle RAC clusters with the same CRS cluster name are registered as targets in EM Grid Control. While this does not cause a problem with the clusters themselves, it does cause EM Grid Control to treat both clusters as the same cluster.
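Before registering a new cluster in Grid Control, it can help to check whether its CRS cluster name is already in use. The sketch below flags names that appear more than once in a list of "host cluster_name" pairs, one line per cluster. The function name and the host names in the example are hypothetical, and collecting the names with the cemutlo utility from the CRS home is an assumption about your installation.

```shell
#!/bin/sh
# Reads "host cluster_name" pairs on stdin (one node per cluster) and prints
# every cluster name that is used by more than one cluster.
duplicate_cluster_names() {
    awk '{ print $2 }' | sort | uniq -d
}

# Example input, gathered by hand or with something like
#   ssh racnode1 '$ORA_CRS_HOME/bin/cemutlo -n'    (assumed utility location)
# on one node of each cluster:
#   printf 'racnode1 crs\nracnode3 crs\n' | duplicate_cluster_names
```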