Essential New Feature for Grid Control in a Cold Failover Cluster

Ever since I inherited the Grid Control project at my current company, I've despaired of ever finding a way to monitor the multiple databases in our Cold Failover Cluster (CFC) environments. You see, as of version 10.2.0.3, the recommended method for monitoring databases in a CFC setup was to install one agent per node, and one agent per database. The node agents would be configured to only monitor the node itself, as well as any non-floating services tied to that node. The database agents would be configured to only monitor the database, and any floating services associated with that database, e.g. the listener.

So at a minimum, for a two-node CFC cluster with a single database, you would need three agents installed: two node agents plus one database agent. That doesn't seem too bad for the minimal case, but in our test systems we may have as many as thirty databases and listeners in a single CFC cluster. There is just no way I am going to install and configure thirty-two agents to monitor a single cluster. The manageability aspect of that situation alone seems far too… well… unmanageable.
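The arithmetic behind those numbers can be sketched in a few lines of Python. The function names are made up for illustration; the only inputs are the node and database counts mentioned above.

```python
def agents_one_per_node_and_database(nodes: int, databases: int) -> int:
    """Agent count when every node and every database gets its own agent."""
    return nodes + databases


def agents_one_per_node(nodes: int) -> int:
    """Agent count when only the nodes get an agent."""
    return nodes


if __name__ == "__main__":
    # Minimal case and the thirty-database test cluster described in the post.
    for databases in (1, 30):
        total = agents_one_per_node_and_database(nodes=2, databases=databases)
        print(f"2 nodes, {databases} database(s): {total} agents")
    print(f"Node agents only: {agents_one_per_node(2)} agents")
```

The count grows linearly with the number of databases in the first model, while the node-only count stays fixed at the size of the cluster.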

For now, we install just the two node agents and configure both of them to monitor all the databases, regardless of which node each database is running on at the moment. This means that we see duplicate database entries, plus listeners reported as down that we have to ignore. It's ugly at best.

However, a colleague of mine just pointed me to the documentation for the 10.2.0.5 update for Grid Control, and I was very pleased to see that this problem is being addressed:

Beginning with Oracle Enterprise Manager 10g release 10.2.0.5, a single Oracle Management Agent running on each node in the cluster can monitor targets configured for active / passive high availability. Only one Management Agent is required on each of the physical nodes of the CFC cluster because, in case of a failover to the passive node, Enterprise Manager can move the HA monitored targets from the Management Agent on the failed node to another Management Agent on the newly activated node using a series of EMCLI commands.

If your application is running in an active/passive environment, the clusterware brings up the applications on the passive node in the event that the active node fails. For Enterprise Manager to continue monitoring the targets in this type of configuration, the existing Management Agent needs additional configuration.

The following sections describe how to prepare the environment to automate and restart targets on the new active node. Failover and fallback procedures are also provided.
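The excerpt above doesn't list the EMCLI commands themselves. As a rough sketch only, assuming the relocation is done with the `relocate_targets` verb and the flags shown (written from recollection of the EMCLI reference, so check them against your release before relying on them), a failover script might build one command per floating target like this. The host names, the agent port 3872, and the target names are hypothetical placeholders, and the script only prints the commands rather than running them.

```python
from typing import List, Tuple


def build_relocate_commands(
    src_agent: str,
    dest_agent: str,
    targets: List[Tuple[str, str]],
) -> List[List[str]]:
    """Build one emcli command (as an argument list) per target to move.

    src_agent / dest_agent are "host:port" strings for the Management Agents.
    targets is a list of (target_name, target_type) pairs.
    The verb and flag names are assumptions to verify against the EMCLI docs.
    """
    commands = []
    for target_name, target_type in targets:
        commands.append([
            "emcli", "relocate_targets",
            f"-src_agent={src_agent}",
            f"-dest_agent={dest_agent}",
            f"-target_name={target_name}",
            f"-target_type={target_type}",
            "-copy_from_src",
            "-force=yes",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical cluster: one database and its listener fail over to node2.
    floating_targets = [
        ("ORCL", "oracle_database"),
        ("LISTENER_ORCL", "oracle_listener"),
    ]
    for cmd in build_relocate_commands(
        "node1.example.com:3872", "node2.example.com:3872", floating_targets
    ):
        print(" ".join(cmd))
```

Generating the commands from a list of targets keeps the thirty-database case down to a single loop, which is the whole point of dropping the per-database agents.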

I'm looking forward to implementing 10.2.0.5 very soon, and I sure hope this works as documented!