8.11 Setting Up Auto-Failover

Auto-failover is available for Business Continuity Clustering. To set up the auto-failover feature, you must enable it, and then configure the auto-failover settings.

WARNING: Auto-failover is disabled by default and is not recommended. It should only be enabled after a thorough examination and review of your network and geographic site infrastructure. You should seriously consider the adverse conditions that might occur as a result of enabling this feature.

These conditions might include but are not limited to the following:

Data loss at one or more geographic sites
Data corruption at one or more geographic sites
Data divergence at one or more geographic sites

For example, if there is a loss of communication between two clusters and auto-failover has been enabled and configured, each cluster will assert ownership of BCC-enabled cluster resources. These resources then automatically load on both clusters.

When communication between the clusters is restored, each cluster holds changes that the other does not have. This is called data divergence. In addition, the mirroring or synchronization process either fails or attempts to overwrite the changed data on one cluster, which causes either data loss or data corruption.
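The divergence scenario can be illustrated with a minimal sketch. It is a toy model only: the dictionaries stand in for the mirrored data at two sites, and the names site_a, site_b, diverged_keys, and resync are hypothetical and are not part of BCC.

```python
def diverged_keys(copy_a: dict, copy_b: dict) -> set:
    """Return the keys whose values differ between the two copies."""
    return {k for k in copy_a.keys() | copy_b.keys() if copy_a.get(k) != copy_b.get(k)}


def resync(source: dict, target: dict) -> dict:
    """Naive one-way mirror. Returns the target values that were overwritten."""
    overwritten = {k: v for k, v in target.items() if source.get(k) != v}
    target.clear()
    target.update(source)
    return overwritten


# Both sites start from the same mirrored state.
site_a = {"orders.db": "v1", "users.db": "v1"}
site_b = dict(site_a)

# The link goes down, both clusters load the resource, and both accept writes.
site_a["orders.db"] = "written-at-A"
site_b["users.db"] = "written-at-B"

divergent = diverged_keys(site_a, site_b)   # both files now differ
overwritten_at_b = resync(site_a, site_b)   # the write made at site B is discarded
```

In this sketch, resynchronizing from site A silently discards the write made at site B, which corresponds to the data loss case described above.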

8.11.1 Enabling Auto-Failover

To enable auto-failover for all Business Continuity Cluster resources in a cluster:

Log in to iManager as the BCC Administrator user.
In Roles and Tasks, click Clusters > My Clusters.
Select the check box next to the cluster, then select Actions > Properties.

You can also click the Properties button on the Cluster Options page for the cluster.
In the Cluster Properties dialog box, click the Business Continuity tab.
On the Business Continuity page, click the Main link.
Click the AutoFailover button.
Click the Auto-Failover link just under the tabs.
Select the Enable Automatic Failover of Business Continuity Cluster Resources check box, then click Apply.
Continue with Section 8.11.2, Creating an Auto-Failover Policy, to create a failover policy.

Auto-failover is not completely enabled until you create an auto-failover policy.

8.11.2 Creating an Auto-Failover Policy

By default, no auto-failover policy exists for BCC. You must create an auto-failover policy for each cluster in your BCC where you want auto-failover enabled. This is required to automatically fail over resources from one cluster to another.

In iManager, under Cluster Membership Monitoring Settings, select a cluster and click the Edit link.
Under Membership Threshold, select the Enable check box, select either Percent Fail or Nodes Fail, and specify either the percentage of failed nodes or the number of failed nodes.

The node failure number or percentage you specify must be met for the selected cluster before resources automatically fail over to another cluster.

IMPORTANT: Do not use a membership condition of total node failure (either 100 percent or the total number of nodes); the condition cannot be satisfied because the cluster will not be up to report this state.

If a cluster has gone down completely and an auto-failover has occurred, bring up only one node in the downed cluster, then run the cluster resetresources command on that node. Only after that can the other nodes join the cluster, and you can begin manually migrating the BCC-enabled resources back to the cluster that was down.
Under Communication Timeout, select the Enable check box and specify the number of minutes that must elapse without any communication between clusters before resources automatically fail over to another cluster.
Click OK to finish editing the policy.
Click Apply to save your settings.
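The membership threshold logic described in this procedure can be sketched as follows. This is an illustration only and not BCC code: the function names and the "percent" and "nodes" mode strings are hypothetical, and the sketch assumes that a threshold is met when the failure count or percentage is greater than or equal to the configured value.

```python
def validate_threshold(mode: str, value: int, total_nodes: int) -> None:
    """Reject thresholds that amount to total node failure (or are otherwise unusable)."""
    if mode == "percent":
        if not 0 < value < 100:
            raise ValueError("percent threshold must be between 1 and 99")
    elif mode == "nodes":
        if not 0 < value < total_nodes:
            raise ValueError("node threshold must be between 1 and total_nodes - 1")
    else:
        raise ValueError("mode must be 'percent' or 'nodes'")


def threshold_met(mode: str, value: int, total_nodes: int, failed_nodes: int) -> bool:
    """Return True when the failed-node count satisfies the configured threshold."""
    validate_threshold(mode, value, total_nodes)
    if mode == "percent":
        # Integer comparison avoids floating-point rounding of the percentage.
        return failed_nodes * 100 >= value * total_nodes
    return failed_nodes >= value
```

For example, with a four-node cluster and a Percent Fail value of 50, two failed nodes satisfy the threshold and one failed node does not. A value of 100 percent, or a Nodes Fail value equal to the cluster size, is rejected because a fully downed cluster cannot report its state.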

8.11.3 Refining the Auto-Failover Policy

You can further refine auto-failover policies to gain more control over whether and when an auto-failover occurs. To do this, click the Advanced button to display additional fields for specifying auto-failover criteria and adding monitoring information.

The policy for automatic failover is configured by creating rules. Each row in the Failover Policy Configuration table represents a rule that applies to a single cluster, or to all clusters in the business continuity cluster. Each rule contains a set of conditions. Each condition tests one of the following criteria:

The value of an indication reported by a monitor
The amount of time the connection to a cluster has been down
Whether the connection to a cluster is up

These conditions can be combined in any order to construct a more robust rule that helps to avoid an undesired failover. For failover to occur, it is enough for one rule to have all of its conditions satisfied for the specified cluster or clusters.
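The way rules and conditions combine can be sketched as follows. The sketch reads the preceding paragraph as "failover occurs when any single rule has all of its conditions satisfied." The ClusterState class, its field names, and the example thresholds are hypothetical and are used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClusterState:
    """Hypothetical snapshot of what is known about one cluster."""
    monitor_indication: int   # value reported by a monitor, e.g. percent of nodes failed
    connection_up: bool       # whether the connection to the cluster is up
    minutes_down: float       # how long the connection has been down (0 if up)


Condition = Callable[[ClusterState], bool]
Rule = List[Condition]


def rule_satisfied(rule: Rule, state: ClusterState) -> bool:
    # A rule with no conditions never triggers; otherwise every condition must hold.
    return bool(rule) and all(condition(state) for condition in rule)


def should_fail_over(rules: List[Rule], state: ClusterState) -> bool:
    # One fully satisfied rule is enough.
    return any(rule_satisfied(rule, state) for rule in rules)


# Rule 1: at least 50 percent of nodes failed AND the connection is still up.
membership_rule: Rule = [
    lambda s: s.monitor_indication >= 50,
    lambda s: s.connection_up,
]
# Rule 2: the connection has been down for five or more minutes.
connection_rule: Rule = [
    lambda s: not s.connection_up and s.minutes_down >= 5,
]
rules = [membership_rule, connection_rule]

healthy = ClusterState(monitor_indication=25, connection_up=True, minutes_down=0)
degraded = ClusterState(monitor_indication=75, connection_up=True, minutes_down=0)
unreachable = ClusterState(monitor_indication=0, connection_up=False, minutes_down=6)
```

Here the degraded and unreachable states each satisfy exactly one rule, so either would trigger a failover, while the healthy state satisfies neither.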

For rules with monitor conditions that are automatically created by using the Cluster Membership Monitoring Settings table, you can add a condition that tests whether the connection to the peer cluster is up. Adding this condition changes the behavior of the rule. With this rule, a graceful automatic failover of resources can happen when the connection to the peer cluster is up.

You can also specify or change the criteria for percent or number of nodes that are used to determine if an automatic failover can occur.

IMPORTANT: Do not use a membership condition of total node failure (either 100 percent or the total number of nodes); the condition cannot be satisfied because the cluster will not be up to report this state.

If a cluster has gone down completely and an auto-failover has occurred, bring up only one node in the downed cluster, then run the cluster resetresources command on that node. Only after that can the other nodes join the cluster, and you can begin manually migrating the BCC-enabled resources back to the cluster that was down.

You should create a separate rule with a connection down condition. Adding a connection down condition to an existing rule with a condition that tests cluster membership is not recommended. It is highly unlikely that cluster membership information for a specific cluster will be reported to peer clusters when the connection to that specific cluster is down.

For example, a rule might contain only one condition that tests whether a connection to a specific cluster has been down for five or more minutes. Failover occurs when peer clusters agree that the connection to the cluster specified in the rule has been down for five or more minutes. If the peer clusters do not agree about the connection being down (that is, one cluster has a valid connection to the specified cluster), failover does not occur. More complex rules can be constructed that contain multiple conditions.
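The peer-agreement check in this example can be sketched as follows. The function name and the shape of the reports mapping are assumptions made for illustration: each peer cluster reports how many minutes its connection to the target cluster has been down, or None if its connection is still valid.

```python
from typing import Dict, Optional


def peers_agree_connection_down(
    reports: Dict[str, Optional[float]],
    required_minutes: float = 5.0,
) -> bool:
    """Return True only if every peer has seen the connection down long enough."""
    if not reports:
        # With no peer reports there is nothing to agree on.
        return False
    return all(
        minutes is not None and minutes >= required_minutes
        for minutes in reports.values()
    )


all_down = {"cluster2": 6.0, "cluster3": 5.0}
one_connected = {"cluster2": 6.0, "cluster3": None}
too_recent = {"cluster2": 6.0, "cluster3": 2.5}
```

Only the all_down reports lead to a failover in this sketch. If any peer still has a valid connection, or has not yet seen the connection down for the full five minutes, the peers do not agree and no failover occurs.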

If previously configured, the fields under Failover Policy Configuration should already contain information on the policies that were created in the Cluster Membership Monitoring Settings section of the page.

Under Failover Policy Configuration, select a policy and click Edit to further refine a rule. Click Delete to remove the rule, or click New to create a rule to which you can add additional failover conditions.
Select the cluster that you want the rule to apply to, or select All to apply the policy to all clusters.
Under Conditions, choose the type of condition and the appropriate values. To add multiple conditions to the rule, click the Add button below the condition.

You can use the default setting of Monitor if you don’t want to apply the cluster up or cluster down criteria to this policy. You can also specify or change the percent or number of nodes criteria that are used to determine whether an auto-failover can occur.
Click Apply to save your settings.

8.11.4 Adding or Editing Monitor Configurations

Clicking the Advanced button also displays an additional section on this page called Health Monitor Configuration. Monitors are an important part of the automatic failover feature. They are separate processes that perform a specialized task to analyze the health of a specific cluster or all clusters in the BCC. These monitors report an indication of health to BCC. BCC, in turn, uses the reported information to evaluate the failover policy and determine whether resources should be migrated from a specific cluster. BCC ships with two monitors (nodecnt and nodepnt) that report an indication of health that represents either the percentage or number of nodes that do not belong to a specific cluster.

If they are configured by using the Cluster Membership Monitoring Settings table, the fields under Health Monitor Configuration should already contain information for the health monitor (nodepnt or nodecnt) included with BCC. Although default values have already been set, you can customize some of the monitor settings for the cluster membership monitors. If you have created your own custom monitor, you can click New to add configuration settings to your monitor.

In iManager, under Monitor Name in the Health Monitor Configuration section, select a monitor and click Edit.
Under Clusters, select the cluster or clusters that you want this monitor to apply to.
Specify the maximum health indication that the monitor will report.

This value is used to validate the rules when you create a failover policy. It is the maximum value that can be used for the threshold type. For example, if you specified percent fail membership monitoring, the maximum value for the nodepnt monitor is 100 (for 100 percent). If you specified nodes fail membership monitoring, the maximum value for the nodecnt monitor is the maximum number of nodes permitted in a cluster, which is 32. If you created your own custom monitor, the values could be different.

For the nodepnt and nodecnt monitors, the Maximum Health Indication value is for information only, and should not be changed.
Under Short Polling Interval, specify the number of seconds the monitor will wait each time it contacts the cluster or clusters to get health information.

The Long Polling Interval is not used with the default nodepnt and nodecnt monitors. This value might be used for some custom monitors.
Specify Linux as the platform to be monitored by the health monitor, and specify whether you want the monitor enabled for the selected clusters.

The Optional Parameter field specifies a monitor-specific string value that is passed to the monitor as a startup parameter.

The nodepnt and nodecnt monitors do not support optional parameters.
Click Apply to save your settings.
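The role of the Maximum Health Indication value in validating a rule threshold can be sketched as follows. The maximum values for nodepnt (100) and nodecnt (32) come from the description above. The function name and the custom_max parameter are hypothetical and stand in for a custom monitor that defines its own maximum.

```python
from typing import Optional

# Maximum health indication for the two monitors that ship with BCC.
MAX_HEALTH_INDICATION = {"nodepnt": 100, "nodecnt": 32}


def threshold_within_max(
    monitor: str,
    threshold: int,
    custom_max: Optional[int] = None,
) -> bool:
    """Return True if a rule threshold fits within the monitor's maximum indication."""
    if custom_max is not None:
        max_value = custom_max
    elif monitor in MAX_HEALTH_INDICATION:
        max_value = MAX_HEALTH_INDICATION[monitor]
    else:
        raise ValueError(f"no maximum health indication known for monitor {monitor!r}")
    return 0 <= threshold <= max_value
```

For example, a threshold of 40 is out of range for nodecnt but within range for nodepnt. Note that this check only bounds the value; a threshold that amounts to total node failure should still be avoided, as described in the IMPORTANT note in Section 8.11.2.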

NOTE: See the BCC NDK documentation for more information on creating custom failover policies.