In Cluster Services for OES 11 SP1 and later, after losing contact with the master node, a slave node still attempts to find a node with a higher IP address that will act as master, but it also uses ping results to determine which other nodes are capable of responding. The new algorithm elects the same master as the old algorithm, but the time to make the decision is deterministic and the process is more efficient. The logical flow for the master-election algorithm is represented in Figure C-2, Master-Election Algorithm for OES 11 SP1 and Later.
In the new master-election algorithm, after a slave determines that its adapter is working, it broadcasts a ping to all of the cluster nodes. The ping identifies whether the node can communicate with other nodes and, if so, with which ones. This helps address situations where LAN communications are down between some nodes, for example because of a cable or switch failure.
If there are nodes with higher IP addresses than its own, a node asks the member node with the highest IP address (excluding the old master) to be the new master. It waits for a predetermined tolerance period to see if that master candidate node will start acting like a master and send it heartbeat packets. If a heartbeat packet arrives, the node becomes a slave of its elected master.
Sending the first master request in the old manner serves two purposes:
It maintains some compatibility with older versions of Cluster Services.
It provides some action while waiting for the ping replies to come in.
If the member with the highest IP address starts acting as the new master within the predetermined tolerance, the election is over for this node.
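The choice of the first master candidate described above can be sketched in a few lines of Python. This is only an illustration, not the Cluster Services implementation: the function name `first_candidate` and its parameters are hypothetical, and the sketch assumes that "higher IP address" means a numeric comparison of IPv4 addresses rather than a comparison of strings.

```python
from ipaddress import ip_address


def first_candidate(own_ip, member_ips, old_master_ip):
    """Return the member with the highest IP address above own_ip,
    excluding the old master, or None if no such member exists.

    Hypothetical helper for illustration only. Addresses are compared
    numerically (an assumption), so 10.0.0.10 ranks above 10.0.0.9.
    """
    own = ip_address(own_ip)
    higher = [
        ip_address(member)
        for member in member_ips
        if member != old_master_ip and ip_address(member) > own
    ]
    return str(max(higher)) if higher else None


# Example: the node 10.0.0.2 has lost contact with the old master 10.0.0.20.
members = ["10.0.0.2", "10.0.0.9", "10.0.0.10", "10.0.0.20"]
candidate = first_candidate("10.0.0.2", members, "10.0.0.20")
```

In this example `candidate` is `"10.0.0.10"`. A plain string comparison would have picked `"10.0.0.9"`, which is why the sketch parses the addresses first.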
If no heartbeat packets arrive, the node then focuses on member nodes with higher IP addresses that have either pinged it or replied to its ping. This lets the node bypass the waiting periods for requests to nodes that are unable to communicate. The node picks the member node with the highest IP address among them and asks it to be the new master. It waits for a predetermined tolerance period to see if that master candidate node will start acting like a master and send it heartbeat packets. The request to this candidate is almost certain to succeed because the candidate is known to be able to communicate.
If no heartbeat packets arrive, the node picks the member node with the next highest IP address that has either pinged it or replied to its ping, and the election continues in the same way.
If there are no responses from nodes with a higher IP address than its own, the node promotes itself to master.
If more than one node is elected as master at the end of the process, the split-brain detector (SBD) guarantees that only one master survives and fences the other masters and their members.
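The election steps above, as seen from a single node, can be summarized in a short Python sketch. It is a simplified model under stated assumptions, not the actual Cluster Services code: `elect_master` and the `heartbeat_from` callback are hypothetical names, `heartbeat_from(candidate)` stands in for "send a master request, wait for the tolerance, and report whether a heartbeat arrived", and IP addresses are assumed to be compared numerically. The text does not say whether the first candidate is asked again if it also answered the ping; this sketch assumes it is not. SBD fencing is outside the scope of the sketch.

```python
from ipaddress import ip_address


def elect_master(own_ip, member_ips, old_master_ip, reachable_ips, heartbeat_from):
    """Return the IP address of the master this node ends up with.

    reachable_ips: members that pinged this node or replied to its ping.
    heartbeat_from: hypothetical callback that sends a master request to
    the candidate and returns True if a heartbeat arrives in time.
    """
    own = ip_address(own_ip)
    # Members with a higher IP address, highest first, without the old master.
    higher = sorted(
        (
            ip_address(member)
            for member in member_ips
            if member != old_master_ip and ip_address(member) > own
        ),
        reverse=True,
    )
    if not higher:
        return own_ip  # no higher node exists: promote self

    # First request, sent in the old manner to the highest member,
    # whether or not that member has answered the ping.
    if heartbeat_from(str(higher[0])):
        return str(higher[0])

    # Later requests go only to nodes known to be able to communicate.
    # Assumption: the first candidate is not asked a second time.
    reachable = {ip_address(ip) for ip in reachable_ips}
    for candidate in higher[1:]:
        if candidate in reachable and heartbeat_from(str(candidate)):
            return str(candidate)

    return own_ip  # no higher node responded: promote self
```

In this model, a node makes at most one request that can go to an unreachable member (the first one); every later request targets a node that has already shown it can communicate.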
The benefits of the new master-election algorithm are:
The new algorithm elects the same master that would be determined with the old algorithm.
The time to elect a master node is deterministic, regardless of the size of the cluster or the nature of the problem.
The traditional master-election logic is mostly preserved.
The first master request goes out while the node waits for other nodes to respond to its ping, or for the ping to time out.
The second master request goes to a node that is known to be able to communicate, and is almost certain to succeed.
Election time is substantially decreased in some cases.
Figure C-2 Master-Election Algorithm for OES 11 SP1 and Later