In Cluster Services for OES 11 and earlier, after losing contact with the master node, a slave node attempts to find a node with a higher IP address that will act as master. The logic flow for the master-election algorithm is represented in Figure C-1, Master-Election Algorithm for OES 11 Initial Release and Earlier Versions.
Before polling other nodes, a node checks the status of its own network adapter. If the adapter is down, the node immediately promotes itself to master. Rather than polling and waiting for replies that cannot arrive, the node jumps directly to the conclusion it would eventually reach anyway: that it is the only node remaining in the cluster. This shortcut can substantially shorten the election process when the network outage is caused by a network adapter failure. However, if the outage originates further upstream, such as in a cable or switch failure, the node follows the full election process, unaware that it is waiting for responses that cannot arrive.
If a node’s adapter is working, it begins a process to find a live node with a higher IP address than its own:
A node determines if there are other nodes (excluding the old master) with an IP address higher than its own. If it has the highest IP address, it promotes itself as the master.
If there are nodes with higher IP addresses, a node asks the member node with the highest IP address (excluding the old master) to be the new master. It then waits for a predetermined tolerance period to see whether that master candidate starts acting as a master and sends it heartbeat packets. If a heartbeat packet arrives, the node becomes a slave of its elected master.
If no heartbeat packet arrives, the node picks the member node with the second highest IP address and repeats the same procedure.
If there are no responses from nodes with a higher IP address than its own, a node promotes itself as the master.
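The steps above can be sketched in Python. This is only an illustrative model of the election logic as described here, not the actual Cluster Services implementation. The function name `elect_master` and the `heartbeat_from` callback are hypothetical; the callback stands in for "ask this node to be master and wait the tolerance period for a heartbeat".

```python
from ipaddress import ip_address
from typing import Callable, Iterable


def elect_master(
    self_ip: str,
    member_ips: Iterable[str],
    old_master_ip: str,
    adapter_up: bool,
    heartbeat_from: Callable[[str], bool],  # hypothetical: ask, wait, report
) -> str:
    """Return the IP of the node that self_ip ends up treating as master."""
    # Shortcut: with the local adapter down, no reply can ever arrive.
    if not adapter_up:
        return self_ip

    me = ip_address(self_ip)
    # Candidates have a higher IP than ours and are not the old master.
    # Addresses are compared numerically, not as strings.
    candidates = sorted(
        (
            ip_address(ip)
            for ip in member_ips
            if ip != old_master_ip and ip_address(ip) > me
        ),
        reverse=True,
    )

    # Ask each candidate in turn, highest IP address first.
    for candidate in candidates:
        if heartbeat_from(str(candidate)):
            return str(candidate)  # become a slave of this master

    # No higher node exists or none responded: promote ourselves.
    return self_ip
```

For example, with members `10.0.0.1`, `10.0.0.2`, `10.0.0.9`, and `10.0.0.10` (the old master), node `10.0.0.1` would first ask `10.0.0.9` and then `10.0.0.2`. Because every unanswered request costs one full tolerance period, nodes with low IP addresses can spend the longest time in the election.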
If more than one node is elected as master at the end of the process, the SBD guarantees that only one master survives and fences the other masters and their members. Describing how the SBD addresses all possible situations to determine the master is beyond the scope of this document. Let’s consider a simple scenario with multiple master candidates. The SBD kills any master candidate node whose adapter is down. If a master candidate was the old master, it becomes the new master. Otherwise, the master candidate with the highest IP address wins.
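The simple scenario described above can be modeled as follows. This sketch covers only the three rules stated for that scenario and does not represent the full SBD logic. The `Candidate` type and `surviving_master` function are hypothetical names, and returning `None` when no candidate has a working adapter is an assumption, since that case is not described here.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    ip: str
    adapter_up: bool
    was_old_master: bool = False


def surviving_master(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Pick the one master candidate that survives in the simple scenario."""
    # Rule 1: any candidate whose adapter is down is killed.
    live = [c for c in candidates if c.adapter_up]
    if not live:
        return None  # assumption: this case is not covered by the text

    # Rule 2: the old master, if still a live candidate, stays master.
    for c in live:
        if c.was_old_master:
            return c

    # Rule 3: otherwise the highest IP address wins (numeric comparison).
    return max(live, key=lambda c: ip_address(c.ip))
```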
The master election process can take time because a node sequentially asks each potential master node to become the master and waits for a response before trying the node with the next highest IP address. If a large cluster (8+ nodes) loses LAN connections among all the nodes, it can take up to 3 minutes to elect all of the new masters. In that case, the node with the lowest IP address tries almost all of the other nodes in turn.
Figure C-1 Master-Election Algorithm for OES 11 Initial Release and Earlier Versions