The failover system is designed to shift monitors from one execution server to another and to deactivate a server that has failed,
for example because of hardware damage. The system does not, however, shift monitors or deactivate servers when the network
at the location is merely slow or experiencing problems. To determine whether a detected failure is caused by a specific execution server
or by that server's local network, at least two execution servers must run at each location, within the same local area network.
If only one server runs on a network, network outages cannot be distinguished from server hardware outages, and
automatic server deactivation on failure therefore cannot be enabled.
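The reasoning behind the two-server requirement can be illustrated with a minimal sketch. It is not the product's actual detection logic, which is not described here. The function name `classify_failure` and the `reachable` mapping are hypothetical, and the sketch assumes that the reachability of the other servers at the same location is the only signal used.

```python
from typing import Mapping


def classify_failure(failed_server: str, reachable: Mapping[str, bool]) -> str:
    """Guess the cause of a failure at one location (illustrative only).

    `reachable` maps every execution server at the location to whether
    it currently responds. All names here are assumptions for this sketch.
    """
    # Reachability of the other servers on the same local area network.
    peers = [ok for name, ok in reachable.items() if name != failed_server]
    if not peers:
        # Only one server at the location: a network outage and a
        # server outage look identical, so nothing can be concluded.
        return "undetermined"
    if any(peers):
        # At least one peer still responds, so the network is up and
        # the problem is specific to the failed server.
        return "server"
    # No server at the location responds: most likely a network problem.
    return "network"
```

With a single server, the sketch can only return `"undetermined"`, which is why automatic deactivation cannot be enabled in that setup.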
How quickly the failover system reacts to a failure is defined by the Responsiveness timeout [s] setting of the execution server.
The failover phases are as follows:
- After two thirds of the defined timeout has elapsed, the administrator is warned by email that the execution server is unavailable.
- If the server is still inaccessible after the full timeout has expired, failover analysis is initiated.
- The system determines whether the functioning servers can accept additional load. If they can, monitors are shifted
to other servers that provide the required resources, such as client/server or Silk Test support. The failed server is then set to
Inactive mode and is no longer used by monitors. Completed failover is indicated by an email to the administrator stating that the
execution server is in the Inaccessible state.
- Once the previous step is complete, the system attempts to connect to the failed execution server every 30 seconds to add
it back to the location. If a connection attempt succeeds, the state of the server is set to
Active and monitors are deployed via load balancing again.
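The timing of the first two phases can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: the function name `failover_phase` and the phase labels are hypothetical, and `timeout_s` stands for the value of the Responsiveness timeout [s] setting.

```python
def failover_phase(elapsed_s: int, timeout_s: int) -> str:
    """Return the phase reached after a server has been unavailable.

    elapsed_s: seconds since the server stopped responding.
    timeout_s: the configured responsiveness timeout in seconds.
    The labels "waiting", "warning", and "analysis" are illustrative.
    """
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    if elapsed_s >= timeout_s:
        # Full timeout expired: failover analysis is initiated.
        return "analysis"
    # Integer form of elapsed_s >= (2/3) * timeout_s, avoiding float rounding.
    if 3 * elapsed_s >= 2 * timeout_s:
        # Two thirds of the timeout elapsed: the administrator is warned.
        return "warning"
    return "waiting"
```

For example, with a timeout of 90 seconds, the warning email corresponds to 60 seconds of unavailability and failover analysis to 90 seconds.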