1.3.2 Failure handling

Services. The health of all services in the system is monitored.

  • If a service is found to be unhealthy, the system will automatically attempt to self-heal, generally by restarting the process.

  • Service interruptions may occur depending on the type of failure.

  • Events for detected failures can be viewed in the Cluster Management dashboard.
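
The monitor-and-restart behavior described above can be pictured with a minimal sketch. This is an illustration only, not the product's implementation: the `Service` and `Monitor` classes, the `is_healthy` probe, and the `restart` action are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Service:
    """A monitored service (hypothetical model for illustration)."""
    name: str
    is_healthy: Callable[[], bool]  # hypothetical health probe
    restart: Callable[[], None]     # hypothetical restart action


@dataclass
class Monitor:
    """Checks services and records an event for each detected failure."""
    events: List[str] = field(default_factory=list)

    def check(self, service: Service) -> bool:
        """Probe one service; if unhealthy, log an event and restart it.

        Returns True when the service was healthy at probe time.
        """
        if service.is_healthy():
            return True
        self.events.append(f"{service.name}: unhealthy, restart attempted")
        service.restart()
        return False
```

In this sketch the `events` list stands in for the failure events that the text says are shown in the Cluster Management dashboard. A restart is only an attempt, so a caller would probe the service again to confirm recovery.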

Nodes. When a cluster node becomes unavailable for any reason, whether planned or unplanned:

  • The cluster generally moves the services that were running on that node to other nodes.

  • It may take five minutes or more for a node to be recognized as unavailable. This delay is designed to prevent unwarranted service disruptions that could be triggered by temporary conditions, such as intermittent network issues.
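
The reason for the detection delay can be illustrated with a small sketch of grace-period logic. It assumes a heartbeat-style check, which the text does not confirm; the `NodeTracker` class, its methods, and the fixed 300-second threshold are hypothetical and chosen only to match the "five minutes" figure above.

```python
from typing import Dict

# Assumed threshold for illustration; the text says "five minutes or more".
GRACE_PERIOD_SECONDS = 300.0


class NodeTracker:
    """Tracks the last time each node was seen (hypothetical model)."""

    def __init__(self, grace_period: float = GRACE_PERIOD_SECONDS) -> None:
        self.grace_period = grace_period
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        """Record that the node responded at time `now` (in seconds)."""
        self.last_seen[node] = now

    def is_unavailable(self, node: str, now: float) -> bool:
        """Treat a node as unavailable only after the full grace period.

        A node that has never reported is not judged by this sketch.
        """
        last = self.last_seen.get(node)
        if last is None:
            return False
        return now - last >= self.grace_period
```

With this logic, a node that misses contact for one minute and then responds again is never marked unavailable, so its services are not moved because of a brief network interruption.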

NOTE: Instructions are provided for gracefully shutting down or rebooting a node. Follow them any time a node is shut down or rebooted. (See the Cluster Management - Nodes help.)