Failures fall into the following categories:
- Incorrect output (an application or system function produces the wrong results or exhibits unexpected behaviors). These include application abends.
- Unexpected system terminations. Either the whole enterprise server terminates or a subsystem (support process) terminates.
- System loops. These include application loops and enterprise server subsystem loops. They are characterized by an abnormal accumulation of CPU time by an enterprise server process and may display the same external symptoms as a system hang.
- System hangs. Any problem where the enterprise server appears to stop responding to its clients (except for loops).
Key initial classifiers for a failure are:
- The console log. This contains an easy-to-read history of enterprise server system activity leading up to the problem. Investigate messages other than information-level messages first. The enterprise server frequently continues to appear normal for some time after a failure, so examine the 10 to 15 minutes before the failure in detail, including the information messages.
- The list of processes. Inspect the processes associated with the failing enterprise server (if the server did not shut down). Loops may be indicated if there is a significant difference in the CPU time for a process between the list in the healthy data capture and the list in the problem data capture.
- The list of shared memory areas and semaphores. An unusual number of objects may indicate an operating system problem. To evaluate this, compare the list with one obtained when the enterprise server system is in a healthy state.
These classifiers, coupled with the external symptoms, should provide enough information to assign the failure to one of the categories listed above.
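The process-list comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool shipped with the product: it assumes both data captures are plain-text process lists in the `PID TIME COMMAND` layout produced by a command such as `ps -eo pid,time,comm`, the function names are hypothetical, the process names in the sample data are made up, and the 60-second threshold is an arbitrary starting point that should be tuned to the interval between the two captures.

```python
from __future__ import annotations


def parse_cpu_time(text: str) -> float:
    """Convert a ps-style cumulative CPU time ([DD-][HH:]MM:SS) to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in text.split(":")]
    while len(parts) < 3:  # pad missing hours (and minutes) with zeros
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return float(days * 86400 + hours * 3600 + minutes * 60 + seconds)


def parse_capture(capture: str) -> dict[int, tuple[float, str]]:
    """Map PID -> (CPU seconds, command) for each 'PID TIME COMMAND' line."""
    processes = {}
    for line in capture.strip().splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header and any malformed lines
        pid, cpu, command = fields
        processes[int(pid)] = (parse_cpu_time(cpu), command.strip())
    return processes


def loop_suspects(healthy: str, problem: str, threshold_seconds: float = 60.0):
    """Return (pid, command, cpu_delta) for processes whose CPU time grew
    by at least threshold_seconds between the two captures."""
    before = parse_capture(healthy)
    after = parse_capture(problem)
    suspects = []
    for pid, (cpu_after, command) in after.items():
        # Only compare a process with itself: same PID and same command.
        if pid not in before or before[pid][1] != command:
            continue
        delta = cpu_after - before[pid][0]
        if delta >= threshold_seconds:
            suspects.append((pid, command, delta))
    return sorted(suspects, key=lambda s: s[2], reverse=True)


# Made-up sample captures; the process names are placeholders.
HEALTHY = """
  PID     TIME COMMAND
 4101 00:00:12 es_manager
 4102 00:01:05 es_worker
 4103 00:00:40 es_worker
"""

PROBLEM = """
  PID     TIME COMMAND
 4101 00:00:14 es_manager
 4102 00:01:09 es_worker
 4103 00:27:40 es_worker
"""

print(loop_suspects(HEALTHY, PROBLEM))
# prints [(4103, 'es_worker', 1620.0)]
```

A large CPU delta is only a hint that a process may be looping. A busy but healthy process can also accumulate CPU time, so the result should be read together with the console log and the external symptoms.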