Key initial classifiers for a failure are:
- The console log
  This contains an easy-to-read history of enterprise server system activity leading up to the problem. When viewing the console log:
  - Investigate messages with a level other than informational first.
  - The enterprise server frequently continues to appear normal for some time after a failure, so investigate the 10 to 15 minutes before the failure in detail, including the informational messages.
- The list of processes
  In the Communications Log, inspect the processes associated with the failing enterprise server (provided that the server did not shut down). A significant difference in a process's CPU time between the list in the healthy data capture and the list in the problem data capture may indicate that the process is looping.
- A list of shared memory areas and semaphores
  An unusual number of these objects may indicate an operating system problem. To evaluate this, compare the list with one obtained when the enterprise server system is in a healthy state. You can obtain this list using a third-party tool such as Process Explorer.
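As a rough illustration of the two console log passes described above (non-informational messages first, then everything in the minutes before the failure), the following Python sketch assumes the log has already been parsed into entries with a timestamp and a one-letter level, where "I" marks an informational message. The `LogEntry` type, the `triage` function, and the level codes are hypothetical; the real console log layout may differ, so parsing is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEntry:
    # Hypothetical parsed form of one console log message.
    timestamp: datetime
    level: str  # assumed one-letter level, "I" = informational
    text: str


def triage(entries, failure_time, window_minutes=15):
    """Split console log entries into the two sets worth reading first.

    Returns (non_info, window):
      non_info - every non-informational entry, oldest first
      window   - every entry, informational included, in the
                 window_minutes leading up to failure_time
    """
    ordered = sorted(entries, key=lambda e: e.timestamp)
    non_info = [e for e in ordered if e.level != "I"]
    start = failure_time - timedelta(minutes=window_minutes)
    window = [e for e in ordered if start <= e.timestamp <= failure_time]
    return non_info, window


# Example with made-up entries.
failure = datetime(2024, 1, 1, 12, 0)
entries = [
    LogEntry(failure - timedelta(minutes=30), "I", "server started"),
    LogEntry(failure - timedelta(minutes=12), "I", "request received"),
    LogEntry(failure - timedelta(minutes=5), "W", "response slow"),
    LogEntry(failure - timedelta(minutes=40), "E", "earlier error"),
]
non_info, window = triage(entries, failure)
print([e.text for e in non_info])  # ['earlier error', 'response slow']
print([e.text for e in window])    # ['request received', 'response slow']
```

The 15-minute default matches the upper end of the 10 to 15 minute window; widening it is reasonable when the exact failure time is uncertain.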
These classifiers, together with the external symptoms, should provide enough information to assign the failure to one of the types of failure listed.
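The two healthy-versus-problem comparisons described above (process CPU time, and counts of shared memory areas and semaphores) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the process lists are assumed to have been reduced to a mapping of process ID to cumulative CPU seconds, the object lists to one type name per object, and the function names, type names, and the 0.9 "busy" ratio are all hypothetical. The source does not define what counts as a "significant" CPU difference, so the ratio is only an illustrative threshold.

```python
from collections import Counter


def busy_processes(healthy_cpu, problem_cpu, interval_seconds, busy_ratio=0.9):
    """Flag processes that spent most of the interval between captures on CPU.

    healthy_cpu / problem_cpu map process ID -> cumulative CPU seconds
    from each capture; interval_seconds is the time between the captures.
    Returns {pid: ratio} for processes at or above busy_ratio. A
    multi-threaded process can exceed a ratio of 1.0.
    """
    flagged = {}
    for pid, cpu_now in problem_cpu.items():
        cpu_before = healthy_cpu.get(pid)
        if cpu_before is None:
            continue  # not in the healthy capture, so there is no baseline
        ratio = (cpu_now - cpu_before) / interval_seconds
        if ratio >= busy_ratio:
            flagged[pid] = ratio
    return flagged


def object_count_changes(healthy_objects, problem_objects):
    """Compare per-type object counts between two captures.

    Each argument is an iterable of object type names, one per object
    (hypothetical names such as "semaphore" or "shared_memory").
    Returns {type: (healthy_count, problem_count)} for types whose
    counts differ.
    """
    before = Counter(healthy_objects)
    after = Counter(problem_objects)
    return {
        kind: (before[kind], after[kind])
        for kind in sorted(set(before) | set(after))
        if before[kind] != after[kind]
    }


# Example with made-up numbers: captures taken 300 seconds apart.
healthy_cpu = {101: 12.0, 102: 3.0}
problem_cpu = {101: 13.5, 102: 300.0, 103: 1.0}
print(busy_processes(healthy_cpu, problem_cpu, interval_seconds=300))
# {102: 0.99}

healthy_objects = ["semaphore"] * 4 + ["shared_memory"] * 2
problem_objects = ["semaphore"] * 40 + ["shared_memory"] * 2
print(object_count_changes(healthy_objects, problem_objects))
# {'semaphore': (4, 40)}
```

Dividing by the interval makes the CPU comparison independent of how far apart the captures were taken. Matching by process ID assumes IDs were not reused between the two captures.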