

Handling "Out of Space" Errors

Unlike the mainframe, where constraints on file size are dictated by space allocation units (blocks, tracks, and cylinders), this JES system does not use the SPACE parameter when allocating files. Instead, it uses native file I/O methods, which behave differently. As such, your jobs may experience "out of space" errors, which are not the same as D37, E37, or B37 mainframe system abends.

What is an "out of space" error?

An "out of space" error is generated when data cannot be written to a file because there is no more space on the disk. When this error occurs, the file is in a corrupt state, since there is no way of knowing what data is missing from the file. This situation requires operator intervention to allow work to continue.

These errors can have a severe impact on a running system, because one error is likely to lead to many more, not only in files used by running batch jobs but also in files used by the system itself. For example, when the system tries to report the error, it writes to a log file, and that write fails with the same error. The whole system can very quickly become unstable and unable to continue.

How JES handles "out of space" errors

This JES system monitors the job exit code for an "out of space" error. If such a condition occurs, it frees the lock on the file so that end-of-step processing can take place, which may include removal of the file, depending on the file's disposition. The system then writes an error message to the console and awaits operator intervention. Following the operator reply, the SEP is terminated at the end of that job, and a new SEP is started.

This processing applies only to MVS files, that is, files that have been defined with a DD statement in the JCL or via SVC99. Job processing is suspended to await user input only if the final job condition code is RTS error 9/007. This condition code is displayed as "COND CODE RTS0007" in the job log. If the job ends with any other condition code, it is assumed that the job included additional steps to process the error, or that a user exit did so, and set the final condition code. See Using JCL User Exits for more information.
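As an illustration of the condition described above, the following minimal sketch scans job log text for the "COND CODE RTS0007" string. It assumes the job log is available as plain text; the function names and the sample log layout are hypothetical and are not part of the product.

```python
# Marker text taken from the documentation above.
OUT_OF_SPACE_MARKER = "COND CODE RTS0007"


def find_out_of_space_lines(job_log_text):
    """Return (line_number, stripped_line) pairs that contain the marker.

    Assumption: the job log has been saved or copied as plain text.
    """
    return [
        (number, line.strip())
        for number, line in enumerate(job_log_text.splitlines(), start=1)
        if OUT_OF_SPACE_MARKER in line
    ]


def job_awaits_operator(job_log_text):
    """True if the log suggests the job ended with RTS error 9/007."""
    return bool(find_out_of_space_lines(job_log_text))


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample_log = "STEP01 started\n  COND CODE RTS0007\nJob ended\n"
    for number, line in find_out_of_space_lines(sample_log):
        print(f"line {number}: {line}")
```

A check like this could help an operator spot which jobs are waiting for a reply, but the spool view remains the authoritative place to look.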

See below for sample messages, which also appear on the ES Console.

Figure: Example spool messages for a job that produced an RTS 9/007 "out of space" error.

How to resolve "out of space" errors

Open the job spool view in a browser window, select the failed job in the list of Active jobs, and enter any character in the Reply field to continue processing. See To Reply to an ACCEPT FROM CONSOLE Statement for more information. To determine which file caused the problem, click JESYSMSG under DD Name to view the system messages. Then carry out whatever remedial action is necessary to correct the lack of disk space. This action might be one or more of the following:
  • Delete old files
  • Clean up temporary files
  • Archive files that have not been used for a long time
  • Move files to another disk
  • Extend the disk partition
Finally, you need to decide whether the failed job needs to be re-run.
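To help with the clean-up actions listed above, the following minimal sketch lists files that have not been modified for a given number of days, largest first, so that they can be reviewed for deletion or archiving. It is an assumption-laden illustration: it uses modification time as a stand-in for "not used", the function name find_stale_files is hypothetical, and it only reports files rather than removing anything.

```python
import os
import time


def find_stale_files(root, max_age_days, now=None):
    """Return (path, size_bytes) for files under root whose last
    modification is older than max_age_days, largest first.

    Assumption: modification time is an acceptable proxy for "unused".
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_mtime < cutoff:
                stale.append((path, info.st_size))
    # Largest first, so the biggest potential savings are reviewed first.
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale


if __name__ == "__main__":
    # Report only; deciding what to delete or archive is left to the operator.
    for path, size in find_stale_files(".", max_age_days=180):
        print(f"{size:>12} bytes  {path}")
```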

Avoiding "out of space" errors

A well-managed system includes monitoring of free disk space to help ensure that "out of space" errors do not occur. The System Administrator can also take other steps to reduce the impact of such errors. For example, you might keep the system files on a different disk from the files used in batch jobs. Similarly, if certain jobs are known to produce large output files, you might keep those files on a different disk from other files.
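The free-space monitoring mentioned above could be sketched as follows. This is a minimal example under stated assumptions, not a feature of the product: the 10% threshold, the function names, and the monitored paths are all hypothetical, and a real deployment would more likely rely on existing system monitoring tools.

```python
import shutil


def free_space_fraction(path):
    """Return the fraction (0.0 to 1.0) of free space on the disk holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def check_disks(paths, min_free_fraction=0.10):
    """Return a warning string for each path whose disk is below the threshold.

    Assumption: 10% free is an arbitrary example threshold.
    """
    warnings = []
    for path in paths:
        fraction = free_space_fraction(path)
        if fraction < min_free_fraction:
            warnings.append(f"{path}: only {fraction:.1%} free")
    return warnings


if __name__ == "__main__":
    # Hypothetical locations; replace with the disks that hold system
    # files and batch job files on your installation.
    for warning in check_disks(["."]):
        print("WARNING:", warning)
```

Running such a check on a schedule, separately for the disk that holds system files and the disks used by batch jobs, gives early notice before a job reaches the RTS 9/007 condition.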
