The XML PARSE statement is used to interface with an XML parser that is part of the COBOL run-time system. The XML PARSE statement parses an XML document into its individual pieces and passes each piece, one at a time, to a user-written processing procedure.
General Format
Syntax Rules
- Identifier-1 must be an alphanumeric or national data item, cannot be a function-identifier, and should contain the XML document character stream. If identifier-1 is a national data item, its contents must be encoded using CCSID 1200 (Unicode UTF-16), and it must not contain any character entities that are represented using multiple encoding units. Use a numeric character reference (of the form &#nnnn; or &#xhhhh;) to represent any such characters.
- Procedure-name-1 specifies the first or only section or paragraph in the processing procedure.
- Procedure-name-2 specifies the last section or paragraph in the processing procedure.
- codepage must be an unsigned integer data item or an unsigned integer literal that represents a valid coded character set identifier (CCSID).
- If identifier-1 references a data item of category national, codepage must specify CCSID 1200, for Unicode UTF-16.
- If identifier-1 references a data item of category alphanumeric, codepage must specify CCSID 1208 for UTF-8 or one of the supported values, namely 1140 or 1047.
- identifier-2 must be of category alphanumeric and cannot be a function identifier.
- xml-schema-name-1 must be defined in the XML-SCHEMA clause of the SPECIAL-NAMES paragraph.
- The RETURNING NATIONAL phrase can be specified only when the XMLPARSE(XMLSS) compiler option is in effect.
- The VALIDATING phrase can be specified only when the XMLPARSE(XMLSS) compiler option is in effect.
- The ENCODING phrase can be specified only when the XMLPARSE(XMLSS) compiler option is in effect.
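As a minimal sketch of a statement that satisfies the syntax rules above, the following program parses an alphanumeric identifier-1 with a single-paragraph processing procedure. The program name, data names, paragraph names, and sample document are hypothetical and chosen only for illustration. The optional ENCODING, RETURNING NATIONAL, and VALIDATING phrases are left out so that the sketch does not depend on XMLPARSE(XMLSS).

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLDEMO1.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample document; 45 characters, no padding.
       01  WS-XML-DOC          PIC X(45) VALUE
           "<order><id>42</id><item>widget</item></order>".
       PROCEDURE DIVISION.
       MAIN-PARA.
      * identifier-1 is alphanumeric; the handler is one paragraph,
      * so no procedure-name-2 is needed.
           XML PARSE WS-XML-DOC
               PROCESSING PROCEDURE XML-HANDLER
               ON EXCEPTION
                   DISPLAY "XML error, XML-CODE = " XML-CODE
               NOT ON EXCEPTION
                   DISPLAY "XML document parsed"
           END-XML
           STOP RUN.

       XML-HANDLER.
      * Entered once per XML event; XML-EVENT names the event and
      * XML-TEXT holds the associated document fragment.
           EVALUATE XML-EVENT
               WHEN "START-OF-ELEMENT"
                   DISPLAY "Start element: " XML-TEXT
               WHEN "CONTENT-CHARACTERS"
                   DISPLAY "Content:       " XML-TEXT
               WHEN "END-OF-ELEMENT"
                   DISPLAY "End element:   " XML-TEXT
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

The handler ends by falling off the end of its paragraph, which lets the compiler-inserted return mechanism hand control back to the parser.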
General Rules
- The PROCESSING PROCEDURE phrase specifies the name of a procedure to handle the various events that the XML parser generates. The processing procedure consists of the statements that handle XML events. The range of the processing procedure also includes all statements executed by CALL, EXIT, GO TO, GOBACK, INVOKE, and PERFORM statements in the processing procedure. The compiler inserts a return mechanism after the last statement in the processing procedure. The processing procedure can terminate the run unit with a STOP RUN statement. It must not attempt to return to the parser with a GOBACK or EXIT PROGRAM statement.
- The ENCODING phrase specifies an encoding that is assumed for the source XML document in identifier-1.
- When identifier-1 references a national data item, XML document fragments are always returned in Unicode UTF-16 representation in the national special registers XML-NTEXT, XML-NNAMESPACE, and XML-NNAMESPACE-PREFIX.
- When the RETURNING NATIONAL phrase is specified and identifier-1 references a data item of category alphanumeric, XML document fragments are automatically converted to Unicode UTF-16 representation and returned to the processing procedure in the national special registers XML-NTEXT, XML-NNAMESPACE, and XML-NNAMESPACE-PREFIX.
- When the RETURNING NATIONAL phrase is not specified and identifier-1 references a data item of category alphanumeric, the XML document fragments are returned to the processing procedure in the alphanumeric special registers XML-TEXT, XML-NAMESPACE, and XML-NAMESPACE-PREFIX, except that text for the ATTRIBUTE-NATIONAL-CHARACTER and CONTENT-NATIONAL-CHARACTER XML events is always returned in special register XML-NTEXT.
- The VALIDATING phrase specifies that the parser should validate the XML document against an XML schema while parsing it. The schema used for XML validation needs to be in its unprocessed text format.
- If the FILE keyword is not specified, identifier-2 must reference a data item that contains the XML schema.
- If the FILE keyword is specified, xml-schema-name-1 identifies an existing file that contains the XML schema.
During parsing with validation, normal XML events are returned as for non-validating parsing until an exception occurs due to a validation error or other error in the document.
When an XML document is not valid, the parser signals an XML exception and passes control to the processing procedure with special register XML-EVENT containing 'EXCEPTION' and special register XML-CODE containing return code 24 in the high-order halfword and a reason code in the low-order halfword.
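Because the return code and reason code share one fullword, a processing procedure usually has to split XML-CODE before it can test either part. The sketch below shows one way to do that with integer division by 65536. It is self-contained and uses a hypothetical stand-in value (WS-SAMPLE-CODE, with an invented reason code of 1) rather than a real XML-CODE; all data names are assumptions made for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLCODE1.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical XML-CODE value: return code 24 in the
      * high-order halfword, reason code 1 in the low-order one
      * (24 * 65536 + 1 = 1572865).
       01  WS-SAMPLE-CODE      PIC S9(9) COMP VALUE 1572865.
       01  WS-RETURN-CODE      PIC S9(9) COMP.
       01  WS-REASON-CODE      PIC S9(9) COMP.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Dividing by 65536 (2 ** 16) separates the two halfwords:
      * the quotient is the return code, the remainder the reason.
           DIVIDE WS-SAMPLE-CODE BY 65536
               GIVING WS-RETURN-CODE
               REMAINDER WS-REASON-CODE
           IF WS-RETURN-CODE = 24
               DISPLAY "Validation error, reason " WS-REASON-CODE
           END-IF
           STOP RUN.
```

In a real processing procedure, the same DIVIDE would take XML-CODE as its dividend inside the branch that handles the EXCEPTION event.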
- The ON EXCEPTION phrase specifies imperative statements that are executed when the XML PARSE statement raises an exception condition. An exception condition occurs when the XML parser detects an error in processing the XML document. The parser first signals an exception XML event by passing control to the processing procedure with special register XML-EVENT set to contain "EXCEPTION". The parser provides a numeric error code in special register XML-CODE.
An exception condition also occurs if the processing procedure deliberately terminates parsing by setting XML-CODE to -1 before returning to the parser from any normal XML event. In this case, the parser does not signal an EXCEPTION XML event. The following applies:
- If the ON EXCEPTION phrase is specified, the parser then transfers control to imperative-statement-1.
- If the ON EXCEPTION phrase is not specified, the NOT ON EXCEPTION phrase, if any, is ignored, and control is transferred to the end of the XML PARSE statement.
- If the XML processing procedure handles the exception XML event and sets XML-CODE to zero before returning control to the parser, the exception condition no longer exists.
- If no other unhandled exceptions occur before the termination of the parser, control is transferred to imperative-statement-2 of the NOT ON EXCEPTION phrase, if specified.
- The NOT ON EXCEPTION phrase specifies imperative statements that are executed when no exception condition exists at the termination of XML PARSE processing. The following applies:
- If an exception condition does not exist at termination of XML PARSE processing, control is transferred to imperative-statement-2 of the NOT ON EXCEPTION phrase, if specified.
- If the NOT ON EXCEPTION phrase is not specified, control is transferred to the end of the XML PARSE statement. In this case, ON EXCEPTION, if specified, is ignored. Special register XML-CODE contains zero after execution of the XML PARSE statement.
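The exception rules above can be sketched with a processing procedure that deliberately ends parsing by setting XML-CODE to -1 during a normal event. The program name, data names, and the sample document with its "stop" element are hypothetical. Under the rules above, one would expect control to reach the ON EXCEPTION statements without an EXCEPTION event being signaled first.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLSTOP1.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample document; 30 characters, no padding.
       01  WS-XML-DOC          PIC X(30) VALUE
           "<a><b>1</b><stop/><c>2</c></a>".
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE WS-XML-DOC
               PROCESSING PROCEDURE XML-HANDLER
               ON EXCEPTION
                   DISPLAY "Parsing ended early, XML-CODE = "
                       XML-CODE
               NOT ON EXCEPTION
                   DISPLAY "Whole document parsed"
           END-XML
           STOP RUN.

       XML-HANDLER.
           EVALUATE XML-EVENT
               WHEN "START-OF-ELEMENT"
                   DISPLAY "Element: " XML-TEXT
      * Deliberate termination: -1 set during a normal event.
                   IF XML-TEXT = "stop"
                       MOVE -1 TO XML-CODE
                   END-IF
               WHEN "EXCEPTION"
      * Reached only for errors detected by the parser itself.
                   DISPLAY "Parser error, XML-CODE = " XML-CODE
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

If the "stop" element is removed from the sample document (and the PICTURE length adjusted to match), no exception condition exists at termination and the NOT ON EXCEPTION statements run instead.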
- When XMLPARSE"COMPAT" is set and when running under native COBOL, XML character entities, for example &apos;, are output as individual events. When XMLPARSE"XMLSS" is set or when running under managed COBOL, XML entities are converted to the single character that they represent and included in the general text that surrounds them, generating a single event.
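One way to observe the difference in entity handling is to count the content events produced for a short document that contains a predefined entity. The following is a sketch only: the program name, data names, and sample document are hypothetical, and the event name CONTENT-CHARACTER for a single-character entity event is an assumption that is not stated in this section.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLENT1.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample document; 20 characters, no padding.
       01  WS-XML-DOC          PIC X(20) VALUE
           "<msg>it&apos;s</msg>".
       01  WS-CONTENT-EVENTS   PIC 9(4) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE WS-XML-DOC
               PROCESSING PROCEDURE XML-HANDLER
           END-XML
           DISPLAY "Content events: " WS-CONTENT-EVENTS
           STOP RUN.

       XML-HANDLER.
      * Count every event that delivers element content.
      * CONTENT-CHARACTER is an assumed event name (see lead-in).
           EVALUATE XML-EVENT
               WHEN "CONTENT-CHARACTERS"
               WHEN "CONTENT-CHARACTER"
                   ADD 1 TO WS-CONTENT-EVENTS
                   DISPLAY XML-EVENT "[" XML-TEXT "]"
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

Going by the rule above, the count should be greater than one when the entity is reported as its own event, and exactly one when the entity is folded into the surrounding text.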