The XML PARSE statement parses an XML document so that it can be processed by the COBOL program. It is an implementation of the IBM Enterprise COBOL verb of the same name and is provided to simplify IBM migrations; however, any customer wishing to read XML data can use this verb.
XML PARSE is similar to C$XML in that you parse (read) the XML data and move it into the appropriate working storage item. The difference is that with C$XML, if you know that the data lies in a certain element or attribute, you can retrieve that attribute directly. With XML PARSE, you set up a processing procedure so that when you encounter a new element or attribute, you can specify how and where you want to store that data.
See Working with Non-Vision Data in A Guide to Interoperating with ACUCOBOL-GT for additional information on working with XML data.
XML PARSE identifier-1 PROCESSING PROCEDURE [IS] procedure-name-1 [{THROUGH | THRU} procedure-name-2] [[ON] EXCEPTION imperative-statement-1] [NOT [ON] EXCEPTION imperative-statement-2] [END-XML]
The processing procedure is executed as if the COBOL program had executed a PERFORM statement on the same paragraph(s).
If there are two or more logical paths to the return point, then procedure-name-2 can name a paragraph that consists of only an EXIT statement; all the paths to the return point must then lead to this paragraph.
The range of the processing procedure must not cause any GOBACK or EXIT PROGRAM statement to be executed, except to return control from a program to which control was passed by a CALL statement that is executed in the range of the processing procedure.
The range of the processing procedure must not cause an XML PARSE statement to be executed, unless the XML PARSE statement is executed in an outermost program to which control was passed by a CALL statement that is executed in the range of the processing procedure.
A program executing on multiple threads can execute the same XML statement or different XML statements simultaneously. However, the compiler generates LOCK THREAD and UNLOCK THREAD statements immediately before and after the XML PARSE statement, so in effect only a single thread runs for the entire execution of the XML PARSE.
The processing procedure can terminate the run unit with a STOP RUN statement.
For more details about the processing procedure, see Control Flow.
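The rule about giving several logical paths one common return point can be sketched as follows. This is a minimal illustration, not a definitive implementation: the program name, the data names, and the sample document are all hypothetical. The processing procedure spans HANDLE-EVENT THRU HANDLE-EVENT-EXIT, and the second paragraph contains only an EXIT statement.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLTHRU.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical in-memory document, for illustration only.
       01  XML-DOC          PIC X(60) VALUE
           "<note><!-- draft --><to>Ann</to></note>".
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           XML PARSE XML-DOC
               PROCESSING PROCEDURE HANDLE-EVENT
                   THRU HANDLE-EVENT-EXIT
           END-XML
           STOP RUN.

       HANDLE-EVENT.
      * First logical path: skip comment events.
           IF XML-EVENT = "COMMENT"
               GO TO HANDLE-EVENT-EXIT
           END-IF
      * Second logical path: report every other event.
           DISPLAY XML-EVENT.

       HANDLE-EVENT-EXIT.
      * Common return point: control goes back to the parser.
           EXIT.
```

Both paths end at HANDLE-EVENT-EXIT, so the parser regains control from the same point no matter which branch ran.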
An exception condition exists when the XML parser detects an error while processing an XML document. The parser first signals the exception by passing control to the processing procedure with special register XML-EVENT containing the value 'EXCEPTION'. The parser also provides a numeric error code in special register XML-CODE. Error codes are listed in the special register section.
An exception condition also exists when the processing procedure sets XML-CODE to -1 before returning to the parser for a normal XML event. A program does this to terminate parsing deliberately. In this case, the parser does not signal an EXCEPTION event.
When an exception condition exists, control is transferred to imperative-statement-1 if the ON EXCEPTION phrase is specified. If it is not specified, any NOT ON EXCEPTION phrase is ignored, and control is transferred to the end of the XML PARSE statement. After the XML PARSE statement executes, special register XML-CODE contains the numeric error code for the XML exception, or -1.
If the processing procedure handles the XML exception event and sets XML-CODE to zero before returning control to the parser, the exception condition no longer exists. If no other unhandled exceptions occur prior to the termination of the parser, control is transferred to imperative-statement-2 of the NOT ON EXCEPTION phrase, if specified.
When no exception conditions exist, control is transferred to imperative-statement-2, if specified, or to the end of the XML PARSE statement. If an ON EXCEPTION phrase is specified, it is ignored. Special register XML-CODE contains a zero after the XML PARSE statement has finished executing.
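The exception flow described above can be sketched with a document that is deliberately malformed. This is a hedged example: the program name, the data names, and the sample text are hypothetical, and the exact code reported depends on how the parser classifies the error.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLEXC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical malformed input: <item> is never closed.
       01  XML-DOC          PIC X(40) VALUE
           "<list><item>one</list>".
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           XML PARSE XML-DOC
               PROCESSING PROCEDURE IS HANDLE-EVENT
               ON EXCEPTION
                   DISPLAY "Parse failed, XML-CODE = " XML-CODE
               NOT ON EXCEPTION
                   DISPLAY "Parse completed normally"
           END-XML
           STOP RUN.

       HANDLE-EVENT.
      * The parser reports the error here first, as an event.
           IF XML-EVENT = "EXCEPTION"
               DISPLAY "EXCEPTION event, XML-CODE = " XML-CODE
           ELSE
               DISPLAY XML-EVENT
           END-IF.
```

The processing procedure sees the EXCEPTION event before the ON EXCEPTION phrase receives control, and XML-CODE can still be inspected after the statement ends.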
The scope of a conditional XML PARSE statement is terminated by an END-XML phrase at the same level of nesting or by a separator period.
Parsing XML documents one segment at a time. You can parse XML documents by passing one segment (or record) of XML text at a time. Two major applications are processing very large documents and processing XML documents that reside in a file.
To parse an XML document a segment at a time, initialize the parse data item to the first segment of the XML document, and then execute the XML PARSE statement. The parser processes the XML text and returns XML events to your processing procedure as usual.
At the end of the text segment, the parser signals an END-OF-INPUT XML event, with XML-CODE set to zero. If there is another segment of the document to process, in your processing procedure move the next segment of XML data to the parse data item, set XML-CODE to one, and return to the parser. To signal the end of XML segments to the parser, return to the parser with XML-CODE still set to zero.
The length of the parse data item is evaluated for each segment, and determines the segment length.
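The segment protocol above might be sketched as follows for a document stored in a line sequential file. The file name, the record size, and all data names are assumptions made for illustration. The sketch also assumes that every line ends on a token boundary, so that the space padding of each fixed-length record is read as harmless white space.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLSEG.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * "orders.xml" is a hypothetical file name.
           SELECT XML-FILE ASSIGN TO "orders.xml"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  XML-FILE.
       01  XML-RECORD       PIC X(256).
       WORKING-STORAGE SECTION.
       01  XML-SEGMENT      PIC X(256).
       01  EOF-FLAG         PIC X VALUE "N".
           88  AT-EOF       VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           OPEN INPUT XML-FILE
      * Load the first segment before starting the parse.
           READ XML-FILE INTO XML-SEGMENT
               AT END SET AT-EOF TO TRUE
           END-READ
           IF NOT AT-EOF
               XML PARSE XML-SEGMENT
                   PROCESSING PROCEDURE HANDLE-EVENT
                   ON EXCEPTION
                       DISPLAY "XML error, XML-CODE = " XML-CODE
               END-XML
           END-IF
           CLOSE XML-FILE
           STOP RUN.

       HANDLE-EVENT.
           IF XML-EVENT = "END-OF-INPUT"
               READ XML-FILE INTO XML-SEGMENT
                   AT END
      *            No more segments: leave XML-CODE at zero.
                       CONTINUE
                   NOT AT END
      *            Next segment is loaded: ask parser to go on.
                       MOVE 1 TO XML-CODE
               END-READ
           ELSE
               DISPLAY XML-EVENT
           END-IF.
```

Reading the next record inside the processing procedure keeps only one segment in memory at a time, which is the point of this technique for very large documents.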
Variable-length segments. If the XML document segments are variable length, specify a variable-length item for the parse data item. For example, for variable-length XML segments, you can define the parse data item as one of the following items:
When a given XML PARSE statement appears as imperative-statement-1 or imperative-statement-2, or as part of imperative-statement-1 or imperative-statement-2 of another XML PARSE statement, that given XML PARSE statement is a nested XML PARSE statement.
Nested XML PARSE statements are considered to be matched XML PARSE and END-XML combinations proceeding from left to right. For this reason, when END-XML phrases are encountered, they are matched with the nearest preceding XML PARSE statements that have not already been terminated.
When the XML parser receives control from an XML PARSE statement, it analyzes the XML document and transfers control to procedure-name-1 at the following points:
Control returns to the XML parser when the end of the processing procedure is reached.
The exchange of control between the parser and the processing procedure continues until either:
Then, the parser terminates and returns control to the XML PARSE statement with the XML-CODE special register containing the most recent value set by the parser or the processing procedure.
The XML-CODE, XML-EVENT, and XML-TEXT special registers contain information about each XML event passed to the processing procedure. The content of XML-CODE is defined during and after execution of an XML PARSE statement. The contents of all other XML special registers are undefined outside the range of the processing procedure.
For normal XML events, XML-CODE contains zero when the processing procedure receives control. For exception events, XML-CODE contains one of the exception codes specified later in this document. XML-EVENT is set to the event name, such as START-OF-DOCUMENT. XML-TEXT contains the piece of the document corresponding to the event, as described in XML-EVENT. For more information about the XML special registers, see "Special Registers" below.
For all kinds of XML events, if XML-CODE is not zero when the processing procedure returns control to the parser, the parser terminates without a further EXCEPTION event. Setting XML-CODE to -1 before returning to the parser for an event other than EXCEPTION forces the parser to terminate with a user-initiated exception condition. For some EXCEPTION events, the processing procedure can handle the event, then set XML-CODE to zero to force the parser to continue, although subsequent results are unpredictable. When XML-CODE is zero, parsing continues until the entire XML document has been parsed or an exception condition occurs.
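As a sketch of user-initiated termination, the hypothetical program below stops parsing as soon as it has captured the content of one element. The element names, data names, and sample document are illustrative assumptions. The check of XML-CODE for -1 in the ON EXCEPTION phrase follows from the rule that a deliberate stop raises an exception condition.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLSTOP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical document; only <name> is of interest.
       01  XML-DOC          PIC X(80) VALUE
           "<cfg><name>demo</name><rest>ignored</rest></cfg>".
       01  IN-NAME-FLAG     PIC X VALUE "N".
           88  IN-NAME      VALUE "Y".
       01  NAME-VALUE       PIC X(20) VALUE SPACES.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           XML PARSE XML-DOC
               PROCESSING PROCEDURE HANDLE-EVENT
               ON EXCEPTION
                   IF XML-CODE = -1
                       DISPLAY "Stopped early, name = " NAME-VALUE
                   ELSE
                       DISPLAY "Parser error, XML-CODE = " XML-CODE
                   END-IF
               NOT ON EXCEPTION
                   DISPLAY "Whole document parsed"
           END-XML
           STOP RUN.

       HANDLE-EVENT.
           EVALUATE XML-EVENT
               WHEN "START-OF-ELEMENT"
                   IF XML-TEXT = "name"
                       SET IN-NAME TO TRUE
                   END-IF
               WHEN "CONTENT-CHARACTERS"
                   IF IN-NAME
                       MOVE XML-TEXT TO NAME-VALUE
      *                Value captured: request termination.
                       MOVE -1 TO XML-CODE
                   END-IF
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

Distinguishing -1 from other values after the statement lets the program tell a deliberate stop from a genuine parser error.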
When used in the XML PARSE statement, the XML-CODE special register is used to communicate status between the XML parser and the processing procedure.
For each event, the XML parser sets XML-CODE before transferring control to the processing procedure. It also does this at parser termination. You can reset XML-CODE before returning control to the parser.
The XML-CODE special register has the implicit definition:
01 XML-CODE PICTURE S9(9) USAGE BINARY VALUE 0.
When the XML parser encounters an XML event, it sets XML-CODE and then passes control to the processing procedure. For all events except EXCEPTION, XML-CODE contains zero when the processing procedure receives control.
For an EXCEPTION event, the parser sets XML-CODE to an exception code that indicates the nature of the exception. Exception codes are listed below. Note that these are different from IBM COBOL's exception codes.
XML PARSE Exception Code | Description |
---|---|
101 | Out of memory |
102 | Syntax error in XML |
103 | No elements |
104 | Invalid token |
105 | Unclosed token |
106 | Partial character |
107 | Tag mismatch |
108 | Duplicate attribute |
109 | Junk after the doc element |
110 | Error in the parameter entity reference |
111 | Undefined entity |
112 | Recursive entity reference |
113 | Asynchronous entity |
114 | Bad character reference |
115 | Binary entity reference |
116 | Attribute external entity reference |
117 | Misplaced XML processing instructions |
118 | Unknown encoding |
119 | Incorrect encoding |
120 | Unclosed cdata section |
121 | External entity handling required |
122 | Not standalone |
123 | Unexpected error |
124 | Entity declared in wrong place |
If you want the parser to terminate after a normal event without signaling an EXCEPTION event, set XML-CODE to -1 before returning control to the parser. If you set XML-CODE to any other value, results are undefined. IBM customers should note that ACUCOBOL-GT ignores an XML-CODE of 0 set in response to an EXCEPTION event. Unlike the IBM COBOL parser, the ACUCOBOL-GT parser has no exceptions that allow parsing to continue; it cannot continue once it has detected an error.
After an exception in ACUCOBOL-GT, no further events are returned from the parser. Control is passed to the statement that you specify in the ON EXCEPTION phrase, or to the end of the XML PARSE statement if you did not code an ON EXCEPTION phrase.
When the parser returns control to the XML PARSE statement, XML-CODE contains the most recent value set either by the parser or by the processing procedure.
XML-EVENT
The XML parser uses the XML-EVENT special register to communicate event information to the processing procedure. The information that is communicated is identified in the XML PARSE statement. Before passing control to the processing procedure, the XML parser sets XML-EVENT to the name of the XML event, as described in Table 1 at the end of this topic.
XML-EVENT has the implicit definition:
01 XML-EVENT USAGE DISPLAY PICTURE X(30) VALUE SPACE.
XML-EVENT cannot be used as a receiving data item.
XML-TEXT
The XML-TEXT special register is defined during XML parsing to contain document fragments that are of class alphanumeric. XML-TEXT is an elementary alphanumeric data item of the length of the contained XML document fragment. The length of XML-TEXT can vary from 0 through 16,777,215 bytes. There is no equivalent COBOL data description entry.
The parser sets XML-TEXT to the document fragment associated with an event before transferring control to the processing procedure when the operand of the XML PARSE statement is an alphanumeric data item.
Use the LENGTH function for XML-TEXT to determine the number of bytes that XML-TEXT contains.
XML-TEXT cannot be used as a receiving item.
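Because the length of XML-TEXT varies with each event, a processing procedure may want to check it before copying the fragment into a fixed-size field. The following is a minimal sketch under assumed names: the program name, data names, and sample document are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLLEN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical in-memory document, for illustration only.
       01  XML-DOC          PIC X(80) VALUE
           "<msg>A fairly long piece of element content</msg>".
       01  TEXT-LEN         PIC 9(8).
       01  SHORT-FIELD      PIC X(10).
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           XML PARSE XML-DOC
               PROCESSING PROCEDURE HANDLE-EVENT
           END-XML
           STOP RUN.

       HANDLE-EVENT.
           IF XML-EVENT = "CONTENT-CHARACTERS"
      * Number of bytes in the current fragment.
               COMPUTE TEXT-LEN = FUNCTION LENGTH(XML-TEXT)
      * 10 is the size of SHORT-FIELD declared above.
               IF TEXT-LEN > 10
                   DISPLAY "Content of " TEXT-LEN
                       " bytes will be truncated"
               END-IF
               MOVE XML-TEXT TO SHORT-FIELD
           END-IF.
```

XML-TEXT is only ever a sending item here, consistent with the restriction that it cannot be used as a receiving item.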
XML event (content of XML-EVENT) | Content of XML-TEXT |
---|---|
ATTRIBUTE-CHARACTERS | The value within quotes or apostrophes. If the value includes an entity reference, this can be a substring of the attribute value. |
ATTRIBUTE-NAME | The attribute name; the string to the left of "=". |
COMMENT | The text of the comment between the opening character sequence "<!--" and the closing character sequence "-->". |
CONTENT-CHARACTER | The single character corresponding to the predefined entity reference in the element content. |
CONTENT-CHARACTERS | The element content between start and end tags. This can be a substring of the element content if the content contains an entity reference or another element. |
DOCUMENT-TYPE-DECLARATION | The entire document type declaration including the opening and closing character sequences, "<!DOCTYPE" and ">". |
ENCODING-DECLARATION | The value, between quotes or apostrophes, of the encoding declaration in the XML declaration. |
END-OF-CDATA-SECTION | Always contains the string "]]>". |
END-OF-DOCUMENT | Null, zero-length. |
END-OF-ELEMENT | The name of the end element tag or empty element tag. |
EXCEPTION | The part of the document successfully scanned, up to and including the point at which the exception was detected. Special register XML-CODE contains the unique error code identifying the exception. |
PROCESSING-INSTRUCTION-DATA | The rest of the processing instruction, not including the closing sequence, "?>", but including trailing, not leading, white space characters. |
PROCESSING-INSTRUCTION-TARGET | The processing instruction target name that occurs immediately after the processing instruction opening sequence, "<?". |
STANDALONE-DECLARATION | The value between quotes or apostrophes of the stand-alone declaration in the XML declaration. |
START-OF-CDATA-SECTION | Always contains the string "<![CDATA[". |
START-OF-DOCUMENT | The entire document. |
START-OF-ELEMENT | The name of the start element tag or empty element tag, also known as the element type. |
VERSION-INFORMATION | The value between quotes or apostrophes of the version declaration in the XML declaration. |
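A processing procedure typically branches on the event names in the table above and copies XML-TEXT into working storage. The sketch below shows one way to do this. All program, data, element, and attribute names are hypothetical, and it assumes the content contains no entity references, so each value arrives in a single event.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLEVT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical in-memory document, for illustration only.
       01  XML-DOC          PIC X(80) VALUE
           "<order id='42'><item>Widget</item></order>".
       01  CURRENT-ELEMENT  PIC X(30) VALUE SPACES.
       01  CURRENT-ATTR     PIC X(30) VALUE SPACES.
       01  ORDER-ID         PIC X(10) VALUE SPACES.
       01  ITEM-NAME        PIC X(20) VALUE SPACES.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           XML PARSE XML-DOC
               PROCESSING PROCEDURE HANDLE-EVENT
               ON EXCEPTION
                   DISPLAY "XML error, XML-CODE = " XML-CODE
               NOT ON EXCEPTION
                   DISPLAY "Order " ORDER-ID " item " ITEM-NAME
           END-XML
           STOP RUN.

       HANDLE-EVENT.
           EVALUATE XML-EVENT
      * Remember where we are so later events have context.
               WHEN "START-OF-ELEMENT"
                   MOVE XML-TEXT TO CURRENT-ELEMENT
               WHEN "ATTRIBUTE-NAME"
                   MOVE XML-TEXT TO CURRENT-ATTR
               WHEN "ATTRIBUTE-CHARACTERS"
                   IF CURRENT-ELEMENT = "order"
                      AND CURRENT-ATTR = "id"
                       MOVE XML-TEXT TO ORDER-ID
                   END-IF
               WHEN "CONTENT-CHARACTERS"
                   IF CURRENT-ELEMENT = "item"
                       MOVE XML-TEXT TO ITEM-NAME
                   END-IF
               WHEN "END-OF-ELEMENT"
                   MOVE SPACES TO CURRENT-ELEMENT
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

Tracking the current element and attribute in working storage is needed because each event carries only its own fragment in XML-TEXT, not the surrounding context.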