COBOL and Character Encoding

XML Extensions uses UTF-8 character encoding for exporting XML documents. (UTF-8 is a byte-oriented encoding form of Unicode designed for ease of use with existing ASCII-based systems.) Imported documents are interpreted according to the character encoding specified in the XML header, resulting in an internal Unicode representation of the characters. Because XML is Unicode-based and ACUCOBOL-GT is not, transcoding is generally required when moving character data between COBOL and XML. XML Extensions supports several ways of specifying the transcoding that should occur in these cases. The following sections provide related information about character encoding considerations.
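As a rough, language-neutral illustration of this import/export round trip, the following Python sketch (not ACUCOBOL-GT code) uses the standard library's xml.etree.ElementTree. A document whose header declares ISO-8859-1 is decoded into Unicode text on import, and the same text is written back out as UTF-8. The element name and content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose XML header declares ISO-8859-1;
# in that encoding, byte 0xE9 is "é".
imported = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>Ren\xe9</name>'

# The parser honors the declared encoding and yields Unicode (str) text.
root = ET.fromstring(imported)
print(root.text)  # René

# On export, the same Unicode text is encoded as UTF-8: the ASCII
# characters keep their single-byte values, while "é" becomes two bytes.
exported = ET.tostring(root, encoding="utf-8")
print(exported)  # b'<name>Ren\xc3\xa9</name>'
```

Note how the ASCII portion ("Ren") is byte-for-byte identical in both encodings; only the non-ASCII character changes representation.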

The COBOL_CHARACTER_SET runtime configuration variable determines the local character encoding on Windows.

The A_XMLIF_ENCODING runtime configuration variable specifies the local character encoding on UNIX. This variable is ignored if the XML SET ENCODING statement sets the encoding to UTF-8.

Windows Character Encoding

Under Windows, the runtime uses either OEM or ANSI character encoding, so the Windows implementation of XML Extensions supports both OEM and ANSI as the local character encoding. The COBOL_CHARACTER_SET runtime configuration variable selects ANSI or OEM encoding; other values are ignored, and the data is returned in UTF-8 encoding. (The A_XMLIF_ENCODING runtime configuration variable is ignored by the Windows implementation of XML Extensions.)
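The selection rule can be sketched as a small lookup. This is a hypothetical Python helper, not the product's implementation. It assumes the setting is spelled ANSI or OEM (compared case-insensitively here), and it treats an unset variable as ANSI because the text describes the ANSI code page as the runtime default. The codecs cp1252 and cp437 are stand-ins for whatever ANSI and OEM code pages are active on a given machine.

```python
from typing import Optional

# Stand-ins for the active Windows code pages (assumed common U.S. defaults);
# on a real system the active code pages are reported by the OS.
ANSI_CODEC = "cp1252"
OEM_CODEC = "cp437"

def local_codec(cobol_character_set: Optional[str]) -> str:
    """Hypothetical helper: map a COBOL_CHARACTER_SET value to a codec name.

    Assumptions: values are spelled ANSI or OEM (case-insensitive here),
    and an unset variable means the ANSI default.
    """
    if cobol_character_set is None:
        return ANSI_CODEC  # assumed: runtime default is the ANSI code page
    value = cobol_character_set.strip().upper()
    if value == "ANSI":
        return ANSI_CODEC
    if value == "OEM":
        return OEM_CODEC
    return "utf-8"  # any other value is ignored; data stays in UTF-8
```

For example, `local_codec("OEM")` yields the OEM stand-in, while an unrecognized value such as `"EBCDIC"` falls through to UTF-8.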

Note: Microsoft originally introduced OEM character encoding for MS-DOS. Although multiple OEM code pages are in use, Windows provides interfaces that convert between the active OEM code page and Unicode, so XML Extensions does not need to differentiate between OEM code pages.

The ANSI code page is the runtime's default local character set. In that case, XML Extensions uses the ANSI code page currently in use when converting between the local character encoding and Unicode.
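Why the active code page matters can be shown in a few lines of Python. Here cp1252 and cp437 are assumed stand-ins for a typical ANSI and OEM code page, and the sample text is hypothetical. The same character has different byte values in each, and decoding with the wrong code page silently yields a different character.

```python
# Hypothetical sample text; cp1252 (ANSI) and cp437 (OEM) are assumed
# stand-ins for the code pages actually active on a Windows system.
text = "café"

ansi_bytes = text.encode("cp1252")  # b'caf\xe9'
oem_bytes = text.encode("cp437")    # b'caf\x82'

# Decoding with the matching code page recovers the same Unicode string.
assert ansi_bytes.decode("cp1252") == text
assert oem_bytes.decode("cp437") == text

# Decoding ANSI bytes as if they were OEM produces the wrong character.
misread = ansi_bytes.decode("cp437")
print(misread)  # cafΘ
```

On Windows only, Python also offers the "mbcs" and "oem" codecs, which use whatever ANSI and OEM code pages the system has active, mirroring the OS conversion interfaces described in the note above.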

UNIX Character Encoding

On UNIX systems, the runtime is normally not concerned with the data encoding used by the underlying operating system. However, Latin-1 (ISO-8859-1) is important for the U.S., and Latin-9 (ISO-8859-15) is significant for Western Europe because it contains the Euro currency symbol.

The A_XMLIF_ENCODING runtime configuration variable can be set to either of two built-in, predefined values, XMLIF_LATIN_1 or XMLIF_LATIN_9, which designate Latin-1 or Latin-9, respectively, as the local character encoding. Internal translation functions convert between Latin-1 or Latin-9 (in COBOL memory) and UTF-8 (in the XML document). The value of the variable is case insensitive, and hyphen and underscore characters are optional. For example, XMLIF_LATIN_9, xmlif-Latin-9, and xmliflatin9 are equivalent.
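Conceptually, the internal translation functions decode bytes in the local encoding and re-encode them as UTF-8 (or the reverse). The Python sketch below uses hypothetical function names and the standard codecs latin_1 and iso8859_15. It also shows why the Latin-1/Latin-9 choice matters: byte 0xA4 is the generic currency sign in Latin-1 but the euro sign in Latin-9.

```python
def to_utf8(cobol_bytes: bytes, local: str) -> bytes:
    """Hypothetical: local-encoding bytes (COBOL memory) -> UTF-8 (XML document)."""
    return cobol_bytes.decode(local).encode("utf-8")

def from_utf8(xml_bytes: bytes, local: str) -> bytes:
    """Hypothetical: UTF-8 (XML document) -> local-encoding bytes (COBOL memory).

    Raises UnicodeEncodeError if a character has no local equivalent; how
    XML Extensions itself handles that case is not modeled here.
    """
    return xml_bytes.decode("utf-8").encode(local)

price = b"\xa4 10"  # hypothetical COBOL field contents
print(to_utf8(price, "iso8859_15"))  # b'\xe2\x82\xac 10'  (euro sign, U+20AC)
print(to_utf8(price, "latin_1"))     # b'\xc2\xa4 10'      (currency sign, U+00A4)
```

Going the other way, `from_utf8("€".encode("utf-8"), "iso8859_15")` returns `b"\xa4"`, while the same call with `"latin_1"` fails because Latin-1 has no euro sign.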

If the value of the A_XMLIF_ENCODING runtime configuration variable is not specified, then XMLIF_LATIN_9 is used as the default.

If the A_XMLIF_ENCODING runtime configuration variable is set to a value that is equivalent to neither XMLIF_LATIN_1 nor XMLIF_LATIN_9, the value must be a name recognized by the iconv library, which can perform other conversions. In this case, the spelling may need to be exact (for example, the value may be case sensitive, and hyphens and underscores would be required). The exact spelling of the value of the A_XMLIF_ENCODING runtime configuration variable is specific to the iconv library on the platform in use.
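The rules for interpreting A_XMLIF_ENCODING can be summarized as a small resolver. This Python sketch is hypothetical: the function name, the `set_encoding_utf8` flag (standing in for an XML SET ENCODING statement that selects UTF-8), and the returned labels are illustrative only. It normalizes the value for the two predefined names, applies the Latin-9 default when the variable is unset, and otherwise passes the value through unchanged for iconv.

```python
from typing import Optional, Tuple

# Normalized spellings of the two predefined values.
_BUILTIN = {"xmliflatin1": "latin-1", "xmliflatin9": "latin-9"}

def _normalize(value: str) -> str:
    # Predefined names are case insensitive; hyphens and underscores are optional.
    return value.replace("-", "").replace("_", "").lower()

def resolve_local_encoding(value: Optional[str],
                           set_encoding_utf8: bool = False) -> Tuple[str, str]:
    """Hypothetical resolver returning (mechanism, encoding)."""
    if set_encoding_utf8:
        # XML SET ENCODING selected UTF-8, so A_XMLIF_ENCODING is ignored.
        return ("none", "UTF-8")
    if value is None:
        return ("builtin", "latin-9")  # default when the variable is not set
    builtin = _BUILTIN.get(_normalize(value))
    if builtin is not None:
        return ("builtin", builtin)
    # Any other value goes to iconv verbatim; iconv may need the exact spelling.
    return ("iconv", value)
```

Passing the non-predefined value through untouched, rather than normalizing it, reflects the text's warning that iconv names may be case sensitive and may require their hyphens and underscores.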

Note: An iconv library is not provided. The developer must acquire an appropriate package.

The value of the A_XMLIF_ICONV_NAME runtime configuration variable, if one is defined, is used to locate the iconv library (which must be a shared object) on the local system. For example:

 A_XMLIF_ICONV_NAME=/usr/local/bin/libiconv.so

If the A_XMLIF_ICONV_NAME runtime configuration variable is not set, the directories listed in the PATH environment variable are searched for a library named iconv.so or libiconv.so, in that order.
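The lookup can be sketched as follows. This is a hypothetical Python helper, not the runtime's code. It assumes "in that order" means iconv.so is tried in every PATH directory before libiconv.so is tried, and it only locates the file; a real implementation would then load the shared object (for example, with dlopen).

```python
import os
from typing import Mapping, Optional

def find_iconv_library(env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical sketch of locating the iconv shared object.

    Assumption: iconv.so is searched for across all PATH directories
    before libiconv.so is tried.
    """
    explicit = env.get("A_XMLIF_ICONV_NAME")
    if explicit:
        return explicit  # an explicit setting is used as given; no PATH search
    dirs = [d for d in env.get("PATH", "").split(os.pathsep) if d]
    for name in ("iconv.so", "libiconv.so"):
        for directory in dirs:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return None  # no iconv library found
```

For example, with A_XMLIF_ICONV_NAME unset and PATH=/opt/lib:/usr/lib, this sketch checks /opt/lib/iconv.so, /usr/lib/iconv.so, /opt/lib/libiconv.so, and finally /usr/lib/libiconv.so.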