- Connect by FTP to the site ftp.unicode.org.
- Change to the directory Public/MAPPINGS.
- Locate the character set you want, as follows:
  - If the character set is a standard Microsoft Windows Code Page, change to the directory VENDORS/MICSFT/WINDOWS.
  - If the character set is a standard Microsoft DOS Code Page, change to the directory VENDORS/MICSFT/PC.
  - If the character set is an ISO 8859 character set, change to the directory ISO8859.
  - If the character set is none of the above, explore the other directories in ftp.unicode.org/Public/MAPPINGS to locate the desired character set.
  - If you cannot find the character set you want, you must define a nonstandard character set or modify an existing Relativity character set.
- Locate the file for the desired character set and download it to your local machine. Name it with the extension .cs.
- Edit the file with a text editor, taking care to delete any control characters from the file. These are usually displayed as ^D.
- Insert the following line at the beginning of the file:
Charset "Character Set Name" 0x0
For Character Set Name, substitute the name for the character set that is to be visible from within the Select Character Set dialog box.
- At the end of the file, insert the following line:
EndCharset
- Examine the remaining lines for any that are missing the second entry on the line. This second entry is the 16-bit Unicode character for the character being defined. In the definitions available on ftp.unicode.org, unused Code Points have a blank second entry and carry the comment UNDEFINED. On each such line, insert <NOT USED> as the second entry.
Note: A Unicode character may be used only once in a character set. If it appears twice, an error indicating a duplicate Unicode
character will be generated during import. Unique Unicode characters are necessary for all defined Code Points in order to
generate mapping tables that can be used to translate characters both to and from the new character set. Rather than arbitrarily
assigning Unicode characters to unused Code Points, either leave them out of the character set definition, or use <NOT USED>
as the Unicode character entry. When Relativity encounters the <NOT USED> entry, it creates an association in the character
set being defined with unused or unmatched entries in the target character set. In this manner, if an undefined character
is present in the data, it will be translated consistently.
- Save the file and import it into the Relativity data source.
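The editing steps above can also be scripted. The following is a minimal Python sketch, not a Relativity utility. It assumes the downloaded mapping file has one Code Point per line, with whitespace-separated hexadecimal entries and an optional trailing comment that starts with #. The function names (strip_control_chars, fill_unused, to_relativity_cs) are hypothetical, and separating the entries with a tab is likewise an assumption.

```python
import re

NOT_USED = "<NOT USED>"


def strip_control_chars(text: str) -> str:
    """Drop control characters, keeping only tab and newline.

    Carriage returns are removed too, so CRLF line endings become LF.
    """
    return re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", text)


def fill_unused(line: str) -> str:
    """Insert <NOT USED> on a line that has a Code Point but no second entry."""
    body, sep, comment = line.partition("#")
    fields = body.split()
    if len(fields) == 1 and fields[0].lower().startswith("0x"):
        rebuilt = f"{fields[0]}\t{NOT_USED}"
        # Re-attach the original comment, if the line had one.
        return rebuilt + (f"\t#{comment}" if sep else "")
    return line


def to_relativity_cs(mapping_text: str, charset_name: str) -> str:
    """Wrap a cleaned mapping table in the Charset ... EndCharset lines."""
    cleaned = strip_control_chars(mapping_text)
    lines = [fill_unused(line) for line in cleaned.splitlines()]
    header = f'Charset "{charset_name}" 0x0'
    return "\n".join([header, *lines, "EndCharset"]) + "\n"
```

A typical use would be to read the downloaded file, pass its text and the name you want shown in the Select Character Set dialog box to to_relativity_cs, and write the result to a file with the extension .cs. Review the output in a text editor before importing it.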
To see a list of errors that can occur when a character set is imported into a Relativity data source, see Error Messages when Importing a Character Set.
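The duplicate Unicode character error described in the Note can be checked for before import. The sketch below is a hedged illustration and not part of Relativity. It assumes the same line layout as the files on ftp.unicode.org (a hexadecimal Code Point, then a hexadecimal Unicode value, then an optional # comment). The function name find_duplicate_unicode is hypothetical.

```python
from collections import defaultdict


def find_duplicate_unicode(cs_text: str) -> dict[int, list[str]]:
    """Return each Unicode value that is mapped from more than one Code Point."""
    seen: dict[int, list[str]] = defaultdict(list)
    for line in cs_text.splitlines():
        body = line.partition("#")[0]  # ignore any trailing comment
        fields = body.split()
        if len(fields) < 2 or not fields[0].lower().startswith("0x"):
            continue  # header, EndCharset, comment, or blank line
        try:
            unicode_value = int(fields[1], 16)
        except ValueError:
            continue  # for example a <NOT USED> entry
        seen[unicode_value].append(fields[0])
    return {value: points for value, points in seen.items() if len(points) > 1}
```

An empty result means no Unicode value was found twice under these assumptions. It does not guarantee that the import will succeed, because other errors are still possible.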