1 Supported_Character_Sets
 Oracle Rdb supports multiple character sets and lets you use more
 than one character set in a database. The supported character sets
 are:

 o  ARABIC
 o  BIG5
 o  DEC_HANYU
 o  DEC_HANZI
 o  DEC_KANJI
 o  DEC_KOREAN
 o  DEC_MCS
 o  DEC_SICGCC
 o  DEVANAGARI
 o  DOS_LATIN1
 o  DOS_LATINUS
 o  HANYU
 o  HANZI
 o  HEX
 o  ISOLATIN1
 o  ISOLATIN9
 o  ISOLATINARABIC
 o  ISOLATINCYRILLIC
 o  ISOLATINGREEK
 o  ISOLATINHEBREW
 o  KANJI
 o  KATAKANA
 o  KOREAN
 o  SHIFT_JIS
 o  TACTIS
 o  WIN_ARABIC
 o  WIN_CYRILLIC
 o  WIN_GREEK
 o  WIN_HEBREW
 o  WIN_LATIN1

 Characters can be coded in the following ways:

 o  Single-octet

    Every character in a single-octet character set is represented
    in one octet. ASCII is an example of a single-octet character
    set. Each ASCII character is represented in one octet.

 o  Multi-octet

    Characters in a multi-octet character set are, in general,
    represented in one or more octets. Some character sets are fixed
    multi-octet character sets and some are mixed multi-octet
    character sets.

    -  Fixed multi-octet

       Every character in a fixed multi-octet character set is
       represented by the same fixed number of octets, two or more.
       Kanji is an example of a fixed multi-octet character set.
       Each Kanji character is represented in two octets.

    -  Mixed multi-octet

       Characters in a mixed multi-octet character set are
       represented by a varying number of octets, which allows ASCII
       and a fixed multi-octet character set to be used in the same
       string. DEC_KANJI is an example of a mixed multi-octet
       character set. The ASCII characters are represented in one
       octet, and the Kanji characters are represented in two
       octets.

2 Automatic_Translation
 During operations on text data, such as assigning a literal to a
 text column or comparing two string variables, Oracle Rdb carries
 out character set compatibility checks to ensure that the operation
 is viable. Unless automatic translation is enabled, this checking
 is quite restrictive: in most cases the two text objects must have
 identical character sets before the operation is allowed.
 The automatic translation feature lets you choose whether character
 set checking should be restrictive or whether Rdb should attempt a
 character set translation, similar to that provided by the
 TRANSLATE function, prior to assignments or comparisons. With
 automatic translation enabled, you can easily carry out operations
 that previously required explicit translation steps:

 1. Carry out comparisons between columns that contain data encoded
    in different character sets that have common character subsets.
    For example, DEC_MCS and DEC_KANJI have ASCII in common.

 2. Use the same SQL code to access database data irrespective of
    the client's environment. For example, a user on a Japanese PC
    accessing a DEC_MCS column would otherwise have to add TRANSLATE
    functions to the SQL commands to convert the DEC_MCS data to
    SHIFT_JIS before it could be displayed on the screen. With
    automatic translation enabled and a display character set
    specified, this is not required.

 3. Enter data from a native interface without explicit
    translations. For example, users using SHIFT_JIS on a Japanese
    PC can access and insert data into a DEC_KANJI column in the
    database without explicit translation steps.

 You can enable automatic translation by:

 1. Using a SET AUTOMATIC TRANSLATION statement

 2. Defining the SQL$AUTOMATIC_TRANSLATION logical name

 SQL$AUTOMATIC_TRANSLATION

 The logical name SQL$AUTOMATIC_TRANSLATION allows SQL users to
 specify that automatic translation should be enabled by default.
 The logical name can be placed in any logical name table accessible
 to the client SQL process. If the logical name is set to either
 'TRUE' or 'T' prior to invoking SQL, automatic translation is
 enabled by default. Any other value disables automatic translation
 within SQL.
2 Character_Set_HEX
 The character set HEX consists of the two-octet hexadecimal
 characters '00' through 'FF'. Data objects with this character set
 are not automatically translated to the display character set when
 automatic translation is enabled. HEX can be used in conjunction
 with the CAST and TRANSLATE functions to obtain the hexadecimal
 equivalent of text objects.

 Translation to the HEX character set translates the source data
 octet by octet into hexadecimal notation. Translation from the HEX
 character set translates from hexadecimal notation to the
 destination character set. For example:

 SQL> show character sets
 Default character set is DOS_LATINUS
 National character set is DOS_LATINUS
 Identifier character set is DOS_LATINUS
 Literal character set is DOS_LATINUS
 Display character set is DOS_LATINUS
 Alias RDB$DBHANDLE:
         Identifier character set is DEC_MCS
         Default character set is DEC_MCS
         National character set is DEC_MCS
 SQL> show automatic translation
 Automatic translation: ON
 SQL> create table latin (f1 char(4) char set win_latin1,
 cont>     f2 char(4) char set dos_latinus);
 SQL> insert into latin values ('AÉÖ','AÉÖ');
 1 row inserted
 SQL> select f1, cast(f1 as char(8) char set hex),
 cont>     f2, cast(f2 as char(8) char set hex) from latin;
  F1                F2
  AÉÖ    41C9D620   AÉÖ    41909920
 1 row selected
 SQL> select cast (_hex'9099' as char(2) ) from rdb$database;

  ÉÖ
 1 row selected
 SQL> select translate (_hex'9099' using rdb$dos_latinus )
 cont>     from rdb$database;

  ÉÖ
 1 row selected

 The previous example also shows automatic translation between the
 literal character set DOS_LATINUS and the field F1, which contains
 WIN_LATIN1 data, and the subsequent automatic translation from the
 F1 field back to the display character set. The hexadecimal display
 of the field contents shows that the data actually stored in the
 database is different for fields F1 and F2, even though the input
 literals and the displayed output appear identical.
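 The hexadecimal values in the example above can be cross-checked
 outside the database. The following Python sketch reproduces the
 octet-by-octet hexadecimal rendering. It is an illustration, not
 Oracle Rdb code: the Python codecs "cp1252" and "cp437" are assumed
 stand-ins for WIN_LATIN1 and DOS_LATINUS, the helper name to_hex is
 hypothetical, and the space padding mimics a CHAR(4) column.

```python
# Illustration only: Python codecs stand in for Oracle Rdb character sets.
#   "cp1252" -> assumed stand-in for WIN_LATIN1
#   "cp437"  -> assumed stand-in for DOS_LATINUS

def to_hex(text, codec, width):
    """Pad text with spaces to a fixed width, encode it, and return uppercase hex."""
    return text.ljust(width).encode(codec).hex().upper()

value = "AÉÖ"

f1_hex = to_hex(value, "cp1252", 4)  # compare with the F1 column (WIN_LATIN1)
f2_hex = to_hex(value, "cp437", 4)   # compare with the F2 column (DOS_LATINUS)

print(f1_hex)  # 41C9D620
print(f2_hex)  # 41909920

# Going the other way: hexadecimal notation interpreted in the target set.
from_hex = bytes.fromhex("9099").decode("cp437")
print(from_hex)  # ÉÖ
```

 The same three characters produce different stored octets in the
 two character sets, which matches the difference between the F1 and
 F2 hexadecimal columns in the SQL output.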
2 Default_Character_Sets
 The default character set is the character set that SQL uses for
 the following elements:

 o  Database columns with a character data type that does not
    explicitly specify a character set

 o  Parameters that are not qualified by a character set

 You can specify the default character set at the session and
 database level. See the Oracle Rdb Introduction to SQL and Oracle
 Rdb Guide to Database Design and Definition for more detail about
 session and database character sets.

 You can specify the database default character set only when you
 create the database. You cannot change the database default
 character set after you have created the database.

 SQL uses DEC_MCS as the default character set, unless you have set
 the dialect to MIA or specified a default character set at the
 session level. You can override any default character set by
 specifying another default character set when creating a database.

 To specify the default character set, use one of the character set
 names listed in Supported Character Sets.

 The default character set does not affect the setting of the
 currency sign.

 When you compile SQL programs (either SQL module language or
 precompiled SQL), SQL uses the following to derive the default
 character set:

 o  The DEFAULT CHARACTER SET clause in the DECLARE ALIAS statement
    specifies the default character set of the alias at compile
    time. At run time, SQL uses the default character set of the
    attached database. At run time, you must ensure that the
    database default character set is identical to the default
    character set specified in the DECLARE ALIAS clause.

 o  The DEFAULT CHARACTER SET clause of the SQL module header or the
    DECLARE MODULE statement specifies the character set for
    parameters that are not qualified by a character set.

 o  In dynamic SQL, the SET DEFAULT CHARACTER SET statement
    specifies, at run time, the character set for parameters that
    are not qualified by a character set.

 o  The RDB$CHARACTER_SET logical name.
    However, the logical name is deprecated and will not be
    supported in a future release.

2 Display_Character_Set
 The display character set is the character set to which SQL
 automatically translates text before displaying it in interactive
 SQL or returning it to a user program.

 You can specify the display character set only for a session or a
 module by using the SET DISPLAY CHARACTER SET statement or the
 DISPLAY CHARACTER SET clause of the SQL module header, the DECLARE
 MODULE statement, or the DECLARE ALIAS statement.

 The choice of display character set is limited to those character
 sets that include ASCII characters. The Identifier Character Set
 topic lists the subset of character sets that you can use to
 specify the display character set.

2 Identifier_Character_Set
 The identifier character set is the character set SQL uses for
 database object names, such as table names and column names.

 You can specify the identifier character set at the session and
 database level.

 The choice of identifier character set is limited to those
 character sets that include ASCII characters. This is necessary so
 that the object names for the Oracle Rdb system metadata, which is
 in ASCII, can be stored.

 You can specify the identifier character set for the database only
 when you create the database. You cannot alter the identifier
 character set of a database after creation.
 Following is a list of the character sets used for the identifier
 character set:

 o  ASCII
 o  AL24UTFFSS
 o  DEC_MCS
 o  DOS_LATIN1
 o  DOS_LATINUS
 o  DEVANAGARI
 o  DEC_SICGCC
 o  DEC_HANYU
 o  DEC_HANZI
 o  GB18030
 o  ISOLATINARABIC
 o  ISOLATINCYRILLIC
 o  ISOLATIN1
 o  ISOLATIN9
 o  ISOLATINGREEK
 o  ISOLATINHEBREW
 o  DEC_KANJI
 o  KATAKANA
 o  DEC_KOREAN
 o  SHIFT_JIS
 o  UTF8
 o  UNSPECIFIED
 o  TACTIS
 o  WIN_ARABIC
 o  WIN_GREEK
 o  WIN_CYRILLIC
 o  WIN_HEBREW

 When you compile SQL programs (either SQL module language or
 precompiled SQL), SQL uses the following to derive the identifier
 character set:

 o  The IDENTIFIER CHARACTER SET clause of the SQL module header or
    the DECLARE MODULE statement specifies the character set for
    parameters that are not qualified by a character set.

 o  In dynamic SQL, the SET IDENTIFIER CHARACTER SET statement
    specifies, at run time, the character set for parameters that
    are not qualified by a character set.

 o  The RDB$CHARACTER_SET logical name. However, the logical name is
    deprecated and will not be supported in a future release.

 SQL uses DEC_MCS as the identifier character set, unless you have
 set the dialect to MIA or specified an identifier character set at
 the session level. You can override any identifier character set by
 specifying another identifier character set when creating a
 database.

2 Literal_Character_Sets
 The literal character set is the character set SQL uses for
 unqualified character string literals.

 You can specify the literal character set only for a session or a
 module by using the SET LITERAL CHARACTER SET statement or the
 LITERAL CHARACTER SET clause of the SQL module header, the DECLARE
 MODULE statement, or the DECLARE ALIAS statement.

 When inserting data into a column, you must qualify the literal
 with the same character set with which you defined the column. For
 example, suppose that the literal character set of the module is
 DEC_MCS.
 If the column ENGLISH is defined with the DEC_MCS character set,
 SQL returns an error when you execute the following statement:

 SQL> INSERT INTO COLOURS
 cont>   (ENGLISH)
 cont>   VALUES
 cont>   (_DEC_KANJI'Black');
 %SQL-F-INCCSASS, Incompatible character set assignment between ENGLISH and
 SQL>

2 National_Character_Set
 The national character set is a shorthand notation that you can use
 for a character set of your choice. SQL uses the national character
 set for the following elements:

 o  For all columns and domains with the data type NCHAR or NCHAR
    VARYING and for the NCHAR data type in a CAST function. For
    information about these data types, see the Data_Types HELP
    topic.

 o  For all parameters in SQL module language with the data type
    NCHAR or NCHAR VARYING.

 o  For all character string literals qualified by the national
    character set; that is, the literal is preceded by the letter N
    and a single quotation mark (for example, N'). For more
    information, see the Literals HELP topic.

 You can specify the national character set at the session and
 database level. See the Oracle Rdb Introduction to SQL and the
 Oracle Rdb Guide to Database Design and Definition for more detail
 about session and database character sets.

 You specify the national character set for a database when you
 create the database. You cannot alter the national character set of
 a database.

 SQL uses DEC_MCS as the national character set, unless you have set
 the dialect to MIA or specified a national character set at the
 session level. You can override any national character set by
 specifying another national character set when creating a database.

 When you compile SQL programs (either SQL module language or
 precompiled SQL), SQL uses the following to derive the national
 character set:

 o  The NATIONAL CHARACTER SET clause in the DECLARE ALIAS statement
    specifies the national character set of the alias at compile
    time.
    It controls the national character set for column and domain
    definitions and the NCHAR and NCHAR VARYING data types in a CAST
    function. At run time, SQL uses the national character set of
    the attached database for these elements.

 o  The NATIONAL CHARACTER SET clause of the SQL module header and
    the DECLARE MODULE statement specifies the character set for
    literals qualified by the national character set and for
    parameters defined with the data type NCHAR or NCHAR VARYING.

 o  In dynamic SQL, the SET NATIONAL CHARACTER SET statement
    specifies, at run time, the character set for columns with the
    data type NCHAR and NCHAR VARYING and for character string
    literals qualified by the national character set.

 o  The RDB$CHARACTER_SET logical name. However, the logical name is
    deprecated and will not be supported in a future release.

                                NOTE

    Although SQL does not require that the national character set of
    the database and the module match, Oracle Rdb recommends that
    you define both with the same character set.

2 Character_Set_ISOLATIN9
 Oracle Rdb supports the ISOLATIN9 character set (as described by
 ISO 8859-15). ISOLATIN9 is similar to ISOLATIN1 except for 8
 codepoints. The following table compares ISOLATIN9 and ISOLATIN1.

 Table 1 ISOLATIN1/ISOLATIN9 Character Set Differences

        ISO Latin 1                        ISO Latin 9
 Code   Unicode                            Unicode
 Pos    Pos                                Pos
 Hex    Hex   Name                         Hex   Name
 ----   ----  ---------------------------  ----  ---------------------------
 A4     00A4  currency symbol              20AC  euro sign
 A6     00A6  broken bar                   0160  latin capital letter s
                                                 with caron
 A8     00A8  diaeresis                    0161  latin small letter s
                                                 with caron
 B4     00B4  acute accent                 017D  latin capital letter z
                                                 with caron
 B8     00B8  cedilla                      017E  latin small letter z
                                                 with caron
 BC     00BC  vulgar fraction one quarter  0152  latin capital ligature oe
 BD     00BD  vulgar fraction one half     0153  latin small ligature oe
 BE     00BE  vulgar fraction three        0178  latin capital letter y
              quarters                           with diaeresis

2 Oracle_NLS_Character_Set_Names
 Oracle Rdb supports the use of Oracle National Language Support
 (NLS) names as aliases for existing Oracle Rdb character sets, as
 summarized in the following table. You can use NLS alias names
 anywhere a character set name can be used.

 Table 2 Oracle NLS Character Set Names Supported as Aliases

 Oracle NLS Name    Oracle Rdb Character Set
 ---------------    ------------------------
 US7ASCII           ASCII
 WE8DEC             DEC_MCS
 WE8ISO8859P1       ISOLATIN1
 WE8ISO8859P15      ISOLATIN9
 CL8ISO8859P5       ISOLATINCYRILLIC
 AR8ISO8859P6       ISOLATINARABIC
 EL8ISO8859P7       ISOLATINGREEK
 IW8ISO8859P8       ISOLATINHEBREW
 TH8TISASCII        TACTIS
 JA16VMS            DEC_KANJI
 JA16SJIS           SHIFT_JIS
 KO16KSC5601        KOREAN
 ZHS16CGB231280     HANZI
 ZH16BIG5           BIG5
 JA16EUCFIXED       KANJI

2 Character_Set_UNSPECIFIED
 Oracle Rdb supports the use of the UNSPECIFIED character set. You
 can make comparisons and assignments between text objects (columns,
 literals, and so on) that have the UNSPECIFIED character set and
 any other text object, regardless of the character set of the other
 text object.

 The characteristics of the UNSPECIFIED character set are as
 follows:

 o  The character set ID is 32767.

 o  It can be used to specify any session or database character set,
    including the identifier character set.

 o  It is a single-octet character set (fixed).

 o  It applies casing (uppercase and lowercase) only to ASCII
    characters.

 o  It contains ASCII, as follows:

    -  The space character is the ASCII space character (0x20).
    -  The wildcard character is the ASCII underscore (0x5f).

    -  The string wildcard is the ASCII percent (0x25).

2 Logical_Names_for_Character_Sets
 You can define a logical name for a character set name. Doing so
 allows easy portability of applications across national boundaries.

 You can use this logical name or parameter anywhere you use a
 character set name in SQL. SQL translates the logical name or
 parameter at compile time for precompiled SQL and SQL module
 language, or at run time for dynamic SQL and interactive SQL.

 The logical name can begin with any of the following:

 o  RDBVMS$
 o  RDB$
 o  SQL$

 Oracle Rdb recommends that you begin logical names with RDB$.