Mastering Oracle Character Sets and NLS_LANG: A Complete Guide
This article explains Oracle character sets, the NLS_LANG parameter format, the differences between database, national, and client character sets, how to query and modify them, and the impact on EXP/IMP data migration, providing practical commands and step‑by‑step procedures.
What Is an Oracle Character Set?
Oracle character sets define how byte data is interpreted as symbols; they vary in size and can be mutually inclusive. The most important parameter influencing the character set is NLS_LANG, which follows the format language_territory.charset.
Components of NLS_LANG
Language : Determines the language of server messages (e.g., Chinese or English).
Territory : Sets default date, number, and currency formats.
Charset : Specifies the actual character set used.
Only the charset component truly affects the database character set.
Oracle Character Set Naming Rules
Names follow the pattern <language><bit‑width><encoding>. For example, ZHS16GBK denotes a 16‑bit GBK encoding for Simplified Chinese.
Types of Encodings
Single‑byte encodings : 7‑bit (e.g., US7ASCII) or 8‑bit (e.g., WE8ISO8859P1).
Variable‑length multibyte encodings : Used for Asian languages, such as AL32UTF8 or ZHS16CGB231280.
Fixed‑length multibyte encodings : Oracle supports only AF16UTF16 for national character sets.
Unicode encodings : UTF‑16 (fixed‑length) and UTF‑8 (variable‑length).
Superset / Subset Relationships
If one character set contains all the code points of another and maps them identically, it is a superset. Oracle documentation provides subset‑superset tables (e.g., WE8ISO8859P1 is a subset of WE8MSWIN1252).
Database vs. National vs. Client Character Sets
Database character set : Defined at database creation (CHARACTER SET) and usually cannot be changed later.
National character set : Used for NCHAR, NVARCHAR2, NCLOB; limited to Unicode encodings ( AF16UTF16 or UTF8).
Client character set : Set via NLS_LANG on the client side (e.g., AMERICAN_AMERICA.US7ASCII).
Querying Character Set Parameters
Relevant data dictionary views include: nls_database_parameters, props$, v$nls_parameters – show the database character set. nls_session_parameters – shows parameters derived from NLS_LANG or ALTER SESSION. nls_instance_parameters – reflects init.ora settings, environment variables, or registry values.
Modifying the Database Character Set
Although changing the character set after creation is generally discouraged, two approaches exist:
Export the data, recreate the database with the desired character set, and import the data.
Use ALTER DATABASE CHARACTER SET when the new charset is a superset of the current one (e.g., ALTER DATABASE CHARACTER SET UTF8).
Client Character Set Configuration
UNIX : Set NLS_LANG="simplified chinese"_china.zhs16gbk and export the variable.
Windows : Edit the registry under HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE-HOME or set the environment variable with set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK.
EXP/IMP and Character Set Conversion
During export/import, four character sets are involved: source DB charset, export session charset (via NLS_LANG), import session charset (via NLS_LANG), and target DB charset. Mismatches cause conversion and possible data loss.
Export stores the session charset ID in the dump file header. If the source charset is ZHS16GBK and the export session charset is US7ASCII, Chinese characters are lost.
Import reads the dump file charset ID, compares it with the import session charset, and may perform two conversions: dump‑file‑to‑session and session‑to‑target‑DB.
Checking Dump File Charset
For small dump files, open the file in a hex editor and read bytes 2‑3 (e.g., 0354). For large files, run:
cat exp.dmp | od -x | head -1 | awk '{print $2 $3}' | cut -c 3-6Then query the charset with:
SELECT nls_charset_name(to_number('0354','xxxx')) FROM dual;Modifying Dump File Charset
Change bytes 2‑3 to the desired hexadecimal code (e.g., 0354 for ZHS16GBK) using a hex editor or a custom program. This works when the target charset is a superset of the original; otherwise, results are unpredictable.
Important Notes
Changing the database charset without a superset relationship can corrupt data.
If ORA-12717 occurs (cannot alter national charset when NCLOB data exists), either use the INTERNAL_USE keyword or recreate the database.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
