Understanding Oracle Character Sets and NLS_LANG: A Complete Guide
Oracle character sets define how byte data maps to symbols, with NLS_LANG controlling client language, territory, and charset; this guide explains the components, naming conventions, differences between database and client sets, export/import conversion processes, and methods to view or modify character sets safely.
What Is an Oracle Character Set
Oracle character sets are collections of symbols that map byte data to characters. They support localization of messages, sorting, dates, currencies, numbers, and calendars. The most important parameter influencing the database character set is NLS_LANG, which follows the format language_territory.charset.
Components of NLS_LANG
Language : Determines the language of server messages.
Territory : Sets default date, number, and currency formats.
Charset : Specifies the character set used by the client.
Only the charset part truly affects the database character set; matching this part between two databases allows data import/export.
Character Set Naming Rules
Oracle character set names follow the pattern <language><bits><encoding>. For example, ZHS16GBK denotes a 16‑bit (two‑byte) Simplified Chinese GBK encoding.
Types of Character Sets
Single‑byte encodings : US7ASCII (7‑bit, 128 characters) and WE8ISO8859P1 (8‑bit, 256 characters for Western Europe).
Variable‑length multibyte encodings : AL32UTF8 (Unicode, variable length) and ZHS16CGB231280 (Chinese).
Fixed‑length multibyte encodings : AF16UTF16 (Unicode UTF‑16, fixed 2‑byte per character).
Unicode encodings : UTF‑8 (variable 1‑3 bytes) and UTF‑16 (fixed 2 bytes).
Database vs National Character Set
The database character set (specified at creation) stores CHAR, VARCHAR2, CLOB, LONG and SQL/PLSQL objects. The national character set stores NCHAR, NVARCHAR2, NCLOB and is required for Asian language support. In Oracle 9i and later, the national set can only be AF16UTF16 or UTF8, defaulting to AF16UTF16.
Querying Character Set Parameters
You can view character set settings via data dictionary views: nls_database_parameters – shows the database character set. nls_instance_parameters – shows instance‑level settings (from init.ora, environment, or registry). nls_session_parameters – shows the current session’s settings. v$NLS_PARAMETERS – displays current NLS values.
Modifying the Database Character Set
Changing the database character set after creation is generally discouraged. If necessary, two approaches exist:
Export the data, recreate the database with the desired character set, and import the data.
Use ALTER DATABASE CHARACTER SET when the new charset is a superset of the current one (e.g., ALTER DATABASE CHARACTER SET UTF8 because UTF8 is a superset of US7ASCII).
When modifying the national character set, the INTERNAL_USE keyword may be required, and the operation fails if NCLOB data exists.
Client Character Set (NLS_LANG)
The client character set is defined by the NLS_LANG environment variable. Its format is language_territory.charset, for example AMERICAN_AMERICA.US7ASCII. Setting this variable determines how the client encodes and decodes character data.
Examples:
UNIX: export NLS_LANG="simplified chinese"_china.zhs16gbk
Windows: set NLS_LANG=AMERICAN_AMERICA.ZHS16GBKExport/Import and Character Set Conversion
During exp, if the source database charset differs from the client charset, Oracle converts data and records the client charset ID in the dump file header. Mismatched charsets can cause data loss (e.g., Chinese characters become "??" when converting from ZHS16GBK to US7ASCII).
Import involves two possible conversions:
Dump file charset ↔ import session charset.
Import session charset ↔ target database charset.
Ensuring the dump file charset matches the import session charset (or is a superset) avoids loss.
Viewing the Server Character Set
SQL> SELECT userenv('language') FROM dual;Typical output: SIMPLIFIED CHINESE_CHINA.ZHS16GBK.
Determining a Dump File’s Charset
For small dump files, open them in a hex editor and read bytes 2‑3. For large files, use a Unix command:
cat exp.dmp | od -x | head -1 | awk '{print $2 $3}' | cut -c 3-6Then map the hex value to a charset with:
SELECT nls_charset_name(to_number('0354','xxxx')) FROM dual;Changing a Dump File’s Charset
Modify bytes 2‑3 of the dump file to the desired charset ID (e.g., 0354 for ZHS16GBK). This can be done with a hex editor or a custom program for large files.
Client Charset Configuration Examples
UNIX: $NLS_LANG="simplified chinese"_china.zhs16gbk and export the variable.
Windows: edit the registry under HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE-HOME or set the environment variable set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK.
Key Takeaways
Match the charset component of NLS_LANG between source and target databases for seamless data migration.
Use ALTER DATABASE CHARACTER SET only when the new charset is a superset of the current one.
Always verify server, client, and dump file charsets before export/import operations.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
