Databases 17 min read

Understanding Oracle Character Sets and NLS_LANG: A Complete Guide

Oracle character sets define how byte data maps to symbols, with NLS_LANG controlling client language, territory, and charset; this guide explains the components, naming conventions, differences between database and client sets, export/import conversion processes, and methods to view or modify character sets safely.

ITPUB

Jun 10, 2017

Understanding Oracle Character Sets and NLS_LANG: A Complete Guide

What Is an Oracle Character Set

Oracle character sets are collections of symbols that map byte data to characters. They support localization of messages, sorting, dates, currencies, numbers, and calendars. The most important parameter influencing the database character set is NLS_LANG, which follows the format language_territory.charset.

Components of NLS_LANG

Language : Determines the language of server messages.

Territory : Sets default date, number, and currency formats.

Charset : Specifies the character set used by the client.

Only the charset part truly affects the database character set; matching this part between two databases allows data import/export.

Character Set Naming Rules

Oracle character set names follow the pattern <language><bits><encoding>. For example, ZHS16GBK denotes a 16‑bit (two‑byte) Simplified Chinese GBK encoding.

Types of Character Sets

Single‑byte encodings : US7ASCII (7‑bit, 128 characters) and WE8ISO8859P1 (8‑bit, 256 characters for Western Europe).

Variable‑length multibyte encodings : AL32UTF8 (Unicode, variable length) and ZHS16CGB231280 (Chinese).

Fixed‑length multibyte encodings : AF16UTF16 (Unicode UTF‑16, fixed 2‑byte per character).

Unicode encodings : UTF‑8 (variable 1‑3 bytes) and UTF‑16 (fixed 2 bytes).

Database vs National Character Set

The database character set (specified at creation) stores CHAR, VARCHAR2, CLOB, LONG and SQL/PLSQL objects. The national character set stores NCHAR, NVARCHAR2, NCLOB and is required for Asian language support. In Oracle 9i and later, the national set can only be AF16UTF16 or UTF8, defaulting to AF16UTF16.

Querying Character Set Parameters

You can view character set settings via data dictionary views: nls_database_parameters – shows the database character set. nls_instance_parameters – shows instance‑level settings (from init.ora, environment, or registry). nls_session_parameters – shows the current session’s settings. v$NLS_PARAMETERS – displays current NLS values.

Modifying the Database Character Set

Changing the database character set after creation is generally discouraged. If necessary, two approaches exist:

Export the data, recreate the database with the desired character set, and import the data.

Use ALTER DATABASE CHARACTER SET when the new charset is a superset of the current one (e.g., ALTER DATABASE CHARACTER SET UTF8 because UTF8 is a superset of US7ASCII).

When modifying the national character set, the INTERNAL_USE keyword may be required, and the operation fails if NCLOB data exists.

Client Character Set (NLS_LANG)

The client character set is defined by the NLS_LANG environment variable. Its format is language_territory.charset, for example AMERICAN_AMERICA.US7ASCII. Setting this variable determines how the client encodes and decodes character data.

Examples:

UNIX: export NLS_LANG="simplified chinese"_china.zhs16gbk
Windows: set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

Export/Import and Character Set Conversion

During exp, if the source database charset differs from the client charset, Oracle converts data and records the client charset ID in the dump file header. Mismatched charsets can cause data loss (e.g., Chinese characters become "??" when converting from ZHS16GBK to US7ASCII).

Import involves two possible conversions:

Dump file charset ↔ import session charset.

Import session charset ↔ target database charset.

Ensuring the dump file charset matches the import session charset (or is a superset) avoids loss.

Viewing the Server Character Set

SQL> SELECT userenv('language') FROM dual;

Typical output: SIMPLIFIED CHINESE_CHINA.ZHS16GBK.

Determining a Dump File’s Charset

For small dump files, open them in a hex editor and read bytes 2‑3. For large files, use a Unix command:

cat exp.dmp | od -x | head -1 | awk '{print $2 $3}' | cut -c 3-6

Then map the hex value to a charset with:

SELECT nls_charset_name(to_number('0354','xxxx')) FROM dual;

Changing a Dump File’s Charset

Modify bytes 2‑3 of the dump file to the desired charset ID (e.g., 0354 for ZHS16GBK). This can be done with a hex editor or a custom program for large files.

Client Charset Configuration Examples

UNIX: $NLS_LANG="simplified chinese"_china.zhs16gbk and export the variable.

Windows: edit the registry under HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE-HOME or set the environment variable set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK.

Key Takeaways

Match the charset component of NLS_LANG between source and target databases for seamless data migration.

Use ALTER DATABASE CHARACTER SET only when the new charset is a superset of the current one.

Always verify server, client, and dump file charsets before export/import operations.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Database Oracle Export Import NLS_LANG

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.