Understanding Oracle Character Sets, NLS_LANG Settings, and Import/Export Character Conversion
This article explains the fundamentals of Oracle character sets, how database and national character sets are defined, the role of NLS_LANG, the location and purpose of character set files, and practical guidance for import/export operations and character set conversion across different environments.
Oracle Globalization Support enables the storage and retrieval of data in local languages and formats, making it essential to understand character sets, database character set configuration, character set files, NLS_LANG settings, and import/export character conversion.
3.1 Basic Knowledge of Character Sets – The earliest encoding scheme originated from ASCII in the 1960s, evolving into ISO‑646 and later Unicode. Unicode provides a universal code point for every character, with implementation standards such as UTF‑8, UCS‑2, and UTF‑16. Oracle has supported UTF‑8 since version 7.2.
3.2 Database Character Sets – When creating a database you must specify a CHARACTER SET and a NATIONAL CHARACTER SET . The database character set is used for CHAR, VARCHAR2, CLOB, LONG, table/column names, and PL/SQL code, while the national character set stores NCHAR, NVARCHAR2, NCLOB, etc. Example creation script:
CREATE DATABASE "eygle"
MAXINSTANCES 8 MAXLOGHISTORY 1 MAXLOGFILES 16 MAXLOGMEMBERS 3 MAXDATAFILES 100
DATAFILE SIZE 300M AUTOEXTEND ON NEXT 10240K MAXSIZE UNLIMITED EXTENT MANAGEMENT LOCAL
SYSAUX DATAFILE SIZE 120M AUTOEXTEND ON NEXT 10240K MAXSIZE UNLIMITED
SMALLFILE DEFAULT TEMPORARY TABLESPACE TEMP TEMPFILE SIZE 20M AUTOEXTEND ON NEXT 640K MAXSIZE UNLIMITED
SMALLFILE UNDO TABLESPACE "UNDOTBS1" DATAFILE SIZE 200M AUTOEXTEND ON NEXT 5120K MAXSIZE UNLIMITED
CHARACTER SET ZHS16GBK
NATIONAL CHARACTER SET AL16UTF16
…For Simplified Chinese platforms the default character set is often ZHS16GBK . Selecting the appropriate character set at creation time is crucial because it determines which characters can be stored.
Common Chinese character sets include:
ZHS16CGB231280 CGB2312‑80 16‑bit Simplified Chinese MB, ASCII
ZHS16GBK GBK 16‑bit Simplified Chinese MB, ASCII, UDCOracle naming conventions for character sets follow the pattern <Language><BitSize><Encoding> , e.g., ZHS16GBK . Some older sets, such as UTF‑8 in Oracle 8/8i, break this rule.
3.3 Character Set Files and Support – Oracle uses the NLS Runtime Library (NLSRTL) to handle language‑specific conversions. Character set files are stored under directories controlled by environment variables like ORA_NLS , ORA_NLS10 , etc. For example, on Oracle 9i the default location is $ORACLE_HOME/ocommon/nls/admin/data , and on 10g/11g it is $ORACLE_HOME/nls/data . These directories contain files defining language ( lx00023.nlb for Chinese), territory, and character set information.
Using the Locale Builder tool you can inspect these files to see locale‑specific settings such as month and weekday names.
3.4 NLS_LANG Settings and Impact – The NLS_LANG environment variable has the form <Language>_<Territory>.<ClientCharacterSet> . It controls the language of Oracle messages, date formatting, numeric formatting, and the client character set used to decode data received from the database.
Examples:
set NLS_LANG=SIMPLIFIED CHINESE_CHINA.ZHS16GBK
sqlplus "/ as sysdba"
SELECT SYSDATE FROM DUAL; -- returns "01-11月-03"
set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
sqlplus "/ as sysdba"
SELECT SYSDATE FROM DUAL; -- returns "01-NOV-03"On Windows the NLS_LANG value is stored in the registry under HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\ . On Unix/Linux it can be queried with env | grep NLS .
3.5 Import/Export and Character Conversion – Traditional EXP/IMP tools rely on the client NLS_LANG setting. The export file records the character set used during export; during import Oracle compares this with the session NLS_LANG and performs conversion if necessary. Best practice is to match the client character set to the database character set during export to avoid unnecessary conversions.
When source and target databases have different character sets, the recommended workflow is:
Export with NLS_LANG matching the source database.
Import with NLS_LANG set to the source character set, letting Oracle convert data to the target character set (which should be a superset of the source).
Example of a failed conversion:
set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
imp eygle/eygle file=Sus7ascii-Cus7ascii-exp817.dmp …
IMP-00016: required character set conversion (type 1 to 852) not supportedFrom Oracle 10g onward, Data Pump (EXPDP/IMPDP) performs export/import on the server side, eliminating most client‑side character set issues.
In summary, proper selection of database character sets, correct configuration of NLS_LANG , and awareness of how import/export tools handle character conversion are essential for reliable multilingual Oracle deployments.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.