Databases 13 min read

Why the Same Chinese Text Becomes Garbled in Different Oracle Sessions – A Character Set Deep Dive

The article walks through a hands‑on experiment that shows how inserting the same Chinese string into an Oracle 10g database from clients using different character sets (ZHS16GBK vs AL32UTF8) leads to different stored encodings and results in garbled output, then explains the underlying conversion process and how to avoid it.

ITPUB
ITPUB
ITPUB
Why the Same Chinese Text Becomes Garbled in Different Oracle Sessions – A Character Set Deep Dive

Background

Oracle databases can display garbled characters when the client character set differs from the database character set. This article uses two concrete sessions to illustrate how the same Chinese string "中国" is stored and retrieved differently.

Environment and Setup

Database: Oracle 10.2.0.3 with character set AL32UTF8.

Client machine: Windows XP.

Two SQL*Plus sessions are created:

Session 1 – client character set ZHS16GBK (set via the NLS_LANG registry entry).

Session 2 – client character set AL32UTF8, matching the database.

Table Creation and Data Insertion

SQL> drop table test purge;
SQL> create table test(col1 number(1), col2 varchar2(10));

Session 1 inserts the Chinese string: SQL> insert into test values(1,'中国'); Session 2 inserts the same string:

SQL> insert into test values(2,'中国');

Query Results

When each session queries the table, the displayed values differ:

-- Session 1 result
COL1 COL2
2 ???
1 中国
-- Session 2 result
COL1 COL2
2 中国
1 涓浗

Inspecting Stored Bytes with DUMP

SQL> SELECT col1, DUMP(col2,1016) FROM test;
COL1 DUMP(COL2,1016)
--- ------------------------------------------------------------
2   Typ=1 Len=4 CharacterSet=AL32UTF8: d6,d0,b9,fa
1   Typ=1 Len=6 CharacterSet=AL32UTF8: e4,b8,ad,e5,9b,bd

The two rows contain different byte sequences for the same logical string.

Encoding Analysis

In a ZHS16GBK client, the string "中国" is encoded as d6,d0,b9,fa. Oracle receives these bytes and, because the database character set is AL32UTF8, converts them to the UTF‑8 representation e4,b8,ad,e5,9b,bd before storing.

When the client already uses AL32UTF8, Oracle stores the incoming bytes unchanged. If the underlying Windows code page is still GBK, the bytes d6,d0,b9,fa are stored directly, resulting in a mismatched encoding.

Why Garbled Output Appears

During retrieval Oracle converts stored UTF‑8 bytes back to the client’s character set. If the client’s OS display charset cannot represent the resulting byte sequence (e.g., UTF‑8 bytes interpreted as GBK), the system substitutes "?" or shows unrelated characters such as "涓浗".

Key Misconception

Matching NLS_LANG to the database character set alone does not guarantee correct display. The operating system’s locale and code page must also support the target language; otherwise conversion cannot produce the proper visual output.

Correct Approach to Avoid Garbled Characters

Configure the OS locale and code page to support the language (e.g., install a Chinese locale on Windows, which sets code page 936).

Set NLS_LANG to the same character set as the OS.

Use a database character set appropriate for the multilingual data (AL32UTF8 is recommended).

When the OS, client, and database character sets are aligned, Oracle can convert and display characters correctly without corruption.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLencodingOracleCharacter Set
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.