
How Computers Display Characters: Encoding, Input Methods, Unicode, and Fonts

This article explains how characters are turned into binary, mapped through Unicode and character sets, processed by input methods and font files, and finally rendered on screen, while also covering the challenges of rare characters and recent Unicode updates.

Sohu Tech Products

All data on a computer is represented in binary (0s and 1s). To show a character, the computer first maps the typed character to a number defined by a character encoding standard such as Unicode or GBK. In Unicode this number is called a code point, and an encoding form such as UTF-8 determines the actual bytes that are stored.
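As a small illustration of this step, the following Python sketch prints the code point of one common character and the bytes it becomes under two encodings. The sample character is chosen only for illustration; `ord` and `str.encode` are standard Python built-ins.

```python
# Show the number behind a character and the bytes two standards produce for it.
ch = "中"  # sample character, chosen only for illustration

code_point = ord(ch)             # the Unicode code point, as an integer
utf8_bytes = ch.encode("utf-8")  # that code point serialized as UTF-8
gbk_bytes = ch.encode("gbk")     # the same character under the GBK standard

print(f"U+{code_point:04X}")                         # U+4E2D
print(utf8_bytes.hex(" "))                           # e4 b8 ad
print(gbk_bytes.hex(" "))                            # d6 d0
print(" ".join(f"{b:08b}" for b in utf8_bytes))      # 11100100 10111000 10101101
```

The same character gets different bytes under different standards, which is why software must know which encoding a piece of text uses before it can interpret it.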

After obtaining the Unicode value, the system consults the font file's character map (the cmap table in TrueType and OpenType fonts) to translate the code point into a glyph index, loads the glyph stored at that index, rasterizes it, and finally displays it on the monitor.
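The lookup can be pictured with a toy model. The sketch below is not how a real font parser works: `TOY_FONT`, its contents, and `glyph_for` are hypothetical names made up for this example, and short strings stand in for real glyph outlines. It only shows the two-step path from code point to glyph index to glyph.

```python
# Toy model of a font: a character map (code point -> glyph index) plus a
# glyph table. Real fonts store outlines or bitmaps; strings stand in here.
TOY_FONT = {
    "cmap": {0x4E2D: 1, 0x6587: 2},  # hypothetical coverage: two characters
    "glyphs": [".notdef (empty box)", "outline of 中", "outline of 文"],
}


def glyph_for(ch: str, font: dict) -> str:
    """Return the glyph a font would draw for a single character."""
    code_point = ord(ch)
    # Glyph index 0 is the conventional "missing glyph" (.notdef) slot.
    index = font["cmap"].get(code_point, 0)
    return font["glyphs"][index]


print(glyph_for("中", TOY_FONT))  # outline of 中
print(glyph_for("A", TOY_FONT))   # .notdef (empty box)
```

A character the font does not map falls through to the missing-glyph slot, which is what users typically see as an empty box.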

Three conditions are required for a character to appear: the input method must support the character, Unicode must contain a code for it, and the installed fonts must include a glyph for that code.

Because Chinese characters number in the tens of thousands, keyboards cannot have a key for each one, so input methods map a sequence of keystrokes to a character drawn from a character set such as GBK. Most Chinese input methods today rely on GBK, which covers about 21,000 Chinese characters, leaving many rare characters (such as the "biáng" in "Biángbiáng noodles") unavailable.
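A minimal sketch of this limitation, assuming a made-up candidate table: `CANDIDATES`, `in_gbk`, and `lookup` are hypothetical names for illustration and do not describe any real input method. The only real behavior relied on is that Python's `gbk` codec raises `UnicodeEncodeError` for characters outside GBK.

```python
# Toy input method: keystrokes map to candidate characters, and candidates
# outside the GBK character set are dropped.
BIANG = chr(0x30EDE)  # the simplified "biáng" character


def in_gbk(ch: str) -> bool:
    """True if the character can be represented in GBK."""
    try:
        ch.encode("gbk")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical keystroke table, for illustration only.
CANDIDATES = {
    "zhong": ["中", "种", "重"],
    "biang": [BIANG],
}


def lookup(keys: str, gbk_only: bool = True) -> list:
    """Return candidates for a keystroke sequence, optionally limited to GBK."""
    found = CANDIDATES.get(keys, [])
    return [c for c in found if in_gbk(c)] if gbk_only else found


print(lookup("zhong"))                       # ['中', '种', '重']
print(lookup("biang"))                       # []
print(len(lookup("biang", gbk_only=False)))  # 1
```

With the GBK restriction in place, typing "biang" yields no candidates at all, even though the character exists in Unicode.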

Some input methods adopt larger character sets such as Unicode, allowing entry of rarer characters. Unicode is continuously updated; Unicode 13.0 (March 10, 2020), the latest release when this article was written, added 5,930 characters, bringing the total to 143,859. It includes the "biáng" character (code points U+30EDD and U+30EDE) in the CJK Unified Ideographs Extension G block.
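To make these code points concrete, the sketch below checks that both fall inside the Extension G range (U+30000 to U+3134A) and shows how they are stored: four bytes in UTF-8, and a surrogate pair in UTF-16 because they lie beyond U+FFFF. The helper name `describe` is an assumption for this example.

```python
# Inspect the two "biáng" code points: block membership and storage size.
EXT_G = range(0x30000, 0x3134A + 1)  # CJK Unified Ideographs Extension G


def describe(cp: int) -> dict:
    """Summarize how a supplementary-plane code point is encoded."""
    ch = chr(cp)
    v = cp - 0x10000                 # offset used by UTF-16 surrogates
    high = 0xD800 + (v >> 10)        # high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)       # low (trail) surrogate
    return {
        "in_ext_g": cp in EXT_G,
        "utf8": ch.encode("utf-8"),
        "surrogates": (high, low),
    }


for cp in (0x30EDD, 0x30EDE):
    info = describe(cp)
    hi_s, lo_s = info["surrogates"]
    print(f"U+{cp:05X}", info["in_ext_g"], info["utf8"].hex(" "),
          f"{hi_s:04X} {lo_s:04X}")
# U+30EDD True f0 b0 bb 9d D883 DEDD
# U+30EDE True f0 b0 bb 9e D883 DEDE
```

Software that assumes every character fits in two bytes, or in a single 16-bit unit, will mishandle these characters even if a suitable font is installed.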

Even when Unicode contains a character, it will not display unless a font installed on the operating system contains a glyph for it. Commercial fonts that cover the extended CJK blocks can render these rare characters, but many systems still lack such fonts, which causes problems in services such as identity verification, ticketing, and banking when a person's name contains one of these characters.
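The effect of missing font coverage can be sketched as a simple fallback search. Everything here is a stand-in: the font names, their coverage sets, and the functions `pick_font` and `render` are hypothetical, and real systems use far more elaborate fallback logic.

```python
# Toy font fallback: try each installed font in order; if none covers the
# character, show a placeholder box instead.
FONT_STACK = [
    ("SystemSans (hypothetical)", {0x4E2D, 0x6587}),
    ("ExtendedCJK (hypothetical)", {0x4E2D, 0x30EDD, 0x30EDE}),
]
PLACEHOLDER = "\u25A1"  # white square, standing in for the missing-glyph box


def pick_font(ch, stack):
    """Return the name of the first font that covers ch, or None."""
    for name, coverage in stack:
        if ord(ch) in coverage:
            return name
    return None


def render(text, stack):
    """Replace every character no font covers with the placeholder."""
    return "".join(c if pick_font(c, stack) else PLACEHOLDER for c in text)


name = "中" + chr(0x30EDE)
print(pick_font(chr(0x30EDE), FONT_STACK))  # ExtendedCJK (hypothetical)
print(render(name, FONT_STACK[:1]))         # 中□
```

On a system with only the first font, the rare character in a name degrades to a box, which is the kind of failure the services above run into.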

Therefore, using obscure characters requires caution, as support depends on the combination of input method, Unicode version, operating system updates, and font availability.

Tags: Unicode, character encoding, computer fundamentals, fonts, input methods
Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
