
Implementing Length-Limited Input with Unicode and Emoji Support in JavaScript

The article explains how to enforce a custom length limit in JavaScript input fields that counts English letters and numbers as half a unit, Chinese characters as one unit, and each emoji as one unit, using Unicode code‑point detection, regex extraction, and automatic truncation to prevent overflow.

Xianyu Technology

Background

User input is a primary way for people to share information and express themselves in community scenarios. The "Circle Goods" feature lets users create custom product groups, which demands smooth, accurate, and rich input handling.

Problem Statement

The product and design teams have two core requirements:

Support Chinese characters, English letters, numbers, and emoji with custom length rules: each English letter or digit counts as 0.5, each Chinese character counts as 1, and each emoji counts as 1.

Automatically truncate input when the length limit is exceeded, preventing further entry.

Conventional Solution

The first idea is to use the native maxlength attribute. It caps the number of characters, but it cannot apply different weights to Chinese, English, and emoji because it counts UTF‑16 code units: a Chinese character and an English letter each count as 1, while most emoji count as 2 or more.

Unicode and JavaScript String Encoding

Unicode assigns each character a code point, written as U+ followed by four to six hexadecimal digits (U+0000 to U+10FFFF). Most common characters reside in Plane 0 (the Basic Multilingual Plane). JavaScript strings are sequences of UTF‑16 code units, and every Plane 0 character occupies a single 16‑bit unit, so maxlength treats Chinese characters and English letters equally.
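As a small illustration of the point above, the following sketch compares the UTF‑16 length of a few sample strings with their code‑point count. The sample strings and variable names are chosen only for this example.

```javascript
// String.prototype.length counts UTF-16 code units, not characters.
const samples = ["a", "7", "\u4E2D", "\u{1F600}"]; // "a", "7", "中", "😀"

// Code units per sample: Plane 0 characters take 1, U+1F600 takes 2.
const unitCounts = samples.map((s) => s.length);

// Iterating a string (here via Array.from) walks code points instead,
// so every sample above is a single code point.
const codePointCounts = samples.map((s) => Array.from(s).length);

console.log(unitCounts, codePointCounts);
```

Because "中" and "a" both report a length of 1, a limit based on code units has no way to weight them differently.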

Emoji Encoding

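Most emoji live in Plane 1 (the Supplementary Multilingual Plane) and are represented by surrogate pairs in UTF‑16 (e.g., 😀 is U+1F600 → 0xD83D 0xDE00); a few older symbols such as ❤ (U+2764) sit in Plane 0. charCodeAt() returns a single 16‑bit unit, not the full code point, whereas codePointAt() returns the full code point when called at the index of the high surrogate.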
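The difference between the two methods can be checked directly. The sketch below reads 😀 both ways and also rebuilds the code point from its surrogate pair by hand; decodeSurrogatePair is a helper name introduced here for illustration only.

```javascript
const face = "\u{1F600}"; // 😀

// charCodeAt() exposes the two halves of the surrogate pair.
const highUnit = face.charCodeAt(0); // 0xD83D
const lowUnit = face.charCodeAt(1); // 0xDE00

// codePointAt() combines the pair when called at the high surrogate.
const codePoint = face.codePointAt(0); // 0x1F600

// Hypothetical helper: the standard UTF-16 decoding arithmetic.
function decodeSurrogatePair(high, low) {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

console.log(codePoint.toString(16), decodeSurrogatePair(highUnit, lowUnit).toString(16));
```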

Length‑Calculation Approach

The solution consists of three steps:

Extract and count emoji separately using a regex; each emoji contributes a length of 1.

Iterate over the remaining characters, checking their code‑point ranges (e.g., U+4E00‑U+9FFF for CJK Unified Ideographs) to assign the appropriate length (Chinese = 1, English/number = 0.5).

Sum the lengths to obtain the total input length.

Automatic Truncation

When the total length exceeds the limit, the input is truncated based on the calculated length. Two interaction cases are considered:

If the current length is below the limit, input proceeds normally.

If a paste or bulk entry exceeds the limit, only the first n characters that fit within the limit are kept.
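A minimal sketch of the truncation step, assuming the same weighting rules as above: the string is split into tokens (one emoji sequence or one code point each), and tokens are kept while the running length stays within the limit. TOKEN_PATTERN, tokenWeight, and truncateToLimit are names introduced for this example, and the emoji pattern is an assumption that does not cover every emoji form.

```javascript
// Assumed tokenizer: an emoji sequence if one starts here, else one code point.
const TOKEN_PATTERN =
  /\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?)*|[\s\S]/gu;

// ASCII counts 0.5; Chinese characters and emoji count 1.
// Assumption: any other non-ASCII character also counts 1.
function tokenWeight(token) {
  return token.codePointAt(0) <= 0x7f ? 0.5 : 1;
}

function truncateToLimit(text, limit) {
  const tokens = text.match(TOKEN_PATTERN) || [];
  let used = 0;
  let kept = "";
  for (const token of tokens) {
    const weight = tokenWeight(token);
    if (used + weight > limit) break; // next token would overflow
    used += weight;
    kept += token;
  }
  return kept;
}

console.log(truncateToLimit("abcdef", 2)); // keeps four ASCII characters
```

Working on whole tokens means a surrogate pair or a ZWJ sequence is either kept entirely or dropped entirely, never cut in half.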

Several implementation attempts were explored:

InputType‑Based Handling: Use oninput and the InputEvent object's inputType property to differentiate actions. However, the variety of inputType values and browser inconsistencies make this approach costly to maintain.

Text Diff: Compare the new value with the previous value on each oninput event, classify the change as delete, insert, or replace, then apply truncation accordingly.

Special Cases: For iOS Pinyin composition, ignore insertCompositionText events and perform truncation on compositionend. For undo/redo, block the action in beforeinput using preventDefault().
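The text‑diff and special‑case ideas can be combined roughly as in the sketch below. It is an illustration under assumptions, not the component's real code: diffValues, limitChange, and attachLimitedInput are hypothetical names, and measure(text) and truncate(text, budget) stand for whatever length and truncation functions the application uses. The diff trims the common prefix and suffix of the two values, so only the inserted segment is truncated.

```javascript
// Classify a change by trimming the common prefix and suffix (by code point).
function diffValues(previous, next) {
  const prev = Array.from(previous);
  const curr = Array.from(next);
  let start = 0;
  while (start < prev.length && start < curr.length && prev[start] === curr[start]) start++;
  let endPrev = prev.length;
  let endCurr = curr.length;
  while (endPrev > start && endCurr > start && prev[endPrev - 1] === curr[endCurr - 1]) {
    endPrev--;
    endCurr--;
  }
  const removed = prev.slice(start, endPrev).join("");
  const inserted = curr.slice(start, endCurr).join("");
  const type = removed && inserted ? "replace" : inserted ? "insert" : removed ? "delete" : "none";
  return { type, prefix: prev.slice(0, start).join(""), removed, inserted, suffix: prev.slice(endPrev).join("") };
}

// Keep the untouched prefix and suffix; truncate only what was inserted.
function limitChange(previous, next, limit, measure, truncate) {
  const { type, prefix, inserted, suffix } = diffValues(previous, next);
  if (type === "none" || type === "delete") return next;
  const budget = Math.max(limit - measure(prefix) - measure(suffix), 0);
  return prefix + truncate(inserted, budget) + suffix;
}

function attachLimitedInput(input, limit, measure, truncate) {
  let previous = input.value;
  let composing = false;
  const commit = () => {
    const limited = limitChange(previous, input.value, limit, measure, truncate);
    if (limited !== input.value) input.value = limited;
    previous = limited;
  };
  input.addEventListener("compositionstart", () => { composing = true; });
  // Truncate once the IME composition has finished.
  input.addEventListener("compositionend", () => { composing = false; commit(); });
  input.addEventListener("input", (event) => {
    if (composing || event.inputType === "insertCompositionText") return;
    commit();
  });
  // Block undo/redo so the tracked previous value stays in sync.
  input.addEventListener("beforeinput", (event) => {
    if (event.inputType === "historyUndo" || event.inputType === "historyRedo") event.preventDefault();
  });
}
```

Assigning to input.value moves the caret to the end in most browsers, so a real component would likely restore the selection after truncating. Event ordering around compositionend also varies between browsers, which this sketch does not try to normalize.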

Demo & Code Snippet

The final component demonstrates correct length calculation, emoji handling, and automatic truncation across browsers and platforms.

Conclusion

The project deepened understanding of JavaScript string handling, Unicode planes, and emoji encoding. The implemented input solution resolves a common front‑end challenge and can serve as a reference for similar requirements.

Reference

Unicode Technical Report 11, Wikipedia on Unicode planes, Kevin Scott’s article on emojis in JavaScript, and the CJK Unified Ideographs block.
