Implementing Length-Limited Input with Unicode and Emoji Support in JavaScript
This article explains how to enforce a custom length limit on JavaScript input fields, counting each English letter or digit as half a unit and each Chinese character or emoji as one unit. The approach relies on Unicode code-point detection, regex-based emoji extraction, and automatic truncation to keep input within the limit.
Background
In community scenarios, user input is a key channel for sharing information and expressing oneself. The "Circle Goods" feature lets users create custom product groups, so its input field must be smooth, accurate, and expressive.
Problem Statement
The product and design teams have two core requirements:
Support Chinese characters, English letters, digits, and emoji with custom length rules: each English letter or digit counts as 0.5, each Chinese character as 1, and each emoji as 1.
Automatically truncate input when the length limit is exceeded, preventing further entry.
Conventional Solution
The first idea is to use the native maxlength attribute. It does cap the number of characters, but it cannot tell Chinese, English, and emoji apart, because it measures the string's length in UTF-16 code units: a Chinese character and an English letter each count as 1, while an emoji such as 😀 counts as 2.
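A quick way to see the problem: String.prototype.length also counts UTF-16 code units, so it reports the same numbers maxlength works with. A minimal illustration, runnable in any modern JavaScript engine:

```javascript
// Each entry is one user-visible character, but .length counts UTF-16
// code units: letters, digits, and BMP Chinese characters take one unit,
// while an emoji outside the BMP takes two.
const samples = ['a', '7', '中', '😀'];
const lengths = samples.map((s) => s.length);

samples.forEach((s, i) => console.log(s, lengths[i]));
// a 1
// 7 1
// 中 1
// 😀 2
```

Only the Chinese character happens to match its required weight of 1; letters and digits should count 0.5, and the emoji should count 1 rather than 2.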
Unicode and JavaScript String Encoding
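Unicode assigns each character a code point, written as U+ followed by four to six hexadecimal digits, in the range U+0000 to U+10FFFF. The code space is divided into 17 planes of 0x10000 (65,536) code points each. Most common characters, including basic Latin letters, digits, and CJK Unified Ideographs, reside in Plane 0, the Basic Multilingual Plane (BMP). JavaScript strings are encoded in UTF-16, in which every Plane 0 character occupies a single 16-bit code unit, so maxlength treats a Chinese character and an English letter identically.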
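The plane arithmetic can be checked directly. Each plane spans 0x10000 code points, so a character's plane is its code point shifted right by 16 bits. The sketch below (the helper name describe is purely illustrative) reads the code point with codePointAt() and reports the plane and the number of UTF-16 code units:

```javascript
// Describe one character: its code point, Unicode plane, and UTF-16 size.
// Each plane holds 0x10000 code points, so plane = codePoint >> 16.
function describe(char) {
  const cp = char.codePointAt(0);
  return {
    char,
    codePoint: 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'),
    plane: cp >> 16,
    codeUnits: char.length, // 1 in Plane 0, 2 in Planes 1-16
  };
}

for (const ch of ['A', '中', '😀']) {
  const d = describe(ch);
  console.log(`${d.char} ${d.codePoint} plane ${d.plane} units ${d.codeUnits}`);
}
// A U+0041 plane 0 units 1
// 中 U+4E2D plane 0 units 1
// 😀 U+1F600 plane 1 units 2
```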
Emoji Encoding
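Most emoji live in Plane 1 (the Supplementary Multilingual Plane), so UTF-16 represents each of them with a surrogate pair of two 16-bit code units (e.g., 😀 is U+1F600 → 0xD83D 0xDE00). Many emoji are also sequences of several code points, such as a base emoji followed by a skin-tone modifier, or several emoji joined by U+200D (zero-width joiner). charCodeAt() returns a single 16-bit code unit rather than the full code point, whereas codePointAt() returns the complete code point when called at the index of the pair's first unit.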
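The surrogate-pair example can be reproduced with the standard UTF-16 encoding rule: subtract 0x10000, then split the remaining 20 bits into a high half added to 0xD800 and a low half added to 0xDC00. The helper toSurrogatePair is an illustrative name, not a built-in:

```javascript
// Encode a supplementary-plane code point (U+10000..U+10FFFF) as a
// UTF-16 surrogate pair using the standard UTF-16 algorithm.
function toSurrogatePair(cp) {
  const offset = cp - 0x10000;           // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const hex = (n) => '0x' + n.toString(16).toUpperCase();

const [high, low] = toSurrogatePair(0x1f600);
console.log(hex(high), hex(low)); // 0xD83D 0xDE00

const smiley = '😀';
// charCodeAt() sees only one 16-bit code unit at a time...
console.log(hex(smiley.charCodeAt(0)), hex(smiley.charCodeAt(1))); // 0xD83D 0xDE00
// ...while codePointAt() combines the high/low pair into the full code point.
console.log(hex(smiley.codePointAt(0))); // 0x1F600
```

Note that smiley.codePointAt(1) returns only the low surrogate (0xDE00), so code that walks a string by index must step over the pair; a for...of loop over a string does this automatically.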
Length‑Calculation Approach
The solution consists of three steps:
Extract and count emoji separately using a regex; each emoji contributes a length of 1.
Iterate over the remaining characters, checking their code‑point ranges (e.g., U+4E00‑U+9FFF for CJK Unified Ideographs) to assign the appropriate length (Chinese = 1, English/number = 0.5).
Sum the lengths to obtain the total input length.
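The three steps might be sketched as follows. This is a sketch under stated assumptions, not the team's actual code: the emoji regex is an approximation built on Unicode property escapes (\p{Extended_Pictographic}, \p{Emoji_Modifier}, \p{Regional_Indicator}, which require the u flag), and every non-emoji character outside U+4E00–U+9FFF is assumed to weigh 0.5, like an English letter or digit. The names EMOJI_RE, charWeight, and getLength are hypothetical.

```javascript
// Assumed emoji pattern: a flag (two regional indicators), or a pictographic
// character optionally followed by U+FE0F or a skin-tone modifier, chained
// with ZWJ (U+200D). An approximation, not the full Unicode emoji grammar.
const EMOJI_RE =
  /\p{Regional_Indicator}{2}|\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?)*/gu;

// Weight of one non-emoji code point: CJK Unified Ideographs count 1;
// letters, digits, and (by assumption) every other character count 0.5.
function charWeight(ch) {
  const cp = ch.codePointAt(0);
  return cp >= 0x4e00 && cp <= 0x9fff ? 1 : 0.5;
}

function getLength(str) {
  // Step 1: each emoji match counts as 1.
  const emojis = str.match(EMOJI_RE) || [];
  let total = emojis.length;
  // Step 2: weigh the remaining characters; for...of walks code points,
  // not UTF-16 code units.
  for (const ch of str.replace(EMOJI_RE, '')) {
    total += charWeight(ch);
  }
  // Step 3: the sum is the input's length under the custom rules.
  return total;
}

console.log(getLength('ab12')); // 2
console.log(getLength('中文')); // 2
console.log(getLength('a中😀')); // 2.5
```

Extracting emoji first keeps multi-code-point sequences such as 👍🏽 or 👨‍👩‍👧 at a weight of 1 instead of counting each code point separately.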
Automatic Truncation
When the total length exceeds the limit, the input is truncated based on the calculated length. Two interaction cases are considered:
If the current length is below the limit, input proceeds normally.
If a paste or bulk entry exceeds the limit, only the first n characters that fit within the limit are kept.
Several implementation attempts were explored:
InputType-Based Handling: Listen to oninput and use the InputEvent object's inputType to distinguish actions. However, the large number of inputType values and inconsistent browser support make this approach costly to maintain.
Text Diff: On each oninput event, compare the new value with the previous one, classify the change as a deletion, insertion, or replacement, and then truncate accordingly.
Special Cases: For iOS Pinyin composition, ignore insertCompositionText events and truncate on compositionend. For undo/redo, block the action in beforeinput by calling preventDefault().
Demo & Code Snippet
The final component demonstrates correct length calculation, emoji handling, and automatic truncation across browsers and platforms.
Conclusion
Working on this project deepened our understanding of JavaScript string handling, Unicode planes, and emoji encoding. The resulting input solution addresses a common front-end challenge and can serve as a reference for similar requirements.
Xianyu Technology
Official account of the Xianyu technology team