Why Do escape and encodeURI Encode URLs Differently? Explore Percent-Encoding
This article explains the differences between JavaScript’s escape, encodeURI, and encodeURIComponent functions, detailing their encoding rules, percent‑encoding standards, reserved and unreserved characters, and how Unicode characters are transformed into UTF‑8 byte sequences, while also covering ASCII, Unicode, and UTF‑8 fundamentals.
1. Starting with escape and encodeURI
Assuming you know how escape works:
It does not encode ASCII letters or digits.
It does not encode the characters *@-_+./.
All other characters are replaced by escape sequences.
escape('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789')
// "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
escape('*@-_+./')
// "*@-_+./"
Assuming you know how encodeURI works:
It does not encode ASCII letters or digits.
It does not encode the 20 ASCII punctuation characters -_.!~*'();/?:@&=+$,#.
All other characters are replaced by escape sequences.
encodeURI('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789')
// "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
encodeURI("-_.!~*'();/?:@&=+$,#")
// "-_.!~*'();/?:@&=+$,#"
Thus the non‑encoding set of escape (69 characters) is a subset of the non‑encoding set of encodeURI (82 characters).
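The two counts can be checked by brute force. The sketch below runs every ASCII character through each function and keeps those that come back unchanged; the helper name unencodedSet is made up for this illustration.

```javascript
// Hypothetical helper: collect every ASCII character (0x00-0x7F)
// that the given encoder returns unchanged.
function unencodedSet(encoder) {
  const kept = new Set();
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (encoder(ch) === ch) kept.add(ch);
  }
  return kept;
}

const escapeSet = unencodedSet(escape);
const encodeURISet = unencodedSet(encodeURI);

console.log(escapeSet.size);    // 69
console.log(encodeURISet.size); // 82

// Every character that escape leaves alone is also left alone by encodeURI.
console.log([...escapeSet].every((ch) => encodeURISet.has(ch))); // true
```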
2. Percent‑encoding
Both escape and encodeURI produce percent‑encoded strings, but they follow different rules.
Percent‑encoding (also called URL encoding) is the mechanism for representing characters that are not allowed in a URI, or that have a special meaning in a given context, as % followed by hexadecimal digits.
encodeURI follows the IETF URI standard (RFC 3986), while escape is non‑standard.
Common point: ASCII characters that need encoding are written as % followed by two hexadecimal digits (for example, a space becomes %20).
Differences:
Standard (encodeURI): Non‑ASCII characters are first converted to UTF‑8 bytes, then each byte is percent‑encoded.
Non‑standard (escape): Characters from U+0100 upward are represented as %uxxxx, where xxxx is the 4‑digit hexadecimal value of the UTF‑16 code unit; characters from U+0080 to U+00FF are written as a single %xx.
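The two schemes can be reproduced by hand for non‑ASCII input. This is a minimal sketch, not how the engines implement them: both helper names are hypothetical, and unlike the built‑ins they encode every character they are given, so they are only compared on non‑ASCII strings here.

```javascript
// Two uppercase hex digits for a value in 0..255.
const hex2 = (n) => n.toString(16).toUpperCase().padStart(2, '0');

// Hypothetical helper: encodeURI-style, percent-encode each UTF-8 byte.
function utf8PercentEncode(str) {
  const bytes = new TextEncoder().encode(str); // UTF-8 byte sequence
  return Array.from(bytes, (b) => '%' + hex2(b)).join('');
}

// Hypothetical helper: escape-style, one token per UTF-16 code unit.
function escapeStyleEncode(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    out += unit < 0x100
      ? '%' + hex2(unit)
      : '%u' + unit.toString(16).toUpperCase().padStart(4, '0');
  }
  return out;
}

console.log(utf8PercentEncode('凹') === encodeURI('凹')); // true
console.log(escapeStyleEncode('凹') === escape('凹'));    // true

// U+00E9 shows the single-byte form that escape uses below U+0100.
console.log(escapeStyleEncode('é')); // "%E9"
console.log(utf8PercentEncode('é')); // "%C3%A9"
```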
Example with the Chinese characters “凹凸”:
escape('凹凸')
// "%u51F9%u51F8"
encodeURI('凹凸')
// "%E5%87%B9%E5%87%B8"
3. Reserved, unreserved and unsafe characters
RFC 3986 defines which characters may appear in a URI.
Unreserved characters (no percent‑encoding needed): letters (A‑Z, a‑z), digits (0‑9), and -_.~.
Reserved characters have special meanings: the general delimiters :/?#[]@ separate the parts of a URI, and the sub‑delimiters !$&'()*+,;= separate data within a component.
Unsafe (restricted) characters should be percent‑encoded because they can cause ambiguity or are non‑printable, such as %, space, <>", braces, backslashes, control characters (0x00‑0x1F, 0x7F), and any character > 0x7F.
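The three groups above can be expressed as a small lookup. This is a sketch under the assumption that the input is exactly one character; the function name classify and the label strings are invented for illustration.

```javascript
// Character classes as listed above (RFC 3986 terminology).
const UNRESERVED = /^[A-Za-z0-9\-_.~]$/;
const GEN_DELIMS = ":/?#[]@";
const SUB_DELIMS = "!$&'()*+,;=";

// Hypothetical helper: classify a single character.
function classify(ch) {
  if (UNRESERVED.test(ch)) return 'unreserved';
  if (GEN_DELIMS.includes(ch) || SUB_DELIMS.includes(ch)) return 'reserved';
  return 'must-encode'; // %, space, <>", control characters, non-ASCII, ...
}

console.log(classify('a')); // "unreserved"
console.log(classify('~')); // "unreserved"
console.log(classify('?')); // "reserved"
console.log(classify('=')); // "reserved"
console.log(classify(' ')); // "must-encode"
console.log(classify('京')); // "must-encode"
```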
Examples:
encodeURI('%')
// "%25"
encodeURI(' ')
// "%20"
encodeURI('<>"')
// "%3C%3E%22"
encodeURI('京东')
// "%E4%BA%AC%E4%B8%9C"
4. encodeURI vs encodeURIComponent
encodeURIComponent assumes the argument is a single URI component (e.g., a query‑string value) and therefore also encodes the delimiter characters ;/?:@&=+$,#. Its non‑encoding set contains only 71 characters.
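To illustrate why the distinction matters in practice, here is a sketch of building a query string from a value that itself contains delimiters. The /search path and the parameter name q are hypothetical.

```javascript
// Hypothetical base URL and parameter value containing &, = and /.
const base = 'https://aotu.io/search';
const keyword = 'a&b=c/凹凸';

// Encode only the value, so its delimiters cannot be mistaken for URL syntax.
const safeUrl = base + '?q=' + encodeURIComponent(keyword);
console.log(safeUrl);
// "https://aotu.io/search?q=a%26b%3Dc%2F%E5%87%B9%E5%87%B8"

// encodeURI keeps & and = as-is, so the value is split into two parameters.
const brokenUrl = encodeURI(base + '?q=' + keyword);
console.log(brokenUrl);
// "https://aotu.io/search?q=a&b=c/%E5%87%B9%E5%87%B8"

console.log(new URL(safeUrl).searchParams.get('q'));   // "a&b=c/凹凸"
console.log(new URL(brokenUrl).searchParams.get('q')); // "a"
```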
encodeURIComponent('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789,/?:@&=+$#')
// "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789%2C%2F%3F%3A%40%26%3D%2B%24%23"
Comparison example:
encodeURIComponent('https://aotu.io/')
// "https%3A%2F%2Faotu.io%2F"
encodeURI('https://aotu.io/')
// "https://aotu.io/"
5. Character encoding basics
Different character sets contain different numbers of Chinese characters:
GB 2312 – 6,763 characters
GBK – 21,003 characters
GB 18030 – 70,244 characters
Big5 – 13,053 characters
Unicode CJK Unified Ideographs – 20,902 characters (plus extensions, totaling over 70,000)
Relationship between ASCII, Unicode and UTF‑8:
ASCII: 7‑bit encoding for 128 characters (0‑127).
Unicode: A universal code point system covering all scripts.
UTF‑8: A variable‑length encoding of Unicode code points into 1‑4 bytes.
UTF‑8 encoding rules:
U+0000 – U+007F → 0xxxxxxx (1 byte)
U+0080 – U+07FF → 110xxxxx 10xxxxxx (2 bytes)
U+0800 – U+FFFF → 1110xxxx 10xxxxxx 10xxxxxx (3 bytes)
U+10000 – U+10FFFF → 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (4 bytes)
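The table above translates almost directly into bit operations. The following is a minimal sketch: the function name utf8Bytes is hypothetical, and it does not reject surrogate code points (U+D800 to U+DFFF) or values above U+10FFFF as a real encoder would.

```javascript
// Hypothetical helper: encode one Unicode code point into UTF-8 bytes,
// following the four ranges in the table above.
function utf8Bytes(codePoint) {
  if (codePoint <= 0x7f) {
    return [codePoint];                              // 0xxxxxxx
  }
  if (codePoint <= 0x7ff) {
    return [
      0xc0 | (codePoint >> 6),                       // 110xxxxx
      0x80 | (codePoint & 0x3f),                     // 10xxxxxx
    ];
  }
  if (codePoint <= 0xffff) {
    return [
      0xe0 | (codePoint >> 12),                      // 1110xxxx
      0x80 | ((codePoint >> 6) & 0x3f),              // 10xxxxxx
      0x80 | (codePoint & 0x3f),                     // 10xxxxxx
    ];
  }
  return [
    0xf0 | (codePoint >> 18),                        // 11110xxx
    0x80 | ((codePoint >> 12) & 0x3f),               // 10xxxxxx
    0x80 | ((codePoint >> 6) & 0x3f),                // 10xxxxxx
    0x80 | (codePoint & 0x3f),                       // 10xxxxxx
  ];
}

const toHex = (bytes) => bytes.map((b) => b.toString(16).toUpperCase()).join(' ');

console.log(toHex(utf8Bytes(0x41)));   // "41"
console.log(toHex(utf8Bytes(0xe9)));   // "C3 A9"
console.log(toHex(utf8Bytes(0x51f9))); // "E5 87 B9"
```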
Example: encoding the character “凹” (code point U+51F9), which falls in the 3‑byte range.
encodeURI('凹')
// "%E5%87%B9"
Interpreting the UTF‑8 bytes 11100101 10000111 10111001 (0xE5 0x87 0xB9):
The first byte does not start with 0, indicating it is not a single‑byte character.
The leading bits 1110 show that the character occupies three bytes.
The three bytes together represent one Unicode symbol, not three separate symbols.
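Going the other way, stripping the marker bits from those three bytes and concatenating what remains gives back the code point. A small sketch of that arithmetic, with variable names chosen only for this example:

```javascript
// The three UTF-8 bytes of "凹".
const bytes = [0xe5, 0x87, 0xb9];

// Drop the 1110 prefix of the lead byte and the 10 prefix of each
// continuation byte, then join the remaining 4 + 6 + 6 payload bits.
const codePoint =
  ((bytes[0] & 0x0f) << 12) |
  ((bytes[1] & 0x3f) << 6) |
  (bytes[2] & 0x3f);

console.log(codePoint.toString(16).toUpperCase()); // "51F9"
console.log(String.fromCodePoint(codePoint));      // "凹"

// decodeURI performs the same reconstruction on the percent-encoded form.
console.log(decodeURI('%E5%87%B9'));               // "凹"
```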
Aotu Lab
Aotu Lab, founded in October 2015, is a front-end engineering team serving multi-platform products. The articles in this public account are intended to share and discuss technology, reflecting only the personal views of Aotu Lab members and not the official stance of JD.com Technology.
