
Why Do escape and encodeURI Encode URLs Differently? Explore Percent-Encoding

This article explains the differences between JavaScript’s escape, encodeURI, and encodeURIComponent functions, detailing their encoding rules, percent‑encoding standards, reserved and unreserved characters, and how Unicode characters are transformed into UTF‑8 byte sequences, while also covering ASCII, Unicode, and UTF‑8 fundamentals.

Aotu Lab

Jun 30, 2017


1. Starting from escape and encodeURI

Recall how escape works:

It does not encode ASCII letters or digits.

It does not encode the characters *@-_+./.

All other characters are replaced by escape sequences.

escape('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789')
// "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

escape('*@-_+./')
// "*@-_+./"

Recall how encodeURI works:

It does not encode ASCII letters or digits.

It does not encode the 20 ASCII punctuation characters -_.!~*'();/?:@&=+$,#.

All other characters are replaced by escape sequences.

encodeURI('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789')
// "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

encodeURI("-_.!~*'();/?:@&=+$,#")
// "-_.!~*'();/?:@&=+$,#"

Thus the non‑encoding set of escape (69 characters) is a subset of the non‑encoding set of encodeURI (82 characters).
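The two counts above can be checked empirically. The following is a minimal sketch, assuming a runtime (Node.js or a browser) where the legacy escape global is still available; unencodedSet is a hypothetical helper name introduced only for this illustration.

```javascript
// Collect every ASCII character (0x00-0x7F) that the given encoder leaves unchanged.
// `unencodedSet` is a hypothetical helper, not part of any standard API.
function unencodedSet(encoder) {
  const kept = [];
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (encoder(ch) === ch) kept.push(ch);
  }
  return kept;
}

const escapeSet = unencodedSet(escape);
const encodeURISet = unencodedSet(encodeURI);

// Every character that escape leaves alone is also left alone by encodeURI.
const isSubset = escapeSet.every((ch) => encodeURISet.includes(ch));

console.log(escapeSet.length);    // 69
console.log(encodeURISet.length); // 82
console.log(isSubset);            // true
```

Only the ASCII range is scanned because both functions encode every character above 0x7F.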

2. Percent‑encoding

Both escape and encodeURI produce percent‑encoded strings, but they differ in standards.

Percent‑encoding (also called URL encoding) is the mechanism for encoding a URI in a specific context.

encodeURI follows the percent-encoding scheme standardized by the IETF in RFC 3986 (not a W3C document), while escape uses a non-standard format.

Common point: Characters that need encoding are represented as two hexadecimal digits prefixed by %.

Differences:

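Standard (encodeURI): Non-ASCII characters are first converted to UTF-8 bytes, then each byte is percent-encoded.

Non-standard (escape): Characters with a code unit up to 0xFF are written as %XX; all others are written as %uXXXX, where XXXX is the 4-digit hexadecimal UTF-16 code unit (identical to the Unicode code point for characters in the Basic Multilingual Plane).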

Example with the Chinese characters “凹凸”:

escape('凹凸')
// "%u51F9%u51F8"

encodeURI('凹凸')
// "%E5%87%B9%E5%87%B8"
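To make the difference concrete, both outputs can be reproduced by hand. This is a rough sketch rather than a full reimplementation: escapeStyle and utf8Style are hypothetical helper names, they encode every character they are given (including ASCII, which the real functions may leave alone), and the sketch assumes a runtime that provides TextEncoder.

```javascript
// Hypothetical helper: escape-style output, one %uXXXX per UTF-16 code unit.
// Intended only for characters above 0xFF, which is where escape uses this form.
function escapeStyle(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const hex = str.charCodeAt(i).toString(16).toUpperCase();
    out += '%u' + hex.padStart(4, '0');
  }
  return out;
}

// Hypothetical helper: convert to UTF-8 first, then percent-encode every byte.
function utf8Style(str) {
  const bytes = new TextEncoder().encode(str);
  return Array.from(bytes, (b) => '%' + b.toString(16).toUpperCase().padStart(2, '0')).join('');
}

console.log(escapeStyle('凹凸')); // "%u51F9%u51F8"
console.log(utf8Style('凹凸'));   // "%E5%87%B9%E5%87%B8"
```

The same two characters produce two code units in the first form and six bytes in the second, which is why the two results have different lengths.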

3. Reserved, unreserved and unsafe characters

RFC 3986 defines which characters may appear in a URI.

Unreserved characters (no percent‑encoding needed): letters (A‑Z, a‑z), digits (0‑9), and -_.~.

Reserved characters have special meanings: :/?#[]@ are the general delimiters that separate URI components, and !$&'()*+,;= are sub-delimiters used inside components.

Unsafe (restricted) characters should be percent‑encoded because they can cause ambiguity or are non‑printable, such as %, space, <>", braces, backslashes, control characters (0x00‑0x1F, 0x7F), and any character > 0x7F.

Examples:

encodeURI('%')
// "%25"

encodeURI(' ')
// "%20"

encodeURI('<>"')
// "%3C%3E%22"

encodeURI('京东')
// "%E4%BA%AC%E4%B8%9C"
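The three categories described in this section can be expressed as a small classifier. This is an illustrative sketch only: classify and the label 'must-encode' are names invented here, and the function assumes it receives exactly one character.

```javascript
// Character classes as listed above (RFC 3986 unreserved and reserved sets).
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
const GEN_DELIMS = ':/?#[]@';
const SUB_DELIMS = "!$&'()*+,;=";

// Hypothetical helper: classify a single character.
// 'must-encode' is our own label for everything outside the two sets.
function classify(ch) {
  if (UNRESERVED.test(ch)) return 'unreserved';
  if (GEN_DELIMS.includes(ch) || SUB_DELIMS.includes(ch)) return 'reserved';
  return 'must-encode';
}

console.log(classify('~'));  // "unreserved"
console.log(classify('?'));  // "reserved"
console.log(classify(' '));  // "must-encode"
console.log(classify('京')); // "must-encode"
```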

4. encodeURI vs encodeURIComponent

encodeURIComponent assumes the argument is a URI component (e.g., a query-string value) and therefore also encodes the delimiter characters ;/?:@&=+$,#. Its non-encoding set contains only 71 characters.

encodeURIComponent('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789,/?:@&=+$#')
// "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789%2C%2F%3F%3A%40%26%3D%2B%24%23"

Comparison example:

encodeURIComponent('https://aotu.io/')
// "https%3A%2F%2Faotu.io%2F"

encodeURI('https://aotu.io/')
// "https://aotu.io/"
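The practical consequence shows up when a value containing delimiters is placed into a query string. The sketch below assumes a hypothetical /search path and a hypothetical q parameter on the article's example host, and uses the standard URL class to parse the results back.

```javascript
// Hypothetical endpoint and parameter name, used only for illustration.
const base = 'https://aotu.io/search';
const keyword = 'a&b=c';

// encodeURI leaves & and = alone, so the value leaks into the query structure.
const wrong = base + '?q=' + encodeURI(keyword);
// encodeURIComponent encodes them, so the value stays a single parameter.
const right = base + '?q=' + encodeURIComponent(keyword);

console.log(wrong); // "https://aotu.io/search?q=a&b=c"
console.log(right); // "https://aotu.io/search?q=a%26b%3Dc"

console.log(new URL(wrong).searchParams.get('q')); // "a"
console.log(new URL(right).searchParams.get('q')); // "a&b=c"
```

In short, encodeURI suits a complete URI whose delimiters should keep their meaning, while encodeURIComponent suits an individual value inserted into one.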

5. Character encoding basics

Different character sets contain different numbers of Chinese characters:

GB 2312 – 6,763 characters

GBK – 21,003 characters

GB 18030 – 70,244 characters

Big5 – 13,053 characters

Unicode CJK Unified Ideographs – 20,902 characters (plus extensions, totaling over 70,000)

Relationship between ASCII, Unicode and UTF‑8:

ASCII : 7‑bit encoding for 128 characters (0‑127).

Unicode : A universal code point system covering all scripts.

UTF‑8 : A variable‑length encoding of Unicode code points into 1‑4 bytes.
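The relationship between the three can be observed for individual characters. This is a small sketch, assuming a runtime with TextEncoder; describeChar is a hypothetical helper name.

```javascript
// Hypothetical helper: report the code point of a character, whether it is in
// the ASCII range, and how many bytes its UTF-8 encoding occupies.
function describeChar(ch) {
  const codePoint = ch.codePointAt(0);
  return {
    codePoint: 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0'),
    isAscii: codePoint < 128,
    utf8Length: new TextEncoder().encode(ch).length,
  };
}

console.log(describeChar('A'));  // { codePoint: 'U+0041', isAscii: true, utf8Length: 1 }
console.log(describeChar('凹')); // { codePoint: 'U+51F9', isAscii: false, utf8Length: 3 }
console.log(describeChar('😀')); // { codePoint: 'U+1F600', isAscii: false, utf8Length: 4 }
```

An ASCII character keeps its single-byte value under UTF-8, which is why ASCII text is also valid UTF-8.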

UTF‑8 encoding rules:

U+0000 – U+007F → 0xxxxxxx (1 byte)

U+0080 – U+07FF → 110xxxxx 10xxxxxx (2 bytes)

U+0800 – U+FFFF → 1110xxxx 10xxxxxx 10xxxxxx (3 bytes)

U+10000 – U+10FFFF → 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (4 bytes)
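The four rows above translate directly into bit operations. The following is a minimal sketch, assuming the input is a valid code point (it does not reject surrogates or values above U+10FFFF); utf8Bytes is a hypothetical helper name.

```javascript
// Hypothetical helper: encode one Unicode code point into UTF-8 bytes,
// following the four ranges listed above. No input validation is performed.
function utf8Bytes(codePoint) {
  if (codePoint <= 0x7f) {
    return [codePoint]; // 0xxxxxxx
  }
  if (codePoint <= 0x7ff) {
    return [0xc0 | (codePoint >> 6), 0x80 | (codePoint & 0x3f)]; // 110xxxxx 10xxxxxx
  }
  if (codePoint <= 0xffff) {
    return [
      0xe0 | (codePoint >> 12),         // 1110xxxx
      0x80 | ((codePoint >> 6) & 0x3f), // 10xxxxxx
      0x80 | (codePoint & 0x3f),        // 10xxxxxx
    ];
  }
  return [
    0xf0 | (codePoint >> 18),          // 11110xxx
    0x80 | ((codePoint >> 12) & 0x3f), // 10xxxxxx
    0x80 | ((codePoint >> 6) & 0x3f),  // 10xxxxxx
    0x80 | (codePoint & 0x3f),         // 10xxxxxx
  ];
}

const hex = utf8Bytes(0x51f9).map((b) => b.toString(16).toUpperCase());
console.log(hex); // [ 'E5', '87', 'B9' ]
```

Each continuation byte carries six payload bits, so the code point is sliced into 6-bit groups from the right and the remaining high bits go into the lead byte.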

Example: encoding the character “凹” (U+51F9), which falls in the 3-byte range U+0800 – U+FFFF.

encodeURI('凹')
// "%E5%87%B9"

Interpreting the UTF‑8 bytes 11100101 10000111 10111001:

The first byte does not start with 0, indicating it is not a single‑byte character.

The leading bits 1110 show that the character occupies three bytes.

The three bytes together represent one Unicode symbol, not three separate symbols.
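The same interpretation can be carried out in code: read the lead byte to learn the length, then concatenate the payload bits of the continuation bytes. This is a sketch under simplifying assumptions (it does not verify that continuation bytes start with 10 or reject overlong forms); decodeFirst is a hypothetical helper name.

```javascript
// Hypothetical helper: decode the first UTF-8 sequence in an array of bytes.
// Returns how many bytes were consumed and the resulting code point.
function decodeFirst(bytes) {
  const lead = bytes[0];
  let length;
  let codePoint;
  if ((lead & 0x80) === 0x00) {        // 0xxxxxxx
    length = 1; codePoint = lead;
  } else if ((lead & 0xe0) === 0xc0) { // 110xxxxx
    length = 2; codePoint = lead & 0x1f;
  } else if ((lead & 0xf0) === 0xe0) { // 1110xxxx
    length = 3; codePoint = lead & 0x0f;
  } else if ((lead & 0xf8) === 0xf0) { // 11110xxx
    length = 4; codePoint = lead & 0x07;
  } else {
    throw new Error('Invalid UTF-8 lead byte');
  }
  // Append the low 6 bits of each continuation byte.
  for (let i = 1; i < length; i++) {
    codePoint = (codePoint << 6) | (bytes[i] & 0x3f);
  }
  return { length, codePoint };
}

const result = decodeFirst([0xe5, 0x87, 0xb9]);
console.log(result.length);                          // 3
console.log(result.codePoint.toString(16));          // "51f9"
console.log(String.fromCodePoint(result.codePoint)); // "凹"
```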



