Understanding ECMAScript Grammars: Lexical and Syntactic Rules and the Disallowance of await as an Identifier
This article explains how the ECMAScript specification defines four context‑free grammars—lexical, syntactic, RegExp, and numeric string—illustrates ambiguities such as the '/' token and template literals, and shows how static semantics forbid using the await keyword as an identifier inside async functions while allowing it elsewhere.
ECMAScript Grammars
The ECMAScript spec defines four grammars: the lexical grammar (translating Unicode code points into input elements), the syntactic grammar (defining how tokens form valid programs), the RegExp grammar (defining regular expressions), and the numeric string grammar (converting strings to numbers). Each grammar is expressed as a context‑free grammar with productions.
Lexical Grammar
The source text is a sequence of Unicode code points; the lexical grammar tokenises this sequence. Ambiguities arise, for example, when the character / can be a division operator ( DivPunctuator ) or the start of a regular‑expression literal ( RegularExpressionLiteral ), depending on the surrounding context:
    const x = 10 / 5;
Here / is a division operator. In contrast:
    const r = /foo/;
Here / begins a regular‑expression literal. Similar context‑dependent tokenisation applies to template literals, where the sequence }` can be a TemplateTail that closes a substitution, or a RightBracePunctuator followed by the backtick that opens a new template, depending on its position.
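The two readings of }` can be seen side by side in a short sketch. The variable names what1 and what2 are illustrative assumptions, and the comments name the tokens as the lexical grammar would most plausibly classify them.

```javascript
const what1 = "temp";
const what2 = "late";

// Inside a template with a substitution: `I am a ${ is a TemplateHead,
// and the closing }` is a single TemplateTail token.
const t = `I am a ${ what1 + what2 }`;
console.log(t); // I am a template

// After a block: } is a RightBracePunctuator that closes the empty block,
// and the backtick that follows starts a separate NoSubstitutionTemplate.
if (0 == 1) { }`not very useful`;
```

Both statements are valid; only the surrounding context decides how the same two characters are split into tokens.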
The lexical grammar uses goal symbols such as InputElementDiv and InputElementRegExp to decide which tokens are permitted. For example, InputElementDiv allows DivPunctuator but not RegularExpressionLiteral , whereas InputElementRegExp permits the opposite.
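As a rough illustration of why the tokeniser needs more than one goal symbol, the sketch below classifies each / by looking only at the previous token. This is a deliberately simplified heuristic, not how the spec or real engines work: the spec selects the goal symbol from the syntactic grammar context, and the function name classifySlashes and its "value"/"operator" bookkeeping are assumptions made for this example.

```javascript
// Toy classifier: decides whether each "/" is a DivPunctuator or the start of
// a RegularExpressionLiteral, based only on the kind of the previous token.
// Simplifications: keywords count as values, and regex bodies containing
// escapes, classes, or "/" are not handled.
function classifySlashes(source) {
  const results = [];
  let prevKind = "operator"; // at the start of input a "/" can only open a regex
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      i++;
    } else if (/[A-Za-z0-9_$]/.test(ch)) {
      // Identifier, keyword, or number: consume it as one "value" token.
      while (i < source.length && /[A-Za-z0-9_$]/.test(source[i])) i++;
      prevKind = "value";
    } else if (ch === "/" && prevKind === "value") {
      // Roughly the InputElementDiv situation: a division is expected.
      results.push("DivPunctuator");
      prevKind = "operator";
      i++;
    } else if (ch === "/") {
      // Roughly the InputElementRegExp situation: scan to the closing slash.
      const end = source.indexOf("/", i + 1);
      if (end === -1) throw new SyntaxError("unterminated regular expression literal");
      results.push("RegularExpressionLiteral");
      i = end + 1;
      while (i < source.length && /[a-z]/.test(source[i])) i++; // skip flags
      prevKind = "value";
    } else {
      // Any other punctuator; ")" and "]" end an expression, so a "/" after
      // them is treated as division.
      prevKind = ch === ")" || ch === "]" ? "value" : "operator";
      i++;
    }
  }
  return results;
}

console.log(classifySlashes("const x = 10 / 5;")); // [ 'DivPunctuator' ]
console.log(classifySlashes("const r = /foo/;"));  // [ 'RegularExpressionLiteral' ]
```

The heuristic breaks on inputs such as return /foo/, which is one reason the spec ties the choice of goal symbol to the parser's state instead.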
Syntactic Grammar
The syntactic grammar builds on the lexical grammar, defining how tokens combine into syntactically correct programs. It also has to remain backwards compatible: introducing a new keyword like await must not break existing code that used the word as an identifier, such as:
    function old() { var await; }
In async functions, await is a keyword, so the same declaration becomes a syntax error:
    async function modern() {
      var await; // Syntax error
    }
To handle this, the spec uses parameterised productions (e.g., VariableStatement[Yield, Await] ) and static semantics. The parameters record the context: inside an async function the [Await] parameter is present and is passed down to nested productions such as BindingIdentifier. The static‑semantic rule for BindingIdentifier states that it is a Syntax Error if the production has an [Await] parameter and the StringValue of the Identifier is "await", which prevents await from being used as an identifier inside async functions.
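The difference between the two contexts can be observed at runtime. The following is a minimal sketch, assuming a standard JavaScript engine such as Node.js; the helper name parses is hypothetical, and it relies on new Function parsing its argument as a non-async, non-module function body without executing the functions declared inside it.

```javascript
// Hypothetical helper: reports whether `source` parses as a function body.
// new Function throws a SyntaxError for early errors, before any code runs.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected
  }
}

// await is an ordinary identifier in a non-async function.
console.log(parses("function old() { var await; }")); // true

// Inside an async function the [Await] parameter is set, so this is rejected.
console.log(parses("async function modern() { var await; }")); // false
```

Because the error is an early error, the second snippet is rejected even though modern is never called.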
Static Semantics and Identifier Names
Static semantics are applied before execution to enforce early errors. They also resolve cases where the identifier’s string value is formed via Unicode escapes, such as \u0061wait , which yields the string "await" but is not recognised as a keyword by the lexical grammar; static semantics still forbid its use in async functions.
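To make the notion of a string value concrete, here is a rough approximation of how escapes in an identifier name are decoded. The function stringValue is a hypothetical helper written for this illustration, not a spec algorithm or an engine API; it handles only the \uXXXX and \u{X} escape forms and does no validation.

```javascript
// Hypothetical helper: decodes Unicode escapes in an identifier name,
// approximating the string value the static semantics compare against.
function stringValue(identifierName) {
  return identifierName.replace(
    /\\u\{([0-9a-fA-F]+)\}|\\u([0-9a-fA-F]{4})/g,
    (match, braced, fixed) => String.fromCodePoint(parseInt(braced || fixed, 16))
  );
}

// "\\u0061wait" is the six-character escape followed by "wait" in source text.
console.log(stringValue("\\u0061wait") === "await"); // true
console.log(stringValue("\\u{61}wait") === "await"); // true
```

Since the comparison is made on the decoded value, the escaped and unescaped spellings are treated alike by the early-error rule.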
    function old() { var \u0061wait; } // allowed
    async function modern() {
      var \u0061wait; // Syntax error
    }
Summary
The article familiarises the reader with ECMAScript’s lexical and syntactic grammars, demonstrates context‑sensitive tokenisation (e.g., the '/' and template literal cases), and explains how static semantics enforce that await cannot be used as an identifier inside async functions while remaining valid elsewhere.
ByteFE
Cutting‑edge tech, article sharing, and practical insights from the ByteDance frontend team.