How JavaScript Parsers Turn Code into ASTs: Lexical & Syntax Basics

This article explains how JavaScript parsers turn a source code string into an abstract syntax tree (AST) through lexical analysis and syntax analysis. It covers language types, V8's execution flow, token generation, and AST construction with a worked example, and points to tools such as AST Explorer for further exploration.

Goodme Frontend Team

Sep 9, 2024

Explanation of Parsers

This article aims to explain clearly what a parser does.

Preface

Have you ever wondered how the JavaScript code we write, which is just a string, gets executed by the machine?

Concept

To the machine, the JavaScript code we write is just a sequence of characters; it cannot execute them directly.

Runtime Environment

JavaScript runs in browsers and in Node.js, both of which embed a JavaScript engine. The most common is Google's open-source V8 engine, used by Chrome and Node.js. Other engines include Mozilla's SpiderMonkey in Firefox and JavaScriptCore in Safari.

JavaScript Definition

JavaScript (JS) is a lightweight, interpreted or just-in-time compiled programming language with first-class functions.

Language Types

The definition mentions interpreted and just-in-time compiled languages; the third common category is compiled languages.

What are interpreted, compiled, and just‑in‑time compiled languages?

Compiled Language

Before a program runs, it must be compiled by a compiler into a binary file that the machine can read. The binary can be executed directly without recompilation each time. Examples include C/C++ and Go.

Interpreted Language

Programs written in interpreted languages are read and executed by an interpreter each time they run, with no separate compilation step beforehand. Python and JavaScript are traditionally classified as interpreted languages.

Just-In-Time Compiled Language

Just-in-time (JIT) compilation, also called dynamic translation or runtime compilation, compiles code during execution rather than before it. A JIT compiler monitors the running program and compiles the frequently executed ("hot") parts, so that the resulting speedup outweighs the cost of compiling.

JIT compilation combines traits of ahead-of-time compilation and interpretation: a program can start running quickly, as with an interpreter, and hot code can approach the speed of compiled code, at the cost of extra compile time and memory at runtime.
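
The "count, then compile the hot parts" idea can be sketched in a few lines. This is only a toy illustration under loud assumptions: `makeJitLike` and `HOT_THRESHOLD` are hypothetical names, and "compiling" here means turning a tiny expression tree into nested closures, whereas a real engine emits machine code and relies on far more runtime information.

```javascript
// Toy illustration only: HOT_THRESHOLD and makeJitLike are hypothetical names,
// not part of any engine's API.
const HOT_THRESHOLD = 3;

function makeJitLike(ast) {
  let calls = 0;
  let compiled = null;

  // Slow path: walk the tree on every call, like an interpreter.
  function interpret(node, x) {
    if (node.type === 'num') return node.value;
    if (node.type === 'var') return x;
    const left = interpret(node.left, x);
    const right = interpret(node.right, x);
    return node.type === 'add' ? left + right : left * right;
  }

  // "Compile" once: translate the tree into nested closures so later calls
  // no longer inspect node types.
  function compile(node) {
    if (node.type === 'num') return () => node.value;
    if (node.type === 'var') return (x) => x;
    const left = compile(node.left);
    const right = compile(node.right);
    return node.type === 'add'
      ? (x) => left(x) + right(x)
      : (x) => left(x) * right(x);
  }

  function run(x) {
    if (compiled) return compiled(x); // fast path once the code is "hot"
    calls++;
    if (calls >= HOT_THRESHOLD) compiled = compile(ast);
    return interpret(ast, x);
  }
  run.isCompiled = () => compiled !== null;
  return run;
}

// The expression x * 2 + 1 as a tiny tree.
const expr = {
  type: 'add',
  left: { type: 'mul', left: { type: 'var' }, right: { type: 'num', value: 2 } },
  right: { type: 'num', value: 1 },
};

const f = makeJitLike(expr);
const results = [1, 2, 3, 4].map((x) => f(x));
console.log(results, f.isCompiled()); // [ 3, 5, 7, 9 ] true
```

The first three calls take the interpreted path; from the fourth call on, the pre-built closure is used. The threshold of 3 is arbitrary and chosen only to keep the example short.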

V8 Execution Flow

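During execution, V8 uses both the Ignition interpreter and the TurboFan compiler. The parser first produces an AST; Ignition generates bytecode from the AST and interprets it; TurboFan then compiles frequently executed code into optimized machine code.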

Summary

Whichever category a language falls into, the first step is typically to convert the source code into an abstract syntax tree (AST). This article focuses on how the AST is obtained through lexical analysis and syntax analysis.

Parser

Lexical Analysis

Lexical analysis is the process of converting a character sequence into a token sequence. The program that performs lexical analysis is called a lexical analyzer (lexer) or scanner.

Syntax Analysis

In computer science, syntax analysis (parsing) determines the grammatical structure of an input token sequence according to a formal grammar.

Simple Lexical Parser Example

Pseudocode String

(add 2 (subtract 4 2))

Step 1: Tokenization

Tokens are generated by scanning characters one by one, handling various cases such as identifiers, numbers, parentheses, strings, and whitespace.

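function tokenizer(input) {
  // Character classes, created once instead of on every loop iteration.
  const WHITESPACE = /\s/;
  const NUMBERS = /[0-9]/;
  const LETTERS = /[a-z]/i;

  let current = 0;
  const tokens = [];

  while (current < input.length) {
    let char = input[current];

    // Parentheses are single-character tokens.
    if (char === '(' || char === ')') {
      tokens.push({ type: 'paren', value: char });
      current++;
      continue;
    }

    // Whitespace separates tokens but is not a token itself.
    if (WHITESPACE.test(char)) {
      current++;
      continue;
    }

    // Numbers: consume consecutive digits into one token.
    if (NUMBERS.test(char)) {
      let value = '';
      while (current < input.length && NUMBERS.test(char)) {
        value += char;
        char = input[++current];
      }
      tokens.push({ type: 'number', value });
      continue;
    }

    // Strings: everything between a pair of double quotes.
    if (char === '"') {
      let value = '';
      char = input[++current];
      while (current < input.length && char !== '"') {
        value += char;
        char = input[++current];
      }
      // Without this check an unterminated string would loop forever.
      if (current >= input.length) {
        throw new SyntaxError('Unterminated string literal');
      }
      current++; // skip the closing quote
      tokens.push({ type: 'string', value });
      continue;
    }

    // Names: consecutive letters. The bounds check matters because
    // LETTERS.test(undefined) is true (it tests the string "undefined").
    if (LETTERS.test(char)) {
      let value = '';
      while (current < input.length && LETTERS.test(char)) {
        value += char;
        char = input[++current];
      }
      tokens.push({ type: 'name', value });
      continue;
    }

    throw new TypeError("I don't know what this character is: " + char);
  }

  return tokens;
}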

Resulting tokens:

[
  { type: 'paren', value: '(' },
  { type: 'name', value: 'add' },
  { type: 'number', value: '2' },
  { type: 'paren', value: '(' },
  { type: 'name', value: 'subtract' },
  { type: 'number', value: '4' },
  { type: 'number', value: '2' },
  { type: 'paren', value: ')' },
  { type: 'paren', value: ')' }
];
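
The character-by-character scanner is one way to write a lexer; another common approach is to describe each token kind with a regular expression. The sketch below is a hypothetical alternative (`regexTokenizer`, `WS`, and `TOKEN` are names invented for this illustration) that should produce the same token shapes for the toy language, using the sticky `y` flag so that each match must start exactly at the current position.

```javascript
// Hypothetical regex-based alternative to the hand-written scanner.
// One named capture group per token kind; the `y` (sticky) flag anchors
// each match at `lastIndex`.
const WS = /\s*/y;
const TOKEN = /(?<paren>[()])|(?<number>[0-9]+)|"(?<string>[^"]*)"|(?<name>[a-z]+)/iy;

function regexTokenizer(input) {
  const tokens = [];
  let pos = 0;

  while (true) {
    // Skip any whitespace before the next token.
    WS.lastIndex = pos;
    WS.exec(input); // always matches, possibly the empty string
    pos = WS.lastIndex;
    if (pos >= input.length) break;

    TOKEN.lastIndex = pos;
    const match = TOKEN.exec(input);
    if (match === null) {
      throw new TypeError(`Unexpected character at index ${pos}: ${input[pos]}`);
    }

    // Exactly one named group is defined; its name is the token type.
    const [type, value] = Object.entries(match.groups).find(
      ([, text]) => text !== undefined
    );
    tokens.push({ type, value });
    pos = TOKEN.lastIndex;
  }

  return tokens;
}

console.log(regexTokenizer('(add 2 (subtract 4 2))'));
```

The trade-off is roughly this: the regex version is shorter, while the hand-written loop makes each decision explicit and is easier to extend with custom error messages.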

Step 2: Build AST

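function parser(tokens) {
  let current = 0;

  // Parse one expression starting at tokens[current] and return its node.
  function walk() {
    let token = tokens[current];
    if (token === undefined) {
      throw new SyntaxError('Unexpected end of input');
    }

    if (token.type === 'number') {
      current++;
      return { type: 'NumberLiteral', value: token.value };
    }

    if (token.type === 'string') {
      current++;
      return { type: 'StringLiteral', value: token.value };
    }

    if (token.type === 'paren' && token.value === '(') {
      // The token right after '(' must be the function name.
      token = tokens[++current];
      if (token === undefined || token.type !== 'name') {
        throw new SyntaxError('Expected a function name after "("');
      }
      const node = { type: 'CallExpression', name: token.value, params: [] };

      // Collect arguments (which may be nested calls) until the matching ')'.
      token = tokens[++current];
      while (token !== undefined && !(token.type === 'paren' && token.value === ')')) {
        node.params.push(walk());
        token = tokens[current];
      }
      if (token === undefined) {
        throw new SyntaxError('Missing closing ")"');
      }
      current++; // skip the closing ')'
      return node;
    }

    throw new TypeError('Unexpected token type: ' + token.type);
  }

  // The root node holds every top-level expression.
  const ast = { type: 'Program', body: [] };
  while (current < tokens.length) {
    ast.body.push(walk());
  }
  return ast;
}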

Resulting AST:

{
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [
      { type: 'NumberLiteral', value: '2' },
      {
        type: 'CallExpression',
        name: 'subtract',
        params: [
          { type: 'NumberLiteral', value: '4' },
          { type: 'NumberLiteral', value: '2' }
        ]
      }
    ]
  }]
};

Each AST node has a predefined type that represents one syntactic construct. A real language defines many more node types than the four used here.
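
To show why this tree shape is convenient, the AST above can be evaluated with a short recursive function. This is a minimal sketch, assuming that `add` and `subtract` mean ordinary arithmetic; the `evaluate` function and the `operations` table are hypothetical helpers introduced only for this illustration.

```javascript
// The AST produced for (add 2 (subtract 4 2)), copied from above.
const sampleAst = {
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [
      { type: 'NumberLiteral', value: '2' },
      {
        type: 'CallExpression',
        name: 'subtract',
        params: [
          { type: 'NumberLiteral', value: '4' },
          { type: 'NumberLiteral', value: '2' },
        ],
      },
    ],
  }],
};

// Assumed meaning of the toy language's functions (hypothetical).
const operations = new Map([
  ['add', (a, b) => a + b],
  ['subtract', (a, b) => a - b],
]);

// Recursively compute the value of a node from the values of its children.
function evaluate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map((child) => evaluate(child));
    case 'NumberLiteral':
      return Number(node.value); // token values are strings
    case 'StringLiteral':
      return node.value;
    case 'CallExpression': {
      const fn = operations.get(node.name);
      if (fn === undefined) {
        throw new TypeError('Unknown function: ' + node.name);
      }
      return fn(...node.params.map((param) => evaluate(param)));
    }
    default:
      throw new TypeError('Unknown node type: ' + node.type);
  }
}

console.log(evaluate(sampleAst)); // [ 4 ]
```

The nested `subtract` call is evaluated first because `evaluate` recurses into the parameters before applying the outer function, which is exactly the order the tree structure encodes.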

Summary

Once the AST has been generated, the parser's job is done. The AST can then be processed further, which is how Babel plugins, ESLint rules, Webpack plugins, and similar tools are built. Common JavaScript parsers include @babel/parser and Acorn.
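
Tools that process an AST generally walk the tree and react to the node types they care about. The sketch below applies that visitor idea to the toy AST from this article; the `traverse` helper and the visitor object shape are hypothetical and much simpler than the actual Babel or ESLint APIs.

```javascript
// Hypothetical helper: visit every node depth-first and call the handler
// registered for its type, if any.
function traverse(node, visitor) {
  const handler = visitor[node.type];
  if (typeof handler === 'function') handler(node);

  // Only these two node types have children in the toy language.
  let children = [];
  if (node.type === 'Program') children = node.body;
  else if (node.type === 'CallExpression') children = node.params;

  for (const child of children) traverse(child, visitor);
}

// The toy AST for (add 2 (subtract 4 2)).
const tree = {
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [
      { type: 'NumberLiteral', value: '2' },
      {
        type: 'CallExpression',
        name: 'subtract',
        params: [
          { type: 'NumberLiteral', value: '4' },
          { type: 'NumberLiteral', value: '2' },
        ],
      },
    ],
  }],
};

// A lint-style check: report calls to functions outside an allow-list.
const allowed = new Set(['add']);
const problems = [];
traverse(tree, {
  CallExpression(node) {
    if (!allowed.has(node.name)) {
      problems.push(`Call to "${node.name}" is not allowed`);
    }
  },
});

console.log(problems); // [ 'Call to "subtract" is not allowed' ]
```

The rule itself never deals with characters or tokens; it only inspects nodes, which is the main benefit of working on the AST rather than on the source string.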

Various languages have their own parsers; you can explore them with AST Explorer.

Conclusion

Returning to the opening question, this article covered only the first step: parsing. The subsequent steps involve bytecode, machine code, and interpreters, which you can explore further.

Discussion: The first compilers for languages such as C/C++ and Go were written in other languages. Can a language's compiler be written in the language itself?

References: AST Explorer, Browser Internals, the‑super‑tiny‑compiler.
