Inside the TypeScript Compiler: How Scanning, Parsing, Binding, Checking, and Emitting Transform Code

This article explains the inner workings of the TypeScript compiler stage by stage: scanning source code into tokens, parsing tokens into an AST, binding symbols, performing type checking, and emitting JavaScript and declaration files. Code examples from the compiler source illustrate each stage.

ELab Team

Jan 23, 2022

Introduction

The TypeScript compiler adds static type checking to JavaScript, allowing early detection of type mismatches and providing editor assistance for large projects.

Key Components

Scanner: lexical analysis that generates a token stream.

Parser: builds an abstract syntax tree (AST) from tokens.

Binder: creates symbols and links them to AST nodes, forming the semantic model.

Checker: performs type checking on the bound AST.

Emitter: outputs compiled JavaScript files and declaration files (.d.ts).

Processing Flow

The scanner performs lexical analysis on the source text and produces a token stream.

The parser assembles the tokens into an AST.

The binder generates symbols and binds each AST node to its corresponding symbol.

The checker examines the processed AST and performs type checking.

The emitter generates JavaScript code and declaration files from the final AST.

Scanner

What is a token

In this context, a token is a lexical marker, not an authentication token. The scanner classifies each lexical unit into a token type.

For example, the line const a = 1; contains five tokens: the keyword const, the identifier a, the equals sign, the numeric literal 1, and the semicolon.
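The classification described above can be sketched with a tiny tokenizer. This is only an illustration under simplifying assumptions: MiniTokenKind, MiniToken, miniKeywords, and miniTokenize are hypothetical names, the regular expression handles only simple words, digit runs, and single-character punctuation, and the real scanner works character by character rather than with a regular expression.

```typescript
// Hypothetical token kinds for this sketch; the real compiler uses SyntaxKind.
type MiniTokenKind = "Keyword" | "Identifier" | "NumericLiteral" | "Punctuation";

interface MiniToken {
    kind: MiniTokenKind;
    text: string;
}

const miniKeywords = new Set(["const", "let", "var"]);

function miniTokenize(source: string): MiniToken[] {
    const tokens: MiniToken[] = [];
    // An identifier-like word, a run of digits, or any other single
    // non-whitespace character. Whitespace between matches is skipped.
    const pattern = /[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(source)) !== null) {
        const text = match[0];
        let kind: MiniTokenKind;
        if (/^\d/.test(text)) {
            kind = "NumericLiteral";
        } else if (/^[A-Za-z_$]/.test(text)) {
            kind = miniKeywords.has(text) ? "Keyword" : "Identifier";
        } else {
            kind = "Punctuation";
        }
        tokens.push({ kind, text });
    }
    return tokens;
}

// Five tokens: const, a, =, 1, ;
const miniTokens = miniTokenize("const a = 1;");
```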

The compiler enumerates all token types in types.ts using the SyntaxKind enum, which also holds the AST node kinds used later by the parser.

export const enum SyntaxKind {
    Unknown,
    EndOfFileToken,
    SingleLineCommentTrivia,
    MultiLineCommentTrivia,
    NewLineTrivia,
    WhitespaceTrivia,
    ShebangTrivia,
    ConflictMarkerTrivia,
    NumericLiteral,
    BigIntLiteral,
    StringLiteral,
    JsxText,
    JsxTextAllWhiteSpaces,
    // ...(more)
}

Character handling

Before looking at the scanner's workflow, it helps to review a few helpers related to character handling.

CharacterCodes

export const enum CharacterCodes {
    _ = 0x5F,
    $ = 0x24,
    _0 = 0x30,
    _1 = 0x31,
    _2 = 0x32,
    _3 = 0x33,
    _4 = 0x34,
    _5 = 0x35,
    _6 = 0x36,
    _7 = 0x37,
    _8 = 0x38,
    _9 = 0x39,
    a = 0x61,
    b = 0x62,
    c = 0x63,
    d = 0x64,
    e = 0x65,
    f = 0x66,
    g = 0x67,
    h = 0x68,
    // ...(more)
}

The compiler maps Unicode code points to readable names via this enum.

Character classification

Most character checks rely on CharacterCodes. Examples:

export function isWhiteSpaceLike(ch: number): boolean {
    return isWhiteSpaceSingleLine(ch) || isLineBreak(ch);
}

export function isLineBreak(ch: number): boolean {
    return ch === CharacterCodes.lineFeed ||
           ch === CharacterCodes.carriageReturn ||
           ch === CharacterCodes.lineSeparator ||
           ch === CharacterCodes.paragraphSeparator;
}

function isDigit(ch: number): boolean {
    return ch >= CharacterCodes._0 && ch <= CharacterCodes._9;
}
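To make the pattern concrete, here is a self-contained sketch of the same style of check. MiniCharCodes, miniIsDigit, miniIsLineBreak, and countDigits are hypothetical names that mirror only a small subset of the compiler's enum; the point is that the scanner compares numeric character codes rather than strings.

```typescript
// A small subset of character codes, named in the style of CharacterCodes.
enum MiniCharCodes {
    lineFeed = 0x0A,
    carriageReturn = 0x0D,
    _0 = 0x30,
    _9 = 0x39,
    lineSeparator = 0x2028,
    paragraphSeparator = 0x2029,
}

function miniIsDigit(ch: number): boolean {
    return ch >= MiniCharCodes._0 && ch <= MiniCharCodes._9;
}

function miniIsLineBreak(ch: number): boolean {
    return ch === MiniCharCodes.lineFeed ||
        ch === MiniCharCodes.carriageReturn ||
        ch === MiniCharCodes.lineSeparator ||
        ch === MiniCharCodes.paragraphSeparator;
}

// Counts ASCII digits by walking the string one character code at a time.
function countDigits(text: string): number {
    let count = 0;
    for (let i = 0; i < text.length; i++) {
        if (miniIsDigit(text.charCodeAt(i))) {
            count++;
        }
    }
    return count;
}
```

Working on numbers avoids allocating one-character strings for every comparison, which presumably matters in a loop that visits every character of every source file.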

Identifier detection

The compiler uses isUnicodeIdentifierStart and isUnicodeIdentifierPart to decide whether a character can start or continue an identifier.

export function isUnicodeIdentifierStart(code: number, languageVersion: ScriptTarget | undefined) {
    return languageVersion! >= ScriptTarget.ES2015 ?
        lookupInUnicodeMap(code, unicodeESNextIdentifierStart) :
        languageVersion === ScriptTarget.ES5 ?
            lookupInUnicodeMap(code, unicodeES5IdentifierStart) :
            lookupInUnicodeMap(code, unicodeES3IdentifierStart);
}
function isUnicodeIdentifierPart(code: number, languageVersion: ScriptTarget | undefined) {
    return languageVersion! >= ScriptTarget.ES2015 ?
        lookupInUnicodeMap(code, unicodeESNextIdentifierPart) :
        languageVersion === ScriptTarget.ES5 ?
            lookupInUnicodeMap(code, unicodeES5IdentifierPart) :
            lookupInUnicodeMap(code, unicodeES3IdentifierPart);
}

Identifier characters are stored as a flat array of [start, end] range pairs to save memory; a binary search over the pairs determines membership.

function lookupInUnicodeMap(code: number, map: readonly number[]): boolean {
    if (code < map[0]) {
        return false;
    }
    let lo = 0;
    let hi = map.length;
    let mid: number;
    while (lo + 1 < hi) {
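        mid = lo + (hi - lo) / 2;
        // Snap mid down to an even index (the start of a range pair);
        // this also discards any fractional part left by the division.
        mid -= mid % 2;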
        if (map[mid] <= code && code <= map[mid + 1]) {
            return true;
        }
        if (code < map[mid]) {
            hi = mid;
        } else {
            lo = mid + 2;
        }
    }
    return false;
}
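The following sketch shows the same range-pair search on a small table. The table miniIdentifierStart is a made-up example (A-Z, a-z, and U+00C0 to U+00D6), not the compiler's real data, and inRanges is a hypothetical re-implementation that uses Math.floor for the midpoint.

```typescript
// Flat list of inclusive [start, end] pairs, sorted ascending.
// Hypothetical sample data: A-Z, a-z, and U+00C0..U+00D6.
const miniIdentifierStart: readonly number[] = [
    0x41, 0x5A,
    0x61, 0x7A,
    0xC0, 0xD6,
];

function inRanges(code: number, ranges: readonly number[]): boolean {
    if (ranges.length === 0 || code < ranges[0]) {
        return false;
    }
    let lo = 0;
    let hi = ranges.length;
    while (lo + 1 < hi) {
        let mid = lo + Math.floor((hi - lo) / 2);
        mid -= mid % 2; // land on the start of a pair
        if (ranges[mid] <= code && code <= ranges[mid + 1]) {
            return true;
        }
        if (code < ranges[mid]) {
            hi = mid;      // continue in the lower pairs
        } else {
            lo = mid + 2;  // continue after this pair
        }
    }
    return false;
}

const bIsStart = inRanges("b".charCodeAt(0), miniIdentifierStart);          // true
const underscoreIsStart = inRanges("_".charCodeAt(0), miniIdentifierStart); // false: not in this sample table
```

Three pairs cover 75 characters with six numbers; the real tables apply the same idea to the much larger Unicode identifier sets.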

Indexing

The compiler stores character positions as zero-based offsets into the source text and builds a table of line-start offsets so that it can convert between offsets and line/column pairs efficiently.

export function computeLineStarts(text: string): number[] {
    const result: number[] = new Array();
    let pos = 0;
    let lineStart = 0;
    while (pos < text.length) {
        const ch = text.charCodeAt(pos);
        pos++;
        switch (ch) {
            case CharacterCodes.carriageReturn:
                if (text.charCodeAt(pos) === CharacterCodes.lineFeed) {
                    pos++;
                }
            // falls through
            case CharacterCodes.lineFeed:
                result.push(lineStart);
                lineStart = pos;
                break;
            default:
                if (ch > CharacterCodes.maxAsciiCharacter && isLineBreak(ch)) {
                    result.push(lineStart);
                    lineStart = pos;
                }
                break;
        }
    }
    result.push(lineStart);
    return result;
}

export function computePositionOfLineAndCharacter(lineStarts: readonly number[], line: number, character: number, debugText?: string, allowEdits?: true): number {
    if (line < 0 || line >= lineStarts.length) {
        if (allowEdits) {
            line = line < 0 ? 0 : line >= lineStarts.length ? lineStarts.length - 1 : line;
        } else {
            Debug.fail(`Bad line number. Line: ${line}, lineStarts.length: ${lineStarts.length}`);
        }
    }
    const res = lineStarts[line] + character;
    if (allowEdits) {
        return res > lineStarts[line + 1] ? lineStarts[line + 1] : typeof debugText === "string" && res > debugText.length ? debugText.length : res;
    }
    return res;
}
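The excerpt above converts a line and character into an offset. The reverse direction, from an offset back to a line and column, can be sketched as a binary search over the line-start table. miniLineStarts and miniLineAndColumn are hypothetical names, and lines and columns are zero-based here.

```typescript
// Builds the table of offsets at which each line begins.
function miniLineStarts(text: string): number[] {
    const starts: number[] = [0];
    for (let pos = 0; pos < text.length; pos++) {
        const ch = text.charCodeAt(pos);
        if (ch === 0x0D && text.charCodeAt(pos + 1) === 0x0A) {
            pos++; // treat \r\n as a single line break
        }
        if (ch === 0x0A || ch === 0x0D || ch === 0x2028 || ch === 0x2029) {
            starts.push(pos + 1);
        }
    }
    return starts;
}

// Finds the last line start that is <= pos, then derives the column.
function miniLineAndColumn(
    lineStarts: readonly number[],
    pos: number,
): { line: number; column: number } {
    let lo = 0;
    let hi = lineStarts.length - 1;
    while (lo < hi) {
        const mid = lo + Math.ceil((hi - lo) / 2);
        if (lineStarts[mid] <= pos) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return { line: lo, column: pos - lineStarts[lo] };
}

const sampleStarts = miniLineStarts("ab\ncd\r\nef");    // [0, 3, 7]
const positionOfD = miniLineAndColumn(sampleStarts, 4); // { line: 1, column: 1 }
```

Because the table is sorted, each lookup costs a logarithmic number of comparisons instead of a rescan of the file.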

Scanner workflow

The scanner keeps the current token in a single variable shared by all of its functions. Each call to scan() advances that variable to the next token, so the caller can then read the current token’s information.

The core function is scan (about 400 lines):

function scan(): SyntaxKind {
    startPos = pos;
    while (true) {
        tokenPos = pos;
        if (pos >= end) {
            return token = SyntaxKind.EndOfFileToken;
        }
        let ch = codePointAt(text, pos);
        switch (ch) {
            case CharacterCodes.exclamation:
                if (text.charCodeAt(pos + 1) === CharacterCodes.equals) {
                    if (text.charCodeAt(pos + 2) === CharacterCodes.equals) {
                        return pos += 3, token = SyntaxKind.ExclamationEqualsEqualsToken;
                    }
                    return pos += 2, token = SyntaxKind.ExclamationEqualsToken;
                }
                pos++;
                return token = SyntaxKind.ExclamationToken;
            case CharacterCodes.doubleQuote:
            case CharacterCodes.singleQuote:
                // ... (omitted for brevity)
        }
    }
}
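The lookahead in the exclamation-mark branch can be tried in isolation. The sketch below is a simplified, hypothetical scanner (createMiniScanner, BangToken, and scanAllBangs are not compiler names) that recognizes only !, != and !==, and reports every other character as Unknown. It keeps pos and token as shared state, in the same spirit as the excerpt.

```typescript
type BangToken =
    | "ExclamationToken"
    | "ExclamationEqualsToken"
    | "ExclamationEqualsEqualsToken"
    | "Unknown"
    | "EndOfFileToken";

function createMiniScanner(text: string) {
    const EXCLAMATION = 0x21;
    const EQUALS = 0x3D;
    let pos = 0;
    let token: BangToken = "Unknown";

    function scan(): BangToken {
        if (pos >= text.length) {
            return token = "EndOfFileToken";
        }
        if (text.charCodeAt(pos) === EXCLAMATION) {
            // Prefer the longest operator: !== before != before !
            if (text.charCodeAt(pos + 1) === EQUALS) {
                if (text.charCodeAt(pos + 2) === EQUALS) {
                    pos += 3;
                    return token = "ExclamationEqualsEqualsToken";
                }
                pos += 2;
                return token = "ExclamationEqualsToken";
            }
            pos++;
            return token = "ExclamationToken";
        }
        pos++;
        return token = "Unknown";
    }

    return { scan, getToken: () => token, getPos: () => pos };
}

// Drives the scanner until it reports end of file.
function scanAllBangs(text: string): BangToken[] {
    const scanner = createMiniScanner(text);
    const kinds: BangToken[] = [];
    while (scanner.scan() !== "EndOfFileToken") {
        kinds.push(scanner.getToken());
    }
    return kinds;
}

// "!==!=!" yields !==, then !=, then !
const bangKinds = scanAllBangs("!==!=!");
```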

Parser

The parser turns source code into an organized AST.

Main flow

The core function is parseSourceFileWorker:

function parseSourceFileWorker(languageVersion: ScriptTarget, setParentNodes: boolean, scriptKind: ScriptKind): SourceFile {
    const isDeclarationFile = isDeclarationFileName(fileName);
    if (isDeclarationFile) {
        contextFlags |= NodeFlags.Ambient;
    }
    sourceFlags = contextFlags;
    nextToken();
    const statements = parseList(ParsingContext.SourceElements, parseStatement);
    Debug.assert(token() === SyntaxKind.EndOfFileToken);
    const endOfFileToken = addJSDocComment(parseTokenNode<EndOfFileToken>());
    const sourceFile = createSourceFile(fileName, languageVersion, scriptKind, isDeclarationFile, statements, endOfFileToken, sourceFlags);
    // ...(more)
    return sourceFile;
}

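Parsing starts with a single nextToken() call that primes the scanner. parseList then calls parseStatement repeatedly, and each call consumes tokens and creates a statement node, until the end-of-file token is reached.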
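The prime-then-loop shape of this flow can be sketched with a toy recursive-descent parser. Everything here is hypothetical and heavily simplified: miniParseSourceFile accepts a pre-split token list instead of driving a scanner, and it understands only statements of the form let name = digits ;.

```typescript
interface MiniVarStatement {
    kind: "VariableStatement";
    name: string;
    value: number;
}

function miniParseSourceFile(tokens: readonly string[]): MiniVarStatement[] {
    let index = -1;
    // Mirrors the compiler's token() accessor: returns the current token.
    const token = (): string | undefined => tokens[index];
    const nextToken = (): void => { index++; };

    function expect(text: string): void {
        if (token() !== text) {
            throw new Error(`Expected '${text}' but found '${token() ?? "<eof>"}'`);
        }
        nextToken();
    }

    function parseStatement(): MiniVarStatement {
        expect("let");
        const name = token();
        if (name === undefined || !/^[A-Za-z_$][\w$]*$/.test(name)) {
            throw new Error("Expected an identifier");
        }
        nextToken();
        expect("=");
        const literal = token();
        if (literal === undefined || !/^\d+$/.test(literal)) {
            throw new Error("Expected a numeric literal");
        }
        nextToken();
        expect(";");
        return { kind: "VariableStatement", name, value: Number(literal) };
    }

    nextToken(); // prime the first token before parsing begins
    const statements: MiniVarStatement[] = [];
    while (token() !== undefined) {
        statements.push(parseStatement());
    }
    return statements;
}

const miniStatements = miniParseSourceFile(
    ["let", "x", "=", "1", ";", "let", "y", "=", "22", ";"],
);
```

Each parse function leaves the current token just past what it consumed, which is what lets the outer loop simply call parseStatement again.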

Node creation

Every node records its start position (pos), its end position (end), and its kind, along with flags, a parent reference, and many internal fields.

export interface ReadonlyTextRange {
    readonly pos: number;
    readonly end: number;
}
export interface Node extends ReadonlyTextRange {
    readonly kind: SyntaxKind;
    readonly flags: NodeFlags;
    // ... many internal fields
    readonly parent: Node;
}
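One practical use of pos and end is mapping a text offset back to a node, for example to find what sits under an editor cursor. The sketch below uses a hypothetical MiniNode shape with an explicit children array and a hand-built tree for const a = 1;. It also assumes pos points at the node's first character, whereas in the real compiler pos also covers leading trivia such as whitespace.

```typescript
interface MiniNode {
    kind: string;
    pos: number;  // offset of the first character (inclusive)
    end: number;  // offset just past the last character (exclusive)
    children: MiniNode[];
}

function miniNodeText(source: string, node: MiniNode): string {
    return source.slice(node.pos, node.end);
}

// Returns the deepest node whose [pos, end) span contains the offset.
function miniFindInnermost(node: MiniNode, offset: number): MiniNode | undefined {
    if (offset < node.pos || offset >= node.end) {
        return undefined;
    }
    for (const child of node.children) {
        const hit = miniFindInnermost(child, offset);
        if (hit) {
            return hit;
        }
    }
    return node;
}

const miniSource = "const a = 1;";
const miniRoot: MiniNode = {
    kind: "VariableStatement",
    pos: 0,
    end: 12,
    children: [
        { kind: "Identifier", pos: 6, end: 7, children: [] },
        { kind: "NumericLiteral", pos: 10, end: 11, children: [] },
    ],
};

const nodeAtTen = miniFindInnermost(miniRoot, 10); // the NumericLiteral node
```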

Example: creating a variable statement node.

function parseVariableStatement(pos: number, hasJSDoc: boolean, decorators: NodeArray<Decorator> | undefined, modifiers: NodeArray<Modifier> | undefined): VariableStatement {
    const declarationList = parseVariableDeclarationList(false);
    parseSemicolon();
    const node = factory.createVariableStatement(modifiers, declarationList);
    node.decorators = decorators;
    return withJSDoc(finishNode(node, pos), hasJSDoc);
}

Binder

The binder creates symbols (unrelated to ES6 Symbol) and links them to AST nodes.

Symbol

When a variable, function, or class is first defined, the binder creates a unique symbol and stores it in a symbol table.

function Symbol(this: Symbol, flags: SymbolFlags, name: __String) {
    this.flags = flags;
    this.escapedName = name;
    this.declarations = undefined;
    // ... other fields
}
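A minimal sketch of the symbol-table idea follows. MiniSymbol, MiniDeclaration, and declareMiniSymbol are hypothetical names and the table is a plain Map keyed by name; the real binder distinguishes many more cases, such as symbol flags and conflicting declarations. The sketch shows only that the first declaration of a name creates a symbol and later declarations of the same name are attached to it.

```typescript
interface MiniDeclaration {
    kind: "variable" | "function" | "interface";
    name: string;
}

interface MiniSymbol {
    escapedName: string;
    declarations: MiniDeclaration[];
}

function declareMiniSymbol(
    table: Map<string, MiniSymbol>,
    declaration: MiniDeclaration,
): MiniSymbol {
    let symbol = table.get(declaration.name);
    if (!symbol) {
        // First time this name is seen: create and register a new symbol.
        symbol = { escapedName: declaration.name, declarations: [] };
        table.set(declaration.name, symbol);
    }
    // Every declaration is recorded on the symbol it belongs to.
    symbol.declarations.push(declaration);
    return symbol;
}

const miniTable = new Map<string, MiniSymbol>();
const firstPoint = declareMiniSymbol(miniTable, { kind: "interface", name: "Point" });
const secondPoint = declareMiniSymbol(miniTable, { kind: "interface", name: "Point" });
declareMiniSymbol(miniTable, { kind: "variable", name: "origin" });
// firstPoint and secondPoint are the same symbol, now with two declarations.
```

This mirrors how two interface declarations with the same name end up sharing one symbol in TypeScript.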

Main flow

Key functions: bind, bindWorker, createSymbol, addDeclarationToSymbol. The recursive entry point is bind:

function bind(node: Node | undefined): void {
    if (!node) {
        return;
    }
    setParent(node, parent);
    const saveInStrictMode = inStrictMode;
    bindWorker(node);
    if (node.kind > SyntaxKind.LastToken) {
        const saveParent = parent;
        parent = node;
        const containerFlags = getContainerFlags(node);
        if (containerFlags === ContainerFlags.None) {
            bindChildren(node);
        }
        // ...(more)
    }
}
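The save-and-restore handling of parent in bind can be isolated in a small sketch. MiniBindNode and miniBindParents are hypothetical; the traversal only assigns parent pointers and omits symbol creation, container handling, and strict-mode tracking.

```typescript
interface MiniBindNode {
    kind: string;
    children: MiniBindNode[];
    parentNode?: MiniBindNode | undefined;
}

function miniBindParents(root: MiniBindNode): void {
    // Shared traversal state, like the binder's parent variable.
    let currentParent: MiniBindNode | undefined;

    function bind(node: MiniBindNode): void {
        node.parentNode = currentParent;
        const saveParent = currentParent;
        currentParent = node;           // children see this node as their parent
        for (const child of node.children) {
            bind(child);
        }
        currentParent = saveParent;     // restore before returning to the caller
    }

    bind(root);
}

const miniLiteral: MiniBindNode = { kind: "NumericLiteral", children: [] };
const miniDeclaration: MiniBindNode = { kind: "VariableDeclaration", children: [miniLiteral] };
const miniFile: MiniBindNode = { kind: "SourceFile", children: [miniDeclaration] };
miniBindParents(miniFile);
```

Keeping the parent in shared state and restoring it after the children are visited avoids passing an extra argument through every bind call.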

Checker

The checker performs type checking; its codebase exceeds 40,000 lines.

How checking works

For a declaration like const b: number = 1;, the checker looks up the symbol for b, resolves its declared type, and verifies that the type of the initializer is assignable to it.
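As a rough illustration of that comparison, the sketch below checks a declared type against the type of a literal initializer. It is a toy model, not the checker's algorithm: MiniType covers only three primitive types, assignability is reduced to equality, and the names miniTypeOfLiteral and miniCheckDeclaration are hypothetical.

```typescript
type MiniType = "number" | "string" | "boolean";

interface MiniVarDeclaration {
    name: string;
    declaredType?: MiniType; // the annotation, if one was written
    initializer: number | string | boolean;
}

function miniTypeOfLiteral(value: number | string | boolean): MiniType {
    if (typeof value === "number") {
        return "number";
    }
    if (typeof value === "string") {
        return "string";
    }
    return "boolean";
}

// Returns diagnostic messages; an empty array means the declaration is fine.
function miniCheckDeclaration(declaration: MiniVarDeclaration): string[] {
    const diagnostics: string[] = [];
    const sourceType = miniTypeOfLiteral(declaration.initializer);
    const targetType = declaration.declaredType;
    if (targetType !== undefined && targetType !== sourceType) {
        diagnostics.push(
            `Type '${sourceType}' is not assignable to type '${targetType}'.`,
        );
    }
    return diagnostics;
}

const okDiagnostics = miniCheckDeclaration({ name: "b", declaredType: "number", initializer: 1 });
const badDiagnostics = miniCheckDeclaration({ name: "b", declaredType: "number", initializer: "1" });
```

Here okDiagnostics is empty, while badDiagnostics contains a single message about string not being assignable to number.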

Main flow

The entry point is getDiagnostics, which eventually calls checkSourceFileWorker:

function checkSourceFileWorker(node: SourceFile) {
    const links = getNodeLinks(node);
    if (!(links.flags & NodeCheckFlags.TypeChecked)) {
        if (skipTypeChecking(node, compilerOptions, host)) {
            return;
        }
        checkGrammarSourceFile(node);
        // ... various checks
        forEach(node.statements, checkSourceElement);
        checkSourceElement(node.endOfFileToken);
        // ... more checks
    }
}

Within checkSourceElementWorker, a switch on node.kind dispatches to specific check functions such as checkVariableDeclaration, checkPropertyDeclaration, etc.

function checkSourceElementWorker(node: Node): void {
    const kind = node.kind;
    // ...
    switch (kind) {
        case SyntaxKind.TypeParameter:
            return checkTypeParameter(node as TypeParameterDeclaration);
        case SyntaxKind.Parameter:
            return checkParameter(node as ParameterDeclaration);
        case SyntaxKind.PropertyDeclaration:
            return checkPropertyDeclaration(node as PropertyDeclaration);
        case SyntaxKind.PropertySignature:
            return checkPropertySignature(node as PropertySignature);
        // ...(more)
    }
}

Emitter

The emitter outputs JavaScript code and declaration files from the final AST.

Main flow

The core function is emitFiles, which creates writers, iterates over source files, transforms nodes, and prints the results.

export function emitFiles(resolver: EmitResolver, host: EmitHost, targetSourceFile: SourceFile | undefined, { scriptTransformers, declarationTransformers }: EmitTransformers, emitOnlyDtsFiles?: boolean, onlyBuildInfo?: boolean, forceDtsEmit?: boolean): EmitResult {
    const compilerOptions = host.getCompilerOptions();
    const sourceMapDataList = (compilerOptions.sourceMap || compilerOptions.inlineSourceMap || getAreDeclarationMapsEnabled(compilerOptions)) ? [] : undefined;
    const emittedFilesList = compilerOptions.listEmittedFiles ? [] : undefined;
    const emitterDiagnostics = createDiagnosticCollection();
    const newLine = getNewLineCharacter(compilerOptions, () => host.getNewLine());
    const writer = createTextWriter(newLine);
    // Emit each output file
    forEachEmittedFile(host, emitSourceFileOrBundle, getSourceFilesToEmit(host, targetSourceFile, forceDtsEmit), forceDtsEmit, onlyBuildInfo, !targetSourceFile);
    return {
        emitSkipped,
        diagnostics: emitterDiagnostics.getDiagnostics(),
        emittedFiles: emittedFilesList,
        sourceMaps: sourceMapDataList,
        exportedModulesFromDeclarationEmit
    };
}

During emission, the compiler transforms the AST to JavaScript syntax, creates a printer, and writes the output files.

function emitJsFileOrBundle(sourceFileOrBundle: SourceFile | Bundle | undefined, jsFilePath: string | undefined, sourceMapFilePath: string | undefined, relativeToBuildInfo: (path: string) => string) {
    const transform = transformNodes(resolver, host, factory, compilerOptions, [sourceFileOrBundle], scriptTransformers, false);
    const printer = createPrinter(printerOptions, {
        hasGlobalName: resolver.hasGlobalName,
        onEmitNode: transform.emitNodeWithNotification,
        isEmitNotificationEnabled: transform.isEmitNotificationEnabled,
        substituteNode: transform.substituteNode,
    });
    Debug.assert(transform.transformed.length === 1, "Should only see one output from the transform");
    printSourceFileOrBundle(jsFilePath, sourceMapFilePath, transform.transformed[0], printer, compilerOptions);
    // ...(more)
}
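The two kinds of output can be illustrated with a toy printer. This is a sketch under strong assumptions: MiniEmitDeclaration is a hypothetical, already-simplified representation of a top-level const declaration, and miniEmitJs and miniEmitDts build strings directly instead of running transformers over an AST. The JavaScript output drops the type annotation, while the declaration output keeps the type and drops the initializer.

```typescript
interface MiniEmitDeclaration {
    name: string;
    type: string;        // type annotation text, e.g. "number"
    initializer: string; // initializer source text, e.g. "1"
}

// JavaScript output: type annotations are erased.
function miniEmitJs(declarations: readonly MiniEmitDeclaration[]): string {
    return declarations
        .map(d => `const ${d.name} = ${d.initializer};`)
        .join("\n");
}

// Declaration output: only the shape (name and type) is kept.
function miniEmitDts(declarations: readonly MiniEmitDeclaration[]): string {
    return declarations
        .map(d => `declare const ${d.name}: ${d.type};`)
        .join("\n");
}

const miniDeclarations: MiniEmitDeclaration[] = [
    { name: "b", type: "number", initializer: "1" },
];
const miniJs = miniEmitJs(miniDeclarations);   // "const b = 1;"
const miniDts = miniEmitDts(miniDeclarations); // "declare const b: number;"
```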

Conclusion

The TypeScript compiler is large and complex; this article only scratches the surface. Readers are encouraged to first understand the overall compilation pipeline before diving into detailed design.
