Inside the TypeScript Compiler: How Scanning, Parsing, Binding, Checking, and Emitting Transform Code
This article explains the inner workings of the TypeScript compiler stage by stage: scanning source code into tokens, parsing tokens into an AST, binding symbols, type checking, and emitting JavaScript and declaration files, with excerpts from the compiler source along the way.
Introduction
The TypeScript compiler adds static type checking to JavaScript, allowing early detection of type mismatches and providing editor assistance for large projects.
Key Components
Scanner: lexical analysis that generates a token stream.
Parser: builds an abstract syntax tree (AST) from tokens.
Binder: creates symbols and links them to AST nodes, forming the semantic model.
Checker: performs type checking on the bound AST.
Emitter: outputs compiled JavaScript files and declaration files (.d.ts).
Processing Flow
The scanner performs lexical analysis on the source text and produces a stream of tokens.
The parser assembles the tokens into an AST; in practice it pulls tokens from the scanner one at a time as it parses.
The binder creates symbols for declarations and links each declaration node to its symbol.
The checker examines the processed AST and performs type checking.
The emitter generates JavaScript code and declaration files from the final AST.
Scanner
What is a token
In this context, a token is a lexical unit such as a keyword, identifier, literal, or punctuator, not an authentication token. The scanner classifies each lexical unit into a token kind.
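For example, the line const a = 1; contains five tokens: the keyword const, the identifier a, the equals sign =, the numeric literal 1, and the semicolon. The whitespace between them is trivia, which the parser's scanner skips.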
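A minimal sketch of this classification step, assuming a hypothetical ToyKind enum and toyTokenize function (neither is part of the compiler) and handling only the characters this example needs, splits const a = 1; into those five tokens plus an end-of-file marker:

```typescript
// Hypothetical, heavily reduced token kinds for illustration only;
// the compiler's real SyntaxKind enum has hundreds of members.
enum ToyKind {
  ConstKeyword = "ConstKeyword",
  Identifier = "Identifier",
  EqualsToken = "EqualsToken",
  NumericLiteral = "NumericLiteral",
  SemicolonToken = "SemicolonToken",
  EndOfFileToken = "EndOfFileToken",
}

interface ToyToken {
  kind: ToyKind;
  text: string;
  pos: number; // offset of the token's first character
}

const isIdentStart = (ch: string) => /[A-Za-z_$]/.test(ch);
const isIdentPart = (ch: string) => /[A-Za-z0-9_$]/.test(ch);
const isDigitChar = (ch: string) => ch >= "0" && ch <= "9";

function toyTokenize(text: string): ToyToken[] {
  const tokens: ToyToken[] = [];
  let pos = 0;
  while (pos < text.length) {
    const ch = text.charAt(pos);
    // Whitespace is trivia: it separates tokens but is not emitted as one.
    if (/\s/.test(ch)) {
      pos++;
      continue;
    }
    const start = pos;
    if (isIdentStart(ch)) {
      while (pos < text.length && isIdentPart(text.charAt(pos))) pos++;
      const word = text.slice(start, pos);
      // Keywords are scanned like identifiers, then reclassified.
      const kind = word === "const" ? ToyKind.ConstKeyword : ToyKind.Identifier;
      tokens.push({ kind, text: word, pos: start });
    } else if (isDigitChar(ch)) {
      while (pos < text.length && isDigitChar(text.charAt(pos))) pos++;
      tokens.push({ kind: ToyKind.NumericLiteral, text: text.slice(start, pos), pos: start });
    } else if (ch === "=" || ch === ";") {
      pos++;
      const kind = ch === "=" ? ToyKind.EqualsToken : ToyKind.SemicolonToken;
      tokens.push({ kind, text: ch, pos: start });
    } else {
      throw new Error(`Unexpected character '${ch}' at offset ${pos}`);
    }
  }
  tokens.push({ kind: ToyKind.EndOfFileToken, text: "", pos });
  return tokens;
}

console.log(toyTokenize("const a = 1;").map((t) => t.kind).join(" "));
// ConstKeyword Identifier EqualsToken NumericLiteral SemicolonToken EndOfFileToken
```

Like the real scanner, the sketch scans keywords with the same loop as identifiers and only afterwards decides whether the word is reserved.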
The compiler enumerates every token kind in the SyntaxKind enum in types.ts; the same enum also lists the AST node kinds that the parser uses later.
export const enum SyntaxKind {
Unknown,
EndOfFileToken,
SingleLineCommentTrivia,
MultiLineCommentTrivia,
NewLineTrivia,
WhitespaceTrivia,
ShebangTrivia,
ConflictMarkerTrivia,
NumericLiteral,
BigIntLiteral,
StringLiteral,
JsxText,
JsxTextAllWhiteSpaces,
// ...(more)
}
Character handling
Before describing the scanner’s workflow, a few functions related to character handling are introduced.
CharacterCodes
export const enum CharacterCodes {
_ = 0x5F,
$ = 0x24,
_0 = 0x30,
_1 = 0x31,
_2 = 0x32,
_3 = 0x33,
_4 = 0x34,
_5 = 0x35,
_6 = 0x36,
_7 = 0x37,
_8 = 0x38,
_9 = 0x39,
a = 0x61,
b = 0x62,
c = 0x63,
d = 0x64,
e = 0x65,
f = 0x66,
g = 0x67,
h = 0x68,
// ...(more)
}
This enum gives readable names to character code values, so the scanner can compare against CharacterCodes._0 instead of the magic number 0x30.
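The classification excerpts in the next subsection refer to members that this abbreviated listing leaves out (lineFeed, carriageReturn, lineSeparator, paragraphSeparator) and to a helper, isWhiteSpaceSingleLine, that is not shown. The following self-contained sketch fills those in with a hypothetical MiniCodes enum and a deliberately reduced whitespace check (only space and tab), so the same style of comparison can be run directly:

```typescript
// Hypothetical subset of character codes; the values are the standard code points.
enum MiniCodes {
  tab = 0x09,
  lineFeed = 0x0a,
  carriageReturn = 0x0d,
  space = 0x20,
  _0 = 0x30,
  _9 = 0x39,
  lineSeparator = 0x2028,
  paragraphSeparator = 0x2029,
}

// Reduced on purpose: the compiler's version also accepts vertical tab,
// form feed, non-breaking space, and other Unicode space characters.
function isWhiteSpaceSingleLine(ch: number): boolean {
  return ch === MiniCodes.space || ch === MiniCodes.tab;
}

function isLineBreak(ch: number): boolean {
  return ch === MiniCodes.lineFeed ||
    ch === MiniCodes.carriageReturn ||
    ch === MiniCodes.lineSeparator ||
    ch === MiniCodes.paragraphSeparator;
}

function isWhiteSpaceLike(ch: number): boolean {
  return isWhiteSpaceSingleLine(ch) || isLineBreak(ch);
}

function isDigit(ch: number): boolean {
  return ch >= MiniCodes._0 && ch <= MiniCodes._9;
}

// Label each UTF-16 code unit of a string using the helpers above.
function classify(text: string): string[] {
  const out: string[] = [];
  for (let i = 0; i < text.length; i++) {
    const ch = text.charCodeAt(i);
    out.push(isDigit(ch) ? "digit" : isLineBreak(ch) ? "break" : isWhiteSpaceLike(ch) ? "space" : "other");
  }
  return out;
}

console.log(classify("a 1\n").join(",")); // other,space,digit,break
```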
Character classification
Most character checks rely on CharacterCodes. Examples:
export function isWhiteSpaceLike(ch: number): boolean {
return isWhiteSpaceSingleLine(ch) || isLineBreak(ch);
}
export function isLineBreak(ch: number): boolean {
return ch === CharacterCodes.lineFeed ||
ch === CharacterCodes.carriageReturn ||
ch === CharacterCodes.lineSeparator ||
ch === CharacterCodes.paragraphSeparator;
}
function isDigit(ch: number): boolean {
return ch >= CharacterCodes._0 && ch <= CharacterCodes._9;
}
Identifier detection
The compiler uses isUnicodeIdentifierStart and isUnicodeIdentifierPart to decide whether a character can start or continue an identifier.
export function isUnicodeIdentifierStart(code: number, languageVersion: ScriptTarget | undefined) {
return languageVersion! >= ScriptTarget.ES2015 ?
lookupInUnicodeMap(code, unicodeESNextIdentifierStart) :
languageVersion === ScriptTarget.ES5 ?
lookupInUnicodeMap(code, unicodeES5IdentifierStart) :
lookupInUnicodeMap(code, unicodeES3IdentifierStart);
}
function isUnicodeIdentifierPart(code: number, languageVersion: ScriptTarget | undefined) {
return languageVersion! >= ScriptTarget.ES2015 ?
lookupInUnicodeMap(code, unicodeESNextIdentifierPart) :
languageVersion === ScriptTarget.ES5 ?
lookupInUnicodeMap(code, unicodeES5IdentifierPart) :
lookupInUnicodeMap(code, unicodeES3IdentifierPart);
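}
Identifier characters are stored as a flat, sorted array of inclusive [start, end] range pairs rather than one entry per character, which saves memory; a binary search over those pairs determines membership.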
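The same range-table idea can be sketched as a standalone function. This assumes a hypothetical table covering only $, A-Z, _ and a-z (the compiler's tables are generated from Unicode data and are far longer), and it binary-searches over pair indices rather than raw array indices, which removes the need for the parity adjustment (mid -= mid % 2) in the compiler's lookupInUnicodeMap:

```typescript
// Hypothetical range table: a flat array of inclusive [start, end] pairs,
// sorted ascending.
const toyIdentifierStart: readonly number[] = [
  0x24, 0x24, // $
  0x41, 0x5a, // A-Z
  0x5f, 0x5f, // _
  0x61, 0x7a, // a-z
];

// Binary search over pair indices: pair k occupies map[2k] and map[2k + 1].
function inRangeTable(code: number, map: readonly number[]): boolean {
  let lo = 0;
  let hi = (map.length >> 1) - 1; // index of the last pair
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const start = map[2 * mid]!;
    const end = map[2 * mid + 1]!;
    if (code < start) {
      hi = mid - 1; // code lies in an earlier pair, if any
    } else if (code > end) {
      lo = mid + 1; // code lies in a later pair, if any
    } else {
      return true; // start <= code <= end
    }
  }
  return false;
}

console.log(["a", "Z", "_", "$", "1", "-"].map((c) => inRangeTable(c.charCodeAt(0), toyIdentifierStart)).join(","));
// true,true,true,true,false,false
```

The compiler's own lookupInUnicodeMap follows.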
function lookupInUnicodeMap(code: number, map: readonly number[]): boolean {
if (code < map[0]) {
return false;
}
let lo = 0;
let hi = map.length;
let mid: number;
while (lo + 1 < hi) {
mid = lo + (hi - lo) / 2;
mid -= mid % 2;
if (map[mid] <= code && code <= map[mid + 1]) {
return true;
}
if (code < map[mid]) {
hi = mid;
} else {
lo = mid + 2;
}
}
return false;
}
Indexing
The compiler records positions as character offsets into the source text and builds a line-start table (the offset at which each line begins) so it can convert efficiently between offsets and line/column pairs.
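The idea can be sketched independently of the compiler. The helpers lineStartsOf, lineAndCharacterOf and positionOf below are hypothetical names, and, unlike computeLineStarts, this sketch ignores the Unicode line and paragraph separators. Lines and characters are zero-based, and the offset-to-line direction is a binary search for the last line start that does not exceed the offset:

```typescript
// Record the offset at which each line begins. "\r\n" counts as one break;
// a lone "\r" or "\n" also ends a line.
function lineStartsOf(text: string): number[] {
  const starts = [0];
  for (let pos = 0; pos < text.length; pos++) {
    const ch = text.charCodeAt(pos);
    if (ch === 0x0d && text.charCodeAt(pos + 1) === 0x0a) pos++; // consume the \n of \r\n
    if (ch === 0x0d || ch === 0x0a) starts.push(pos + 1);
  }
  return starts;
}

// Offset -> { line, character }: find the last line start <= pos.
function lineAndCharacterOf(starts: readonly number[], pos: number): { line: number; character: number } {
  let lo = 0;
  let hi = starts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1; // round up so the loop always makes progress
    if (starts[mid]! <= pos) lo = mid;
    else hi = mid - 1;
  }
  return { line: lo, character: pos - starts[lo]! };
}

// { line, character } -> offset: the line's start plus the column.
function positionOf(starts: readonly number[], line: number, character: number): number {
  if (line < 0 || line >= starts.length) throw new RangeError(`Bad line number: ${line}`);
  return starts[line]! + character;
}

const source = "const a = 1;\nlet b = a;\n";
const starts = lineStartsOf(source);
console.log(starts.join(","));                                 // 0,13,24
console.log(JSON.stringify(lineAndCharacterOf(starts, 17)));   // {"line":1,"character":4}
```

The compiler's implementation follows.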
export function computeLineStarts(text: string): number[] {
const result: number[] = new Array();
let pos = 0;
let lineStart = 0;
while (pos < text.length) {
const ch = text.charCodeAt(pos);
pos++;
switch (ch) {
case CharacterCodes.carriageReturn:
if (text.charCodeAt(pos) === CharacterCodes.lineFeed) {
pos++;
}
// falls through: a CRLF pair is treated as a single line break
case CharacterCodes.lineFeed:
result.push(lineStart);
lineStart = pos;
break;
default:
if (ch > CharacterCodes.maxAsciiCharacter && isLineBreak(ch)) {
result.push(lineStart);
lineStart = pos;
}
break;
}
}
result.push(lineStart);
return result;
}
export function computePositionOfLineAndCharacter(lineStarts: readonly number[], line: number, character: number, debugText?: string, allowEdits?: true): number {
if (line < 0 || line >= lineStarts.length) {
if (allowEdits) {
line = line < 0 ? 0 : line >= lineStarts.length ? lineStarts.length - 1 : line;
} else {
Debug.fail(`Bad line number. Line: ${line}, lineStarts.length: ${lineStarts.length}`);
}
}
const res = lineStarts[line] + character;
if (allowEdits) {
return res > lineStarts[line + 1] ? lineStarts[line + 1] : typeof debugText === "string" && res > debugText.length ? debugText.length : res;
}
return res;
}
Scanner workflow
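The scanner keeps the current token in a single variable, token, inside the closure created by createScanner. Each call to scan() advances to the next token and overwrites that variable; the caller then reads the current token's kind, text, and position through accessor functions such as getTokenText().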
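This pull-based design can be sketched with a closure. The createToyScanner function, its token kinds, and its accessor names are hypothetical and cover only identifiers and the three exclamation-mark operators; the ! branch checks !== before != before !, the same longest-match order as the compiler's scan excerpt:

```typescript
// Hypothetical token kinds for this sketch only.
type ToyKind = "Identifier" | "Exclamation" | "ExclamationEquals" | "ExclamationEqualsEquals" | "EndOfFile";

interface ToyScanner {
  scan(): ToyKind;         // advance to the next token and return its kind
  getToken(): ToyKind;     // kind of the current token
  getTokenText(): string;  // source text of the current token
  getTokenStart(): number; // offset where the current token starts
}

function createToyScanner(text: string): ToyScanner {
  // All scanner state lives in this closure; every scan() call overwrites it.
  let pos = 0;
  let tokenStart = 0;
  let token: ToyKind = "EndOfFile";

  function scan(): ToyKind {
    // Skip whitespace trivia before the token.
    while (pos < text.length && /\s/.test(text.charAt(pos))) pos++;
    tokenStart = pos;
    if (pos >= text.length) return (token = "EndOfFile");
    const ch = text.charAt(pos);
    if (ch === "!") {
      // Longest match first: "!==", then "!=", then "!".
      if (text.slice(pos, pos + 3) === "!==") { pos += 3; return (token = "ExclamationEqualsEquals"); }
      if (text.slice(pos, pos + 2) === "!=") { pos += 2; return (token = "ExclamationEquals"); }
      pos += 1;
      return (token = "Exclamation");
    }
    if (/[A-Za-z_$]/.test(ch)) {
      while (pos < text.length && /[\w$]/.test(text.charAt(pos))) pos++;
      return (token = "Identifier");
    }
    throw new Error(`Unexpected character '${ch}' at offset ${pos}`);
  }

  return {
    scan,
    getToken: () => token,
    getTokenText: () => text.slice(tokenStart, pos),
    getTokenStart: () => tokenStart,
  };
}

// Usage: the caller pulls one token at a time and reads the shared state.
const scanner = createToyScanner("a !== b != !c");
const seen: string[] = [];
while (scanner.scan() !== "EndOfFile") {
  seen.push(`${scanner.getToken()}(${scanner.getTokenText()})`);
}
console.log(seen.join(" "));
// Identifier(a) ExclamationEqualsEquals(!==) Identifier(b) ExclamationEquals(!=) Exclamation(!) Identifier(c)
```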
The core function is scan (about 400 lines):
function scan(): SyntaxKind {
startPos = pos;
while (true) {
tokenPos = pos;
if (pos >= end) {
return token = SyntaxKind.EndOfFileToken;
}
let ch = codePointAt(text, pos);
switch (ch) {
case CharacterCodes.exclamation:
if (text.charCodeAt(pos + 1) === CharacterCodes.equals) {
if (text.charCodeAt(pos + 2) === CharacterCodes.equals) {
return pos += 3, token = SyntaxKind.ExclamationEqualsEqualsToken;
}
return pos += 2, token = SyntaxKind.ExclamationEqualsToken;
}
pos++;
return token = SyntaxKind.ExclamationToken;
case CharacterCodes.doubleQuote:
case CharacterCodes.singleQuote:
// ... (omitted for brevity)
}
}
}
Parser
The parser consumes tokens from the scanner and organizes them into an AST rooted at a SourceFile node.
Main flow
The core function is parseSourceFileWorker:
function parseSourceFileWorker(languageVersion: ScriptTarget, setParentNodes: boolean, scriptKind: ScriptKind): SourceFile {
const isDeclarationFile = isDeclarationFileName(fileName);
if (isDeclarationFile) {
contextFlags |= NodeFlags.Ambient;
}
sourceFlags = contextFlags;
nextToken();
const statements = parseList(ParsingContext.SourceElements, parseStatement);
Debug.assert(token() === SyntaxKind.EndOfFileToken);
const endOfFileToken = addJSDocComment(parseTokenNode<EndOfFileToken>());
const sourceFile = createSourceFile(fileName, languageVersion, scriptKind, isDeclarationFile, statements, endOfFileToken, sourceFlags);
// ...(more)
return sourceFile;
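}
A single nextToken() call primes the scanner; parseList(ParsingContext.SourceElements, parseStatement) then parses statements one after another until it reaches the end-of-file token. Each parse function calls nextToken() as it consumes tokens and creates nodes along the way.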
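A minimal recursive-descent sketch of this loop, for statements of the form const <name> = <number>;, is shown here. Everything in it is hypothetical: it tokenizes up front instead of pulling tokens lazily, it flattens the compiler's VariableStatement, VariableDeclarationList and VariableDeclaration layers into one node, its pos excludes leading trivia, and it sets parent pointers eagerly (the compiler does so only when asked via setParentNodes, or later in the binder). It does mirror three ideas from the excerpt: advance past each consumed token, finish a node once its last token is consumed, and keep parsing statements until the end-of-file token.

```typescript
// Hypothetical token and node shapes for this sketch only.
interface Tok { kind: string; text: string; pos: number; end: number }

interface ToyNode {
  kind: string;
  pos: number;   // start offset (the compiler's pos also covers leading trivia)
  end: number;   // offset just past the node's last character
  text?: string; // identifier name or literal text
  children: ToyNode[];
  parent?: ToyNode;
}

// Up-front tokenizer, kept short for the sketch.
function tokenize(src: string): Tok[] {
  const toks: Tok[] = [];
  const re = /[A-Za-z_$][\w$]*|\d+|\S/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(src)) !== null) {
    const text = m[0];
    const kind =
      text === "const" ? "ConstKeyword" :
      /^\d/.test(text) ? "NumericLiteral" :
      /^[A-Za-z_$]/.test(text) ? "Identifier" :
      text === "=" ? "EqualsToken" :
      text === ";" ? "SemicolonToken" : "Unknown";
    toks.push({ kind, text, pos: m.index, end: m.index + text.length });
  }
  toks.push({ kind: "EndOfFileToken", text: "", pos: src.length, end: src.length });
  return toks;
}

function parseSourceFile(src: string): ToyNode {
  const toks = tokenize(src);
  let i = 0;
  const token = (): Tok => toks[i]!;

  function expect(kind: string): Tok {
    const t = token();
    if (t.kind !== kind) throw new SyntaxError(`Expected ${kind} at offset ${t.pos}, found '${t.text}'`);
    i++; // like nextToken(): advance past the consumed token
    return t;
  }

  // Like finishNode: the node ends where the last consumed token ended.
  function finishNode(kind: string, pos: number, children: ToyNode[], text?: string): ToyNode {
    const node: ToyNode = { kind, pos, end: toks[i - 1]!.end, text, children };
    for (const child of children) child.parent = node;
    return node;
  }

  function parseVariableStatement(): ToyNode {
    const pos = expect("ConstKeyword").pos;
    const nameTok = expect("Identifier");
    const name = finishNode("Identifier", nameTok.pos, [], nameTok.text);
    expect("EqualsToken");
    const initTok = expect("NumericLiteral");
    const initializer = finishNode("NumericLiteral", initTok.pos, [], initTok.text);
    expect("SemicolonToken");
    return finishNode("VariableStatement", pos, [name, initializer]);
  }

  // Like parseList(SourceElements, parseStatement): parse until end of file.
  const statements: ToyNode[] = [];
  while (token().kind !== "EndOfFileToken") statements.push(parseVariableStatement());
  const file: ToyNode = { kind: "SourceFile", pos: 0, end: src.length, children: statements };
  for (const s of statements) s.parent = file;
  return file;
}

const file = parseSourceFile("const a = 1;\nconst b = 22;");
for (const stmt of file.children) {
  const [name, init] = stmt.children;
  console.log(`${stmt.kind} [${stmt.pos}, ${stmt.end}) ${name!.text} = ${init!.text}`);
}
// VariableStatement [0, 12) a = 1
// VariableStatement [13, 26) b = 22
```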
Node creation
Every node records its start and end offsets (pos and end), its kind, its flags, and a pointer to its parent, along with many internal fields.
export interface ReadonlyTextRange {
readonly pos: number;
readonly end: number;
}
export interface Node extends ReadonlyTextRange {
readonly kind: SyntaxKind;
readonly flags: NodeFlags;
// ... many internal fields
readonly parent: Node;
}
Example: creating a variable statement node.
function parseVariableStatement(pos: number, hasJSDoc: boolean, decorators: NodeArray<Decorator> | undefined, modifiers: NodeArray<Modifier> | undefined): VariableStatement {
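const declarationList = parseVariableDeclarationList(/*inForStatementInitializer*/ false);
parseSemicolon();
const node = factory.createVariableStatement(modifiers, declarationList);
// Decorators are not allowed on a variable statement; they are kept so the grammar checker can report them.
node.decorators = decorators;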
return withJSDoc(finishNode(node, pos), hasJSDoc);
}
Binder
The binder creates symbols (unrelated to ES6 Symbol) and links them to AST nodes.
Symbol
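When a variable, function, or class is first declared, the binder creates a symbol for it and stores it in the appropriate symbol table (for example, the locals of the enclosing container). Later declarations of the same entity, such as the parts of a merged interface, are added to that existing symbol's declarations list.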
function Symbol(this: Symbol, flags: SymbolFlags, name: __String) {
this.flags = flags;
this.escapedName = name;
this.declarations = undefined;
// ... other fields
}
Main flow
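Each node passes through bind, which sets the node's parent pointer, calls bindWorker to declare any symbols the node introduces, and then binds its children. bindWorker dispatches on the node kind and, through declareSymbol, relies on createSymbol and addDeclarationToSymbol.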
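What createSymbol and addDeclarationToSymbol accomplish can be sketched with a plain symbol table. The ToySymbolFlags enum, the conflict masks in excludesFor, and declareSymbol below are simplified, hypothetical stand-ins for the compiler's SymbolFlags, its per-kind excludes masks, and its declareSymbol; they only cover var, let/const, function and interface declarations. Two interface declarations merge into one symbol, while a var that reuses a let name is reported:

```typescript
// Hypothetical, reduced symbol flags; the compiler's SymbolFlags has many more.
enum ToySymbolFlags {
  None = 0,
  FunctionScopedVariable = 1 << 0, // var
  BlockScopedVariable = 1 << 1,    // let, const
  Function = 1 << 2,
  Interface = 1 << 3,
}

const Value = ToySymbolFlags.FunctionScopedVariable | ToySymbolFlags.BlockScopedVariable | ToySymbolFlags.Function;

// Existing flags that a new declaration of the given kind conflicts with.
function excludesFor(flags: ToySymbolFlags): number {
  switch (flags) {
    case ToySymbolFlags.FunctionScopedVariable:
      return Value & ~ToySymbolFlags.FunctionScopedVariable; // var may redeclare var
    case ToySymbolFlags.BlockScopedVariable:
      return Value; // let/const may not share a name with any other value
    case ToySymbolFlags.Function:
      return Value & ~ToySymbolFlags.Function; // several function declarations (e.g. overloads)
    default:
      return ToySymbolFlags.None; // interfaces merge and do not clash with values
  }
}

interface ToyDeclaration { name: string; flags: ToySymbolFlags; pos: number }
interface ToySymbol { name: string; flags: number; declarations: ToyDeclaration[] }
type SymbolTable = { [name: string]: ToySymbol | undefined };

function declareSymbol(table: SymbolTable, decl: ToyDeclaration, diagnostics: string[]): ToySymbol {
  let symbol = table[decl.name];
  if (symbol === undefined) {
    // First declaration of this name: create a fresh symbol (cf. createSymbol).
    symbol = { name: decl.name, flags: ToySymbolFlags.None, declarations: [] };
    table[decl.name] = symbol;
  } else if (symbol.flags & excludesFor(decl.flags)) {
    // Simplification: report and skip; the compiler's conflict handling is more involved.
    diagnostics.push(`Duplicate identifier '${decl.name}' at offset ${decl.pos}`);
    return symbol;
  }
  // Merge the declaration into the symbol (cf. addDeclarationToSymbol).
  symbol.flags |= decl.flags;
  symbol.declarations.push(decl);
  return symbol;
}

// Usage: bind four declarations into one container's locals.
const locals: SymbolTable = Object.create(null);
const diagnostics: string[] = [];
const decls: ToyDeclaration[] = [
  { name: "Point", flags: ToySymbolFlags.Interface, pos: 0 },
  { name: "Point", flags: ToySymbolFlags.Interface, pos: 40 },           // interfaces merge
  { name: "x", flags: ToySymbolFlags.BlockScopedVariable, pos: 80 },     // let x
  { name: "x", flags: ToySymbolFlags.FunctionScopedVariable, pos: 100 }, // var x clashes with let x
];
for (const d of decls) declareSymbol(locals, d, diagnostics);
console.log(locals["Point"]?.declarations.length); // 2
console.log(diagnostics.join("\n"));             // Duplicate identifier 'x' at offset 100
```

The compiler's bind function, which drives this process for every node, follows.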
function bind(node: Node | undefined): void {
if (!node) {
return;
}
setParent(node, parent);
const saveInStrictMode = inStrictMode;
bindWorker(node);
if (node.kind > SyntaxKind.LastToken) {
const saveParent = parent;
parent = node;
const containerFlags = getContainerFlags(node);
if (containerFlags === ContainerFlags.None) {
bindChildren(node);
}
// ...(more)
}
}
Checker
The checker performs type checking; its source file, checker.ts, alone exceeds 40,000 lines.
How checking works
For a declaration like const b: number = 1;, the checker resolves the symbol for b, obtains its declared type (number), computes the type of the initializer 1, and verifies that the initializer's type is assignable to the declared type.
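That sequence can be sketched in a few lines. The types, node shapes and functions below are hypothetical and much simpler than the compiler's: a literal initializer gets a literal type, a literal type is assignable only to its own base primitive, and a switch on node.kind dispatches to a kind-specific check, in the spirit of checkSourceElementWorker:

```typescript
// Hypothetical, reduced types and node shapes for this sketch only.
type PrimitiveName = "number" | "string" | "boolean";
interface LiteralType { base: PrimitiveName; text: string } // e.g. base "number", text "1"

type ToyExpression =
  | { kind: "NumericLiteral"; text: string }
  | { kind: "StringLiteral"; text: string }
  | { kind: "BooleanLiteral"; text: "true" | "false" };

type ToyNode =
  | { kind: "VariableDeclaration"; name: string; annotation?: PrimitiveName; initializer: ToyExpression }
  | { kind: "ExpressionStatement"; expression: ToyExpression };

const baseOfLiteral = { NumericLiteral: "number", StringLiteral: "string", BooleanLiteral: "boolean" } as const;

// A literal expression gets a literal type whose base is a primitive.
function getTypeOfExpression(e: ToyExpression): LiteralType {
  const text = e.kind === "StringLiteral" ? JSON.stringify(e.text) : e.text;
  return { base: baseOfLiteral[e.kind], text };
}

// Simplified assignability: a literal type is assignable to its base primitive only.
function isTypeAssignableTo(source: LiteralType, target: PrimitiveName): boolean {
  return source.base === target;
}

function checkVariableDeclaration(node: Extract<ToyNode, { kind: "VariableDeclaration" }>, diagnostics: string[]): void {
  if (node.annotation === undefined) return; // no declared type to check against
  const initializerType = getTypeOfExpression(node.initializer);
  if (!isTypeAssignableTo(initializerType, node.annotation)) {
    diagnostics.push(`'${node.name}': type '${initializerType.text}' is not assignable to type '${node.annotation}'.`);
  }
}

// Dispatch on node.kind to a kind-specific check.
function checkSourceElement(node: ToyNode, diagnostics: string[]): void {
  switch (node.kind) {
    case "VariableDeclaration":
      return checkVariableDeclaration(node, diagnostics);
    case "ExpressionStatement":
      getTypeOfExpression(node.expression); // computed, but nothing to verify here
      return;
  }
}

const diagnostics: string[] = [];
const statements: ToyNode[] = [
  { kind: "VariableDeclaration", name: "b", annotation: "number", initializer: { kind: "NumericLiteral", text: "1" } },
  { kind: "VariableDeclaration", name: "c", annotation: "number", initializer: { kind: "StringLiteral", text: "one" } },
  { kind: "ExpressionStatement", expression: { kind: "BooleanLiteral", text: "true" } },
];
statements.forEach((s) => checkSourceElement(s, diagnostics));
console.log(diagnostics.join("\n"));
// 'c': type '"one"' is not assignable to type 'number'.
```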
Main flow
The entry point is getDiagnostics, which eventually calls checkSourceFileWorker:
function checkSourceFileWorker(node: SourceFile) {
const links = getNodeLinks(node);
if (!(links.flags & NodeCheckFlags.TypeChecked)) {
if (skipTypeChecking(node, compilerOptions, host)) {
return;
}
checkGrammarSourceFile(node);
// ... various checks
forEach(node.statements, checkSourceElement);
checkSourceElement(node.endOfFileToken);
// ... more checks
}
}
Within checkSourceElementWorker, a switch on node.kind dispatches to kind-specific functions such as checkVariableDeclaration and checkPropertyDeclaration.
function checkSourceElementWorker(node: Node): void {
const kind = node.kind;
// ...
switch (kind) {
case SyntaxKind.TypeParameter:
return checkTypeParameter(node as TypeParameterDeclaration);
case SyntaxKind.Parameter:
return checkParameter(node as ParameterDeclaration);
case SyntaxKind.PropertyDeclaration:
return checkPropertyDeclaration(node as PropertyDeclaration);
case SyntaxKind.PropertySignature:
return checkPropertySignature(node as PropertySignature);
// ...(more)
}
}
Emitter
The emitter outputs JavaScript code and declaration files from the final AST.
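The split between the two outputs can be sketched on a toy AST of top-level const declarations. All names here are hypothetical. The JavaScript path first transforms each node to erase type syntax and then prints it; the declaration path drops initializers and keeps only exported names with their types. The sketch assumes each declaration's type has already been resolved (resolvedType), much as the compiler's declaration emit relies on the checker through the EmitResolver passed to emitFiles:

```typescript
// Hypothetical AST for this sketch: top-level const declarations.
interface ToyVariableStatement {
  exported: boolean;
  name: string;
  typeAnnotation?: string; // type syntax as written in the source, if any
  initializerText: string; // initializer as source text, e.g. "1"
  resolvedType: string;    // type the checker would report, e.g. "number"
}

// JavaScript-level node: type-only syntax has been removed.
interface JsVariableStatement {
  exported: boolean;
  name: string;
  initializerText: string;
}

// Minimal line-oriented writer, loosely analogous to createTextWriter(newLine).
function createWriter(newLine: string) {
  const lines: string[] = [];
  return {
    writeLine(text: string): void { lines.push(text); },
    getText(): string { return lines.map((l) => l + newLine).join(""); },
  };
}

// Transform step: erase the type annotation.
function transformToJs(node: ToyVariableStatement): JsVariableStatement {
  return { exported: node.exported, name: node.name, initializerText: node.initializerText };
}

function printJs(nodes: JsVariableStatement[], newLine: string): string {
  const writer = createWriter(newLine);
  for (const n of nodes) {
    writer.writeLine(`${n.exported ? "export " : ""}const ${n.name} = ${n.initializerText};`);
  }
  return writer.getText();
}

// Declaration output keeps only the exported surface: names and types, no values.
function printDts(nodes: ToyVariableStatement[], newLine: string): string {
  const writer = createWriter(newLine);
  for (const n of nodes) {
    if (n.exported) writer.writeLine(`export declare const ${n.name}: ${n.typeAnnotation ?? n.resolvedType};`);
  }
  return writer.getText();
}

function emitFile(nodes: ToyVariableStatement[], newLine = "\n"): { js: string; dts: string } {
  return { js: printJs(nodes.map(transformToJs), newLine), dts: printDts(nodes, newLine) };
}

const { js, dts } = emitFile([
  { exported: true, name: "a", typeAnnotation: "number", initializerText: "1", resolvedType: "number" },
  { exported: false, name: "b", initializerText: '"hi"', resolvedType: "string" },
]);
console.log(js);  // two lines: export const a = 1; and const b = "hi";
console.log(dts); // export declare const a: number;
```

Keeping the transform separate from the printer mirrors the excerpts that follow, where transformNodes runs before createPrinter and printSourceFileOrBundle.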
Main flow
The core function is emitFiles, which creates writers, iterates over source files, transforms nodes, and prints the results.
export function emitFiles(resolver: EmitResolver, host: EmitHost, targetSourceFile: SourceFile | undefined, { scriptTransformers, declarationTransformers }: EmitTransformers, emitOnlyDtsFiles?: boolean, onlyBuildInfo?: boolean, forceDtsEmit?: boolean): EmitResult {
const compilerOptions = host.getCompilerOptions();
const sourceMapDataList = (compilerOptions.sourceMap || compilerOptions.inlineSourceMap || getAreDeclarationMapsEnabled(compilerOptions)) ? [] : undefined;
const emittedFilesList = compilerOptions.listEmittedFiles ? [] : undefined;
const emitterDiagnostics = createDiagnosticCollection();
const newLine = getNewLineCharacter(compilerOptions, () => host.getNewLine());
const writer = createTextWriter(newLine);
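// emitSkipped, exportedModulesFromDeclarationEmit and emitSourceFileOrBundle are declared in the omitted part of this function
// Emit each output file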
forEachEmittedFile(host, emitSourceFileOrBundle, getSourceFilesToEmit(host, targetSourceFile, forceDtsEmit), forceDtsEmit, onlyBuildInfo, !targetSourceFile);
return {
emitSkipped,
diagnostics: emitterDiagnostics.getDiagnostics(),
emittedFiles: emittedFilesList,
sourceMaps: sourceMapDataList,
exportedModulesFromDeclarationEmit
};
}
For each JavaScript output, emitJsFileOrBundle runs the script transformers over the AST to turn TypeScript-specific syntax into JavaScript, creates a printer, and prints the transformed tree to the output file.
function emitJsFileOrBundle(sourceFileOrBundle: SourceFile | Bundle | undefined, jsFilePath: string | undefined, sourceMapFilePath: string | undefined, relativeToBuildInfo: (path: string) => string) {
const transform = transformNodes(resolver, host, factory, compilerOptions, [sourceFileOrBundle], scriptTransformers, false);
const printer = createPrinter(printerOptions, {
hasGlobalName: resolver.hasGlobalName,
onEmitNode: transform.emitNodeWithNotification,
isEmitNotificationEnabled: transform.isEmitNotificationEnabled,
substituteNode: transform.substituteNode,
});
Debug.assert(transform.transformed.length === 1, "Should only see one output from the transform");
printSourceFileOrBundle(jsFilePath, sourceMapFilePath, transform.transformed[0], printer, compilerOptions);
// ...(more)
}
Conclusion
The TypeScript compiler is large and complex; this article only scratches the surface. Readers are encouraged to first understand the overall compilation pipeline before diving into detailed design.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.