Implementing a Super Tiny Compiler: Parsing, Transforming, and Code Generation
This article walks through creating a minimal JavaScript compiler—starting from tokenizing Lisp‑style input, building an abstract syntax tree, traversing and transforming it, and finally generating equivalent C‑style code, illustrating the core parse‑transform‑generate workflow used by tools like Babel.
Introduction: The article explores the super‑tiny‑compiler repository, a <200‑line JavaScript compiler that demonstrates fundamental compilation concepts similar to Babel.
Parse phase: It explains lexical analysis turning source like (add 2 (subtract 4 2)) into tokens, then syntax analysis building an AST, showing example token array and AST structure.
const tokenizer = (input) => {
const tokens = [];
let current = 0;
while (current < input.length) {
let char = input[current];
if (char === '(') {
tokens.push({
type: 'paren',
value: '('
})
current++;
continue;
}
if (char === ')') {
tokens.push({
type: 'paren',
value: ')'
})
current ++;
continue;
}
// whitespace
if (/\s/.test(char)) {
current++;
continue;
}
// numbers
if (/[0-9]/.test(char)) {
let value = '';
while (/[0-9]/.test(char)) {
value += char;
char = input[++current];
}
tokens.push({
type: 'number',
value,
})
continue;
}
// letters (names)
if (/[a-z]/i.test(char)) {
let value = '';
while (/[a-z]/i.test(char)) {
value += char;
char = input[++current];
}
tokens.push({
type: 'name',
value,
});
continue;
}
throw new Error(`Unknown token: ${char}`);
}
return tokens;
} const parser = (tokens) => {
let current = 0;
const walk = () => {
let token = tokens[current];
// NumberLiteral
if (token.type === 'number') {
current++;
return {
type: 'NumberLiteral',
value: token.value,
}
}
// CallExpression
if (token.type === 'paren' && token.value === '(') {
token = tokens[++current];
const node = {
type: 'CallExpression',
name: token.value,
params: [],
};
token = tokens[++current];
while (
token.type !== 'paren' ||
(token.value !== ')' && token.type === 'paren')
) {
node.params.push(walk());
token = tokens[current];
}
current++;
return node;
}
throw new Error(`Unknown token type: ${token.type}`);
}
const ast = {
type: 'Program',
body: [],
}
while (current < tokens.length) {
ast.body.push(walk());
}
return ast;
}Transform phase: Describes traversing the AST with a visitor pattern, creating a new AST for C‑style output, and provides the traverser, visitor, and transformer implementations.
const traverser = (ast, visitor) => {
const traverseArray = (array, parent) => {
array.forEach((child) => {
traverseNode(child, parent);
});
}
const traverseNode = (node, parent) => {
const method = visitor[node.type];
if (method) {
method(node, parent);
}
switch (node.type) {
case 'Program':
traverseArray(node.body, node);
break;
case 'CallExpression':
traverseArray(node.params, node);
break;
case 'NumberLiteral':
break;
default:
throw new Error(`Unknown node type: ${node.type}`);
}
}
traverseNode(ast, null);
} const transformer = (ast) => {
const newAst = {
type: 'Program',
body: [],
}
ast._context = newAst.body;
traverser(ast, {
NumberLiteral: (node, parent) => {
parent._context.push({
type: 'NumberLiteral',
value: node.value,
});
},
CallExpression: (node, parent) => {
let expression = {
type: 'CallExpression',
callee: {
type: 'Identifier',
name: node.name
},
arguments: [],
};
node._context = expression.arguments;
if (parent.type !== 'CallExpression') {
expression = {
type: 'ExpressionStatement',
expression,
};
}
parent._context.push(expression);
}
});
return newAst;
}Code generation: Shows a recursive generator that converts the transformed AST into JavaScript code, with an example generator function.
const codeGenerator = (node) => {
switch (node.type) {
case 'Program':
return node.body.map(codeGenerator).join('\n');
case 'ExpressionStatement':
return `${codeGenerator(node.expression)};`;
case 'CallExpression':
return `${codeGenerator(node.callee)}(${node.arguments.map(codeGenerator).join(', ')})`;
case 'Identifier':
return node.name;
case 'NumberLiteral':
return node.value;
default:
throw new Error(`Unknown node type: ${node.type}`);
}
}Compiler pipeline: Combines tokenizer, parser, transformer, and generator into a single compiler function, outlining the data flow input → tokens → AST → new AST → output.
const compiler = (input) => {
const tokens = tokenizer(input);
const ast = parser(tokens);
const newAst = transformer(ast);
return codeGenerator(newAst);
}Conclusion: The tiny compiler illustrates the three‑step compile process (parse, transform, generate) used by modern tools like Babel, helping readers understand compiler internals.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.