Fundamentals 13 min read

Implementing a Super Tiny Compiler: Parsing, Transforming, and Code Generation

This article walks through creating a minimal JavaScript compiler—starting from tokenizing Lisp‑style input, building an abstract syntax tree, traversing and transforming it, and finally generating equivalent C‑style code, illustrating the core parse‑transform‑generate workflow used by tools like Babel.

IT Services Circle
IT Services Circle
IT Services Circle
Implementing a Super Tiny Compiler: Parsing, Transforming, and Code Generation

Introduction: The article explores the super‑tiny‑compiler repository, a <200‑line JavaScript compiler that demonstrates fundamental compilation concepts similar to Babel.

Parse phase: It explains lexical analysis turning source like (add 2 (subtract 4 2)) into tokens, then syntax analysis building an AST, showing example token array and AST structure.

const tokenizer = (input) => {
  const tokens = [];
  let current = 0;
  while (current < input.length) {
    let char = input[current];
    if (char === '(') {
      tokens.push({
        type: 'paren',
        value: '('
      })
      current++;
      continue;
    }
    if (char === ')') {
      tokens.push({
        type: 'paren',
        value: ')'
      })
      current ++;
      continue;
    }
    // whitespace
    if (/\s/.test(char)) {
      current++;
      continue;
    }
    // numbers
    if (/[0-9]/.test(char)) {
      let value = '';
      while (/[0-9]/.test(char)) {
        value += char;
        char = input[++current];
      }
      tokens.push({
        type: 'number',
        value,
      })
      continue;
    }
    // letters (names)
    if (/[a-z]/i.test(char)) {
      let value = '';
      while (/[a-z]/i.test(char)) {
        value += char;
        char = input[++current];
      }
      tokens.push({
        type: 'name',
        value,
      });
      continue;
    }
    throw new Error(`Unknown token: ${char}`);
  }
  return tokens;
}
const parser = (tokens) => {
  let current = 0;
  const walk = () => {
    let token = tokens[current];
    // NumberLiteral
    if (token.type === 'number') {
      current++;
      return {
        type: 'NumberLiteral',
        value: token.value,
      }
    }
    // CallExpression
    if (token.type === 'paren' && token.value === '(') {
      token = tokens[++current];
      const node = {
        type: 'CallExpression',
        name: token.value,
        params: [],
      };
      token = tokens[++current];
      while (
        token.type !== 'paren' ||
        (token.value !== ')' && token.type === 'paren')
      ) {
        node.params.push(walk());
        token = tokens[current];
      }
      current++;
      return node;
    }
    throw new Error(`Unknown token type: ${token.type}`);
  }
  const ast = {
    type: 'Program',
    body: [],
  }
  while (current < tokens.length) {
    ast.body.push(walk());
  }
  return ast;
}

Transform phase: Describes traversing the AST with a visitor pattern, creating a new AST for C‑style output, and provides the traverser, visitor, and transformer implementations.

const traverser = (ast, visitor) => {
  const traverseArray = (array, parent) => {
    array.forEach((child) => {
      traverseNode(child, parent);
    });
  }
  const traverseNode = (node, parent) => {
    const method = visitor[node.type];
    if (method) {
      method(node, parent);
    }
    switch (node.type) {
      case 'Program':
        traverseArray(node.body, node);
        break;
      case 'CallExpression':
        traverseArray(node.params, node);
        break;
      case 'NumberLiteral':
        break;
      default:
        throw new Error(`Unknown node type: ${node.type}`);
    }
  }
  traverseNode(ast, null);
}
const transformer = (ast) => {
  const newAst = {
    type: 'Program',
    body: [],
  }
  ast._context = newAst.body;
  traverser(ast, {
    NumberLiteral: (node, parent) => {
      parent._context.push({
        type: 'NumberLiteral',
        value: node.value,
      });
    },
    CallExpression: (node, parent) => {
      let expression = {
        type: 'CallExpression',
        callee: {
          type: 'Identifier',
          name: node.name
        },
        arguments: [],
      };
      node._context = expression.arguments;
      if (parent.type !== 'CallExpression') {
        expression = {
          type: 'ExpressionStatement',
          expression,
        };
      }
      parent._context.push(expression);
    }
  });
  return newAst;
}

Code generation: Shows a recursive generator that converts the transformed AST into JavaScript code, with an example generator function.

const codeGenerator = (node) => {
  switch (node.type) {
    case 'Program':
      return node.body.map(codeGenerator).join('\n');
    case 'ExpressionStatement':
      return `${codeGenerator(node.expression)};`;
    case 'CallExpression':
      return `${codeGenerator(node.callee)}(${node.arguments.map(codeGenerator).join(', ')})`;
    case 'Identifier':
      return node.name;
    case 'NumberLiteral':
      return node.value;
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

Compiler pipeline: Combines tokenizer, parser, transformer, and generator into a single compiler function, outlining the data flow input → tokens → AST → new AST → output.

const compiler = (input) => {
  const tokens = tokenizer(input);
  const ast = parser(tokens);
  const newAst = transformer(ast);
  return codeGenerator(newAst);
}

Conclusion: The tiny compiler illustrates the three‑step compile process (parse, transform, generate) used by modern tools like Babel, helping readers understand compiler internals.

code generationJavaScriptASTCompilerBabelParser
IT Services Circle
Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.