
Unveiling Hiphop: From PHP to HHVM – A Deep Dive into Compiler Fundamentals

This article explains Facebook's Hiphop tool, its evolution from static PHP-to-C++ compilation to HHVM with JIT, and provides a detailed walkthrough of compiler fundamentals including lexical, syntax, and semantic analysis, intermediate code generation, optimization, and code generation.

MaGe Linux Operations

Hiphop is a Facebook-developed tool that originally compiled PHP to C++ and later evolved into HHVM, a virtual machine with a JIT compiler, offering significant CPU savings and performance improvements.

The tool evolved in three stages, HPHPC → HPHPI → HHVM: HPHPC compiles PHP statically (ahead of time) to C++, HPHPI is a transitional interpreter comparable to PHP's Zend VM, and HHVM is a virtual machine that uses JIT compilation.

Main Content

1. Introduction to Compiler Principles

1.1 Compiler Structure

Character stream

Token generation

Syntax tree generation

Syntax-directed translation to abstract syntax tree and symbol table

Intermediate code generation (e.g., three‑address code) from the AST

Intermediate code optimization

Target machine code generation

Final machine code optimization

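Example expression: a=b+c*60

Lexical analysis: The expression is split into tokens. Each identifier token carries the index of its symbol-table entry (1 for a, 2 for b, 3 for c), the integer literal carries its value, and operators need no attribute:

<float,1> <equal> <float,2> <add> <float,3> <mul> <int,60>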
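The tokenization step can be sketched in Python with the standard re module. This is a minimal illustration rather than HipHop's lexer: the token names follow the example, and treating every identifier as a float variable is an assumption hard-coded for this expression. Identifiers are numbered by symbol-table entry, matching the float1, float2, float3 names used in the intermediate code below.

```python
import re

# Token rules combined into one alternation; names follow the example above.
TOKEN_SPEC = [
    ("int", r"\d+"),
    ("float", r"[A-Za-z_]\w*"),  # assumption: every identifier is a float variable
    ("equal", r"="),
    ("add", r"\+"),
    ("mul", r"\*"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a simple assignment expression."""
    tokens, symbols = [], {}
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text, pos = m.lastgroup, m.group(), m.end()
        if kind == "skip":
            continue
        if kind == "float":
            # An identifier's attribute is its 1-based symbol-table index.
            tokens.append((kind, symbols.setdefault(text, len(symbols) + 1)))
        elif kind == "int":
            tokens.append((kind, int(text)))  # a literal's attribute is its value
        else:
            tokens.append((kind, None))  # operators need no attribute
    return tokens, symbols


tokens, symbols = tokenize("a=b+c*60")
print(" ".join(f"<{k}>" if v is None else f"<{k},{v}>" for k, v in tokens))
# <float,1> <equal> <float,2> <add> <float,3> <mul> <int,60>
print(symbols)  # {'a': 1, 'b': 2, 'c': 3}
```

Reusing setdefault means a variable that appears twice keeps the same symbol-table index.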

Syntax analysis:

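The parser builds a syntax tree from the tokens: the assignment = is the root, with a as its left child and b+c*60 as its right child. Because * binds more tightly than +, c*60 forms its own subtree, which becomes the right operand of +.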

Semantic analysis:

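Semantic analysis checks type consistency using the syntax tree and the symbol table. Here a, b, and c are floats while 60 is an integer, so the analyzer inserts an integer-to-float conversion for 60.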
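The coercion step can be sketched as a recursive check over the syntax tree. The nested-tuple tree encoding and the symbol table declaring a, b, and c as float are assumptions for illustration; the inserted conversion node is called inttofloat here.

```python
# Assumed symbol table: every variable in the example is declared float.
SYMBOLS = {"a": "float", "b": "float", "c": "float"}


def check(node):
    """Type-check a tree of ("id", name), ("num", value) and (op, left, right)
    nodes; return (tree with conversions inserted, type of the node)."""
    kind = node[0]
    if kind == "id":
        return node, SYMBOLS[node[1]]  # KeyError signals an undeclared name
    if kind == "num":
        return node, "int" if isinstance(node[1], int) else "float"
    left, lt = check(node[1])
    right, rt = check(node[2])
    if kind == "assign":
        if (lt, rt) == ("float", "int"):
            right = ("inttofloat", right)  # widen the value being assigned
        elif lt != rt:
            raise TypeError(f"cannot assign {rt} to {lt}")
        return (kind, left, right), lt
    if lt != rt:  # arithmetic mixing int and float: widen the int side
        if lt == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (kind, left, right), "float"
    return (kind, left, right), lt


tree = ("assign", ("id", "a"), ("add", ("id", "b"), ("mul", ("id", "c"), ("num", 60))))
checked, _ = check(tree)
print(checked)
# ('assign', ('id', 'a'), ('add', ('id', 'b'), ('mul', ('id', 'c'), ('inttofloat', ('num', 60)))))
```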

Intermediate code generation: The compiler generates intermediate code such as:

t1=inttofloat(60)
t2=float3*t1
t3=float2+t2
float1=t3

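Code optimizer: Improves the intermediate code, mainly to reduce the number of instructions (HipHop splits this work into a pre‑optimizer and a post‑optimizer). Here the conversion of 60 to float can be done once at compile time, and the copy through t3 is unnecessary because the addition can write float1 directly.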

Optimized example:

t1=float3*60.0
float1=float2+t1
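Going from the four instructions to two can be sketched as three small passes: fold the constant conversion of 60 at compile time, merge the final copy into the addition so it writes float1 directly, and renumber the remaining temporary. The (dest, op, arg1, arg2) tuple encoding, the op names, and the rule that temporaries are named t1, t2, ... are assumptions for illustration; a production optimizer works on a richer IR with proper data-flow analysis.

```python
import re

# Intermediate code from the example as (dest, op, arg1, arg2) tuples.
# Hypothetical convention: temporaries are named t1, t2, ...
CODE = [
    ("t1", "inttofloat", 60, None),
    ("t2", "*", "float3", "t1"),
    ("t3", "+", "float2", "t2"),
    ("float1", "copy", "t3", None),
]


def is_temp(x):
    return isinstance(x, str) and re.fullmatch(r"t\d+", x) is not None


def optimize(code):
    # Pass 1: constant folding. inttofloat(60) is evaluated at compile time
    # and the constant replaces later uses of its temporary.
    consts, folded = {}, []
    for dest, op, a, b in code:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "inttofloat" and isinstance(a, int):
            consts[dest] = float(a)
        else:
            folded.append((dest, op, a, b))
    # Pass 2: copy elimination. "x = t", right after t is defined and with t
    # used nowhere else, is merged into t's definition so it writes x directly.
    uses = {}
    for _, _, a, b in folded:
        for arg in (a, b):
            if is_temp(arg):
                uses[arg] = uses.get(arg, 0) + 1
    result = []
    for dest, op, a, b in folded:
        if op == "copy" and result and result[-1][0] == a and uses.get(a) == 1:
            result[-1] = (dest,) + result[-1][1:]
        else:
            result.append((dest, op, a, b))
    # Pass 3: renumber the surviving temporaries t1, t2, ... in order.
    names = {}
    for dest, _, _, _ in result:
        if is_temp(dest):
            names[dest] = f"t{len(names) + 1}"

    def rename(x):
        return names.get(x, x) if isinstance(x, str) else x

    return [(rename(d), op, rename(a), rename(b)) for d, op, a, b in result]


def fmt(dest, op, a, b):
    if op == "copy":
        return f"{dest}={a}"
    if b is None:
        return f"{dest}={op}({a})"
    return f"{dest}={a}{op}{b}"


for instr in optimize(CODE):
    print(fmt(*instr))
# t1=float3*60.0
# float1=float2+t1
```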

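Code generator: Produces the target code. In HipHop's case, HPHPC emits C++ that is then compiled to native machine code, while HHVM compiles PHP to its own bytecode (HHBC), which its JIT translates to native machine code at run time.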
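A minimal code-generator sketch for the optimized code above, written as (dest, op, arg1, arg2) tuples. The target is a hypothetical register machine with float load (LDF), store (STF), and arithmetic (MULF, ADDF, ...) instructions in the style of textbook examples; it is neither x86 nor HHVM's real output.

```python
import re

# Hypothetical register-machine opcodes for float arithmetic.
OPCODES = {"+": "ADDF", "-": "SUBF", "*": "MULF", "/": "DIVF"}


def is_temp(x):
    return isinstance(x, str) and re.fullmatch(r"t\d+", x) is not None


def codegen(code):
    """Translate (dest, op, arg1, arg2) tuples into register-machine text."""
    asm, reg_of = [], {}
    next_reg = 0

    def new_reg():
        nonlocal next_reg
        next_reg += 1
        return f"R{next_reg}"

    def operand(x):
        if isinstance(x, float):
            return f"#{x}"  # immediate constant (assumed already converted to float)
        if x in reg_of:
            return reg_of[x]  # temporary already held in a register
        r = new_reg()
        asm.append(f"LDF {r}, {x}")  # load a named variable from memory
        return r

    for dest, op, a, b in code:
        ra, rb = operand(a), operand(b)
        r = new_reg()  # naive: a fresh register for every result
        asm.append(f"{OPCODES[op]} {r}, {ra}, {rb}")
        if is_temp(dest):
            reg_of[dest] = r  # temporaries stay in registers
        else:
            asm.append(f"STF {dest}, {r}")  # named variables go back to memory
    return asm


optimized = [("t1", "*", "float3", 60.0), ("float1", "+", "float2", "t1")]
print("\n".join(codegen(optimized)))
# LDF R1, float3
# MULF R2, R1, #60.0
# LDF R3, float2
# ADDF R4, R3, R2
# STF float1, R4
```

Using a fresh register per result keeps the sketch short; a real code generator also allocates and reuses registers, which is where machine-dependent optimization comes in.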

1.2 Hiphop Compiler Structure

Front‑end components:

Parser – lexical and syntax analysis

Static Analyzer – syntax‑directed translation and AST generation

Type Inference Engine – semantic analysis

Pre‑optimizer / Post‑optimizer – code optimization

Code generation – intermediate code generation (C++ source, which the back end then compiles)

Back‑end component:

G++ – native code compilation
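To make the PHP-to-C++ step concrete, a toy emitter can render a typed PHP assignment as a C++ statement that G++ could compile. Everything here (the tree encoding, the int/float to int64_t/double mapping, stripping the $ from names) is a hypothetical illustration; HipHop's actual generated C++ was more involved and relied on its own runtime library.

```python
# Hypothetical mapping from inferred PHP types to C++ types.
CPP_TYPES = {"int": "int64_t", "float": "double"}


def emit_expr(node):
    """Render ("var", name), ("num", value) or (op, left, right) as C++ text."""
    kind = node[0]
    if kind == "var":
        return node[1].lstrip("$")  # $b -> b
    if kind == "num":
        return repr(node[1])
    # Full parenthesization keeps the C++ precedence identical to the tree.
    return f"({emit_expr(node[1])} {kind} {emit_expr(node[2])})"


def emit_assign(name, inferred_type, expr):
    """Emit a typed C++ declaration for a PHP assignment such as $a = ..."""
    return f"{CPP_TYPES[inferred_type]} {name.lstrip('$')} = {emit_expr(expr)};"


php_tree = ("+", ("var", "$b"), ("*", ("var", "$c"), ("num", 60.0)))
print(emit_assign("$a", "float", php_tree))  # double a = (b + (c * 60.0));
```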

1.3 Lexical Analyzer

The lexical analyzer tokenizes the source code into <token-name,attribute-value> pairs, typically using regular expressions or tools like lex.
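A lex-style rule table can be sketched with Python's re module. Like lex, the sketch prefers the longest match and breaks ties by rule order, and it treats reserved words as identifiers looked up in a keyword table. The rule names, the keyword set, and the PHP-like input are assumptions for illustration.

```python
import re

KEYWORDS = {"if", "else", "while", "echo"}  # hypothetical reserved words
RULES = [  # (token name, regex); longest match wins, ties go to the earlier rule
    ("ws", r"[ \t\n]+"),
    ("var", r"\$[A-Za-z_]\w*"),
    ("num", r"\d+"),
    ("ident", r"[A-Za-z_]\w*"),
    ("relop", r">=|<=|==|!=|<|>"),
    ("punct", r"[();{}=+\-*/]"),
]
COMPILED = [(name, re.compile(p)) for name, p in RULES]


def lex(src):
    """Return a list of (token-name, attribute-value) pairs."""
    pos, out = 0, []
    while pos < len(src):
        best = None
        for name, rx in COMPILED:
            m = rx.match(src, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError(f"illegal character {src[pos]!r} at offset {pos}")
        name, m = best
        text, pos = m.group(), m.end()
        if name == "ws":
            continue  # whitespace separates tokens but produces none
        if name == "ident" and text in KEYWORDS:
            out.append((text, None))  # a keyword is its own token name
        else:
            out.append((name, text))  # attribute is the lexeme
    return out


print(lex("if ($x >= 10) echo $x;"))
# [('if', None), ('punct', '('), ('var', '$x'), ('relop', '>='), ('num', '10'),
#  ('punct', ')'), ('echo', None), ('var', '$x'), ('punct', ';')]
```

The longest-match rule is why ">=" becomes one relop token instead of ">" followed by "=".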

1.4 Syntax Analyzer

The syntax analyzer builds a parse tree from the token stream, handling errors and providing the tree to later compiler stages.

1.4.1 Grammar Design

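Grammar design separates lexical elements (identifiers, literals), which the lexer recognizes with regular expressions, from syntactic structures (if‑else statements, loops), which the parser recognizes with a context-free grammar.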

1.4.2 Top‑Down Parsing

Example grammar for arithmetic expressions:

Expr : Number
     | Expr + Expr
     | Expr - Expr
     | Expr * Expr
     | Expr / Expr
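As written, this grammar is ambiguous (1+2*3 has two parse trees) and left-recursive (Expr : Expr + Expr), so a top-down parser cannot use it directly: it would recurse forever on the left recursion. A common fix, sketched below, stratifies the grammar by precedence and replaces left recursion with iteration: Expr : Term (('+' | '-') Term)*, Term : Number (('*' | '/') Number)*. The nested-tuple output is an assumed encoding for illustration.

```python
import re


def tokenize(text):
    # Keeps digits and operators; other characters (e.g. spaces) are ignored.
    return re.findall(r"\d+|[-+*/]", text)


class Parser:
    """Recursive-descent parser for:
         Expr : Term (('+' | '-') Term)*
         Term : Number (('*' | '/') Number)*
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def number(self):
        tok = self.advance()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)

    def term(self):
        node = self.number()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.number())  # loop keeps * and / left-associative
        return node

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def parse(self):
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node


print(Parser("1+2/3+4*6-3-2").parse())
# ('-', ('-', ('+', ('+', 1, ('/', 2, 3)), ('*', 4, 6)), 3), 2)
```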

1.4.3 Bottom‑Up (Shift‑Reduce) Parsing

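A bottom-up parser reads tokens left to right, shifting each onto a stack and reducing the top of the stack to Expr whenever it matches the right-hand side of a production. The example expression is 1+2/3+4*6-3-2.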
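The shift-reduce steps for 1+2/3+4*6-3-2 can be reproduced with a small operator-precedence shift-reduce parser. Because the grammar above is ambiguous, the sketch resolves conflicts the way yacc's %left precedence declarations would: reduce when the operator already on the stack binds at least as tightly as the lookahead, otherwise shift. It also evaluates each reduction (like a yacc semantic action), using Fraction so 2/3 stays exact. The trace format and the "$" end marker are assumptions for illustration.

```python
import re
from fractions import Fraction

PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "$": 0}  # "$" marks end of input
APPLY = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y,
}


def shift_reduce(text):
    """Parse and evaluate text; return (value, list of parser actions)."""
    tokens = re.findall(r"\d+|[-+*/]", text) + ["$"]
    stack, trace = [], []
    for tok in tokens:
        # Operands and operators must alternate: Expr op Expr op ...
        expect_operand = not stack or isinstance(stack[-1], str)
        if tok.isdigit() != expect_operand:
            raise SyntaxError(f"unexpected {tok!r}")
        if tok.isdigit():
            stack.append(("Expr", Fraction(int(tok))))
            trace.append(f"shift {tok}; reduce Expr : Number")
            continue
        # Reduce "Expr op Expr" while op binds at least as tightly as the
        # lookahead; reducing on ties makes every operator left-associative.
        while len(stack) >= 3 and PREC[stack[-2]] >= PREC[tok]:
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append(("Expr", APPLY[op](left[1], right[1])))
            trace.append(f"reduce Expr : Expr {op} Expr")
        if tok != "$":
            stack.append(tok)
            trace.append(f"shift {tok}")
    return stack[0][1], trace


value, trace = shift_reduce("1+2/3+4*6-3-2")
for step in trace:
    print(step)
print(value)  # 62/3
```

The first reductions show precedence at work: after "1+2/3" the parser shifts "/" instead of reducing "1+2", and only when the next "+" arrives does it reduce 2/3 and then 1+2/3.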

1.5 Semantic Analyzer

The semantic analyzer uses the AST and the symbol table to verify type correctness and to perform type conversion (coercion), overload resolution, and type inference.
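The idea behind type inference, which the Type Inference Engine in section 1.2 applies to PHP, can be illustrated with a toy, flow-insensitive version: each variable gets one type for the whole program, found by re-running the pass until nothing changes. A variable that can hold values of different types falls back to "mixed", a hypothetical stand-in for a dynamically typed value. The tree encoding and type names are assumptions, and only +, -, and * are modeled.

```python
ARITH = {"+", "-", "*"}  # "/" omitted: in PHP it may return a float for ints


def infer_expr(node, env):
    """Infer "int", "float", "string" or "mixed" for an expression tree."""
    kind = node[0]
    if kind == "num":
        return "int" if isinstance(node[1], int) else "float"
    if kind == "str":
        return "string"
    if kind == "var":
        return env[node[1]]  # KeyError: variable used before any assignment
    if kind not in ARITH:
        raise ValueError(f"unsupported operator {kind!r}")
    types = {infer_expr(node[1], env), infer_expr(node[2], env)}
    if types == {"int"}:
        return "int"
    if types <= {"int", "float"}:
        return "float"
    return "mixed"  # e.g. string operands: the result is only known at run time


def infer(program):
    """Give each variable one type for the whole program (flow-insensitive)."""
    env = {}
    while True:  # iterate to a fixed point; a type can only move toward "mixed"
        before = dict(env)
        for var, expr in program:
            t = infer_expr(expr, env)
            # A variable that receives values of different types becomes "mixed".
            env[var] = t if env.get(var, t) == t else "mixed"
        if env == before:
            return env


program = [
    ("$x", ("num", 1)),
    ("$y", ("*", ("var", "$x"), ("num", 2.5))),
    ("$z", ("+", ("var", "$y"), ("num", 1))),
    ("$s", ("str", "a")),
    ("$s", ("num", 2)),
]
print(infer(program))  # {'$x': 'int', '$y': 'float', '$z': 'float', '$s': 'mixed'}
```

When a variable's type is known statically, a compiler can use a specialized representation for it; "mixed" variables need a general, run-time-typed one.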

1.6 Intermediate Code Generator

Intermediate representations include abstract syntax trees and three‑address code, e.g., translating x+y*z to:

t1 = y * z
t2 = x + t1

A more complex example, a=b*-c+b*-c, translates to:

t1 = -c
t2 = b * t1
t3 = -c
t4 = b * t3
t5 = t2 + t4
a = t5
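Both listings above come from a simple post-order walk of the tree: generate code for the operands first, then allocate a fresh temporary for the result. A minimal sketch, assuming a hypothetical nested-tuple tree encoding; note that it recomputes -c and b*-c because it does not notice repeated subexpressions.

```python
class TACGenerator:
    """Emit three-address code from trees of ("id", name), ("neg", e),
    ("=", target, e) and (op, left, right) nodes (a hypothetical encoding)."""

    def __init__(self):
        self.code = []
        self.temps = 0

    def new_temp(self):
        self.temps += 1
        return f"t{self.temps}"

    def gen(self, node):
        """Emit code for node and return the name that holds its value."""
        kind = node[0]
        if kind == "id":
            return node[1]  # variables need no code
        if kind == "neg":
            arg = self.gen(node[1])
            t = self.new_temp()
            self.code.append(f"{t} = -{arg}")
            return t
        if kind == "=":
            target, value = node[1][1], self.gen(node[2])
            self.code.append(f"{target} = {value}")
            return target
        left = self.gen(node[1])  # operands first, left to right
        right = self.gen(node[2])
        t = self.new_temp()  # then one temporary for the result
        self.code.append(f"{t} = {left} {kind} {right}")
        return t


def translate(tree):
    gen = TACGenerator()
    gen.gen(tree)
    return gen.code


print("\n".join(translate(("+", ("id", "x"), ("*", ("id", "y"), ("id", "z"))))))
# t1 = y * z
# t2 = x + t1

neg_c = ("neg", ("id", "c"))
tree = ("=", ("id", "a"), ("+", ("*", ("id", "b"), neg_c), ("*", ("id", "b"), neg_c)))
print("\n".join(translate(tree)))
# t1 = -c
# t2 = b * t1
# t3 = -c
# t4 = b * t3
# t5 = t2 + t4
# a = t5
```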

1.7 Optimizer and Code Generator

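The optimizer reduces the instruction count, for example by eliminating redundant copies (such as a = t5 above) and repeated computations of the same value (-c and b * -c are each computed twice above). The code generator then emits the final target machine code, taking the surrounding context into account when choosing instructions.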
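Removing the repeated -c and b*-c can be sketched as hash-based local common-subexpression elimination, a simplified form of value numbering: each (op, arg1, arg2) key is remembered, and a later instruction with the same key reuses the earlier result. The tuple encoding and the t1, t2, ... naming rule for temporaries are assumptions, and the sketch relies on b and c not being reassigned within the block.

```python
import re

# Three-address code for a=b*-c+b*-c as (dest, op, arg1, arg2) tuples.
CODE = [
    ("t1", "neg", "c", None),
    ("t2", "*", "b", "t1"),
    ("t3", "neg", "c", None),
    ("t4", "*", "b", "t3"),
    ("t5", "+", "t2", "t4"),
    ("a", "copy", "t5", None),
]


def is_temp(x):
    return isinstance(x, str) and re.fullmatch(r"t\d+", x) is not None


def local_cse(code):
    """Drop instructions that recompute a value already held in a temporary.

    Assumes input variables (b, c) are not reassigned inside the block, so an
    expression computed once stays valid until the end.
    """
    available = {}  # (op, arg1, arg2) -> name already holding that value
    alias = {}      # eliminated temporary -> name to use instead
    out = []
    for dest, op, a, b in code:
        a, b = alias.get(a, a), alias.get(b, b)
        key = (op, a, b)
        if key in available and is_temp(dest):
            alias[dest] = available[key]  # reuse instead of recomputing
            continue
        available[key] = dest  # stores to named variables are always kept
        out.append((dest, op, a, b))
    return out


def fmt(dest, op, a, b):
    if op == "copy":
        return f"{dest} = {a}"
    if op == "neg":
        return f"{dest} = -{a}"
    return f"{dest} = {a} {op} {b}"


for instr in local_cse(CODE):
    print(fmt(*instr))
# t1 = -c
# t2 = b * t1
# t5 = t2 + t2
# a = t5
```

This cuts six instructions to four; eliminating the remaining copy a = t5, as the optimizer description above mentions, would leave three.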

