Unveiling Hiphop: From PHP to HHVM – A Deep Dive into Compiler Fundamentals
This article explains Facebook's Hiphop tool, its evolution from static PHP-to-C++ compilation to HHVM with JIT, and provides a detailed walkthrough of compiler fundamentals including lexical, syntax, and semantic analysis, intermediate code generation, optimization, and code generation.
Hiphop is a Facebook-developed tool that originally compiled PHP to C++ and later evolved to HHVM with JIT, offering significant CPU savings and performance improvements.
The evolution stages are HPHPC → HPHPI → HHVM, where HPHPC is static compilation, HPHPI is a transitional product similar to the PHP Zend VM, and HHVM applies JIT technology.
Main Content
1. Introduction to Compiler Principles
1.1 Compiler Structure
Character stream
Token generation
Syntax tree generation
Syntax-directed translation to abstract syntax tree and symbol table
Intermediate code generation (an AST or three‑address code)
Intermediate code optimization
Target machine code generation
Final machine code optimization
Example expression: a=b+c*60
Lexical analysis:
The expression is split into tokens such as
<float,1> <equal,2> <float,3> <add,4> <float,5> <mul,6> <int,7>.
Syntax analysis:
The syntax tree is built from the tokens.
Semantic analysis:
Semantic analysis checks type consistency, e.g., converting an integer constant to float.
Intermediate code generation: The compiler generates intermediate code such as:
t1=inttofloat(60)
t2=float3*t1
t3=float2+t2
float1=t3
Code optimizer: Optimizes the intermediate code (pre‑optimizer and post‑optimizer) to reduce the instruction count.
Optimized example:
t1=float3*60.0
float1=float2+t1
Code generator: Produces target machine code; HHVM generates its own bytecode while HPHPC ultimately produces native machine code.
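The intermediate-code step of this walkthrough can be sketched as a post-order walk over the syntax tree that emits one instruction per operator. This is a minimal illustration, not Hiphop's implementation: the tuple-based tree layout and the name gen_three_address are assumptions made for the example, and variable names stand in for the numbered symbol-table slots.

```python
import itertools

def gen_three_address(tree):
    """Return three-address instructions for an assignment tree.

    Nodes are tuples (an assumed layout): ("id", name), ("num", value),
    ("inttofloat", child), ("neg", child), (operator, left, right),
    with ("=", target, value) at the root.
    """
    code = []
    counter = itertools.count(1)

    def new_temp():
        return f"t{next(counter)}"

    def emit(node):
        kind = node[0]
        if kind in ("id", "num"):
            return str(node[1])          # leaves need no instruction
        if kind == "inttofloat":
            operand = emit(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({operand})")
            return temp
        if kind == "neg":
            operand = emit(node[1])
            temp = new_temp()
            code.append(f"{temp} = -{operand}")
            return temp
        # Binary operator: evaluate both operands first, then combine them.
        left = emit(node[1])
        right = emit(node[2])
        temp = new_temp()
        code.append(f"{temp} = {left} {kind} {right}")
        return temp

    _, target, value = tree
    code.append(f"{target[1]} = {emit(value)}")
    return code


tree = ("=", ("id", "a"),
        ("+", ("id", "b"),
         ("*", ("id", "c"), ("inttofloat", ("num", 60)))))
for line in gen_three_address(tree):
    print(line)
# t1 = inttofloat(60)
# t2 = c * t1
# t3 = b + t2
# a = t3
```

Each operator gets a fresh temporary, which is why the unoptimized output contains the extra copy into a that the optimizer later removes.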
1.2 Hiphop Compiler Structure
Front‑end components:
Parser – lexical and syntax analysis
Static Analyzer – syntax‑directed translation and AST generation
Type Inference Engine – semantic analysis
Pre‑optimizer / Post‑optimizer – code optimization
Code generation – intermediate code generation (C++ source in the case of HPHPC)
Back‑end component:
G++ – native code compilation
1.3 Lexical Analyzer
The lexical analyzer tokenizes the source code into <token-name,attribute-value> pairs, typically using regular expressions or tools like lex.
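A regular-expression lexer of this kind can be sketched in a few lines. The token names and patterns below are assumptions chosen to match the a=b+c*60 example, not Hiphop's actual lexer rules, and the attribute value here is simply the matched text rather than a symbol-table index.

```python
import re

# Illustrative token rules; order matters ("float" must be tried before "int").
TOKEN_SPEC = [
    ("float", r"\d+\.\d+"),
    ("int",   r"\d+"),
    ("id",    r"[A-Za-z_]\w*"),
    ("equal", r"="),
    ("add",   r"\+"),
    ("mul",   r"\*"),
    ("skip",  r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token-name, attribute-value) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "skip":        # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("a=b+c*60"))
# [('id', 'a'), ('equal', '='), ('id', 'b'), ('add', '+'), ('id', 'c'), ('mul', '*'), ('int', '60')]
```

Tools like lex generate an equivalent matcher from a rule file instead of building the combined pattern at run time.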
1.4 Syntax Analyzer
The syntax analyzer builds a parse tree from the token stream, handling errors and providing the tree to later compiler stages.
1.4.1 Grammar Design
Grammar design separates lexical elements (identifiers, literals), which the lexer recognizes, from syntactic structures (if‑else, loops), which the grammar describes.
1.4.2 Top‑Down Parsing
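Example grammar for arithmetic expressions (as written it is left‑recursive and ambiguous, so a top‑down parser needs it rewritten and precedence rules are needed to pick one parse):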
Expr : Number
| Expr + Expr
| Expr - Expr
| Expr * Expr
| Expr / Expr
1.4.3 Bottom‑Up (Shift‑Reduce) Parsing
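A shift‑reduce parser processes the expression 1+2/3+4*6-3-2 by shifting tokens onto a stack and reducing the top of the stack whenever it matches the right‑hand side of a grammar rule.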
1.5 Semantic Analyzer
The semantic analyzer uses the AST and the symbol table to verify type correctness and to perform type conversion, overload resolution, and type inference.
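As a rough sketch of the type-checking and type-conversion part, the function below walks a tuple-based tree, looks identifiers up in a symbol table, and wraps an integer operand in an inttofloat node when it meets a float. The node layout, the two-type system (int and float only), and the declarations in SYMBOLS are assumptions for the example; overload resolution and inference are not shown.

```python
# Assumed declarations for the a=b+c*60 example.
SYMBOLS = {"a": "float", "b": "float", "c": "float"}

def check(node, symbols):
    """Return (typed_node, type), inserting inttofloat where int meets float."""
    kind = node[0]
    if kind == "num":
        return node, "float" if isinstance(node[1], float) else "int"
    if kind == "id":
        if node[1] not in symbols:
            raise NameError(f"undeclared identifier {node[1]!r}")
        return node, symbols[node[1]]
    if kind in ("+", "*"):
        left, left_type = check(node[1], symbols)
        right, right_type = check(node[2], symbols)
        if left_type != right_type:
            # Mixed int/float: widen the int side so both operands are float.
            if left_type == "int":
                left = ("inttofloat", left)
            else:
                right = ("inttofloat", right)
            return (kind, left, right), "float"
        return (kind, left, right), left_type
    if kind == "=":
        target, target_type = check(node[1], symbols)
        value, value_type = check(node[2], symbols)
        if target_type == "float" and value_type == "int":
            value = ("inttofloat", value)
        elif target_type != value_type:
            raise TypeError(f"cannot assign {value_type} to {target_type}")
        return ("=", target, value), target_type
    raise ValueError(f"unknown node kind {kind!r}")


tree = ("=", ("id", "a"), ("+", ("id", "b"), ("*", ("id", "c"), ("num", 60))))
typed_tree, result_type = check(tree, SYMBOLS)
print(typed_tree)
# ('=', ('id', 'a'), ('+', ('id', 'b'), ('*', ('id', 'c'), ('inttofloat', ('num', 60)))))
```

Assigning a float to an int variable is rejected here rather than narrowed; a real language defines its own conversion rules.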
1.6 Intermediate Code Generator
Intermediate representations include abstract syntax trees and three‑address code, e.g., translating x+y*z to:
t1 = y * z
t2 = x + t1
A more complex example, a=b*-c+b*-c:
t1 = -c
t2 = b * t1
t3 = -c
t4 = b * t3
t5 = t2 + t4
a = t5
1.7 Optimizer and Code Generator
The optimizer reduces instruction count by eliminating redundant copies and considering context during code emission; the code generator emits the final target machine code.
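Two of the simplest such rewrites, folding a conversion of a constant at compile time and removing a redundant copy, are enough to reproduce the optimized form of the a=b+c*60 example. The sketch below is an illustration under stated assumptions, not Hiphop's optimizer: instructions are (dest, op, args...) tuples, temporaries are named t1, t2, and so on, and each temporary is assumed to be used only once, as in code generated from an expression tree.

```python
import re

def is_temp(name):
    """Temporaries are assumed to be named t1, t2, ... in this sketch."""
    return re.fullmatch(r"t\d+", name) is not None

def optimize(code):
    """Fold inttofloat of integer literals, then merge `x = copy t` into t's definition."""
    constants = {}    # temp name -> folded literal
    folded = []
    for dest, op, *args in code:
        # Substitute any temporaries already known to be constants.
        args = [constants.get(arg, arg) for arg in args]
        if op == "inttofloat" and args[0].isdigit():
            constants[dest] = str(float(args[0]))   # value known at compile time
            continue                                 # the instruction disappears
        folded.append((dest, op, *args))

    result = []
    for instr in folded:
        dest, op, *args = instr
        if op == "copy" and result and is_temp(args[0]) and result[-1][0] == args[0]:
            # The previous instruction computed the temp being copied,
            # so let it write directly to the copy's destination.
            _, prev_op, *prev_args = result.pop()
            result.append((dest, prev_op, *prev_args))
        else:
            result.append(instr)
    return result


code = [
    ("t1", "inttofloat", "60"),
    ("t2", "*", "c", "t1"),
    ("t3", "+", "b", "t2"),
    ("a", "copy", "t3"),
]
print(optimize(code))
# [('t2', '*', 'c', '60.0'), ('a', '+', 'b', 't2')]
```

Four instructions become two; the surviving temporary keeps its original name t2 because the sketch does not renumber temporaries.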
