Analysis of PHP Hexadecimal Addition Bug Caused by Lexer Change from Flex to re2c
An in‑depth examination of a long‑standing PHP bug where the expression 0x00+2 incorrectly evaluates to 4, tracing its origin to the switch from flex to re2c in the lexer of PHP 5.3, detailing affected versions, code behavior, and the eventual fix in PHP 5.3.11.
More than a decade ago, when PHP developers still argued that "PHP is the best language" and PHPCon was held in Shanghai, a subtle bug involving hexadecimal addition (0x00+2) was discovered (see https://bugs.php.net/bug.php?id=61095). At that time, writing a custom PHP MVC framework felt like the pinnacle of PHP engineering.
TL;DR: When the expression 0x00+2 is written without spaces around the plus sign, the lexer skips the 0x prefix and the leading zeros of the hexadecimal token, is left with no digits at all, and still calls strtol on a pointer that runs on into the following +2. The token 0x00 is therefore given the value 2, and the expression evaluates to 2 + 2 = 4.
The bug affected PHP 5.3.0 through 5.3.10, the releases that were current between 2009‑06‑30 and 2012‑04‑25. Versions 5.1.x and 5.2.x were not impacted. The root cause was introduced in PHP 5.3.0, when the project switched its lexical analyser generator from flex to re2c.
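In the older flex-based scanner, yytext was a null‑terminated string containing exactly the matched token. After the switch to re2c, yytext became a plain pointer into the input buffer (set from YYCURSOR at the start of each token) with no terminator after the token, so the token's extent is known only through yyleng. The surrounding code that extracted the hexadecimal literal still treated yytext as a terminated string and therefore behaved differently.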
Specifically, in the re2c version the variable hex pointed just past the 0x prefix, straight into the script buffer, so the text visible through it did not stop at the end of the hexadecimal digits but ran on through the rest of the line. The subsequent while loop skipped the leading zeros, which for 0x00 consumed every digit and left hex pointing at "+2, PHP_EOL;\n" with a remaining length of zero. strtol was still called on that pointer and parsed "+2" as the integer 2, so the literal 0x00 itself was given the value 2. Adding the real operand 2 then produced 4 instead of the correct 2.
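The behaviour described above can be reproduced outside PHP with a few lines of C. This is a minimal sketch, not the actual scanner source: the function name scan_hex_buggy is hypothetical, the overflow handling of the real lexer is left out, and yytext/yyleng are ordinary parameters standing in for the scanner's token pointer and token length.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the hex-literal action in PHP 5.3.0-5.3.10.
 * yytext points into the script buffer and is NOT terminated at the end
 * of the token; yyleng is the length of the token ("0x00" -> 4). */
long scan_hex_buggy(const char *yytext, int yyleng)
{
    assert(yyleng >= 2);          /* every hex token starts with "0x" */

    const char *hex = yytext + 2; /* skip "0x" */
    int len = yyleng - 2;

    /* Skip leading zeros. For "0x00" this consumes every digit. */
    while (*hex == '0') {
        hex++;
        len--;
    }

    /* len may now be 0, but nothing checks it: hex points at whatever
     * follows the token in the buffer, e.g. "+2, PHP_EOL;\n". */
    (void)len;
    return strtol(hex, NULL, 16);
}
```

Called as scan_hex_buggy("0x00+2, PHP_EOL;\n", 4), the sketch returns 2 rather than 0, because strtol accepts the leading plus sign and the digit after it. With spaces around the operator, strtol finds no digit directly after the sign and returns 0, which is why only the unspaced form misbehaves.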
The fix shipped in PHP 5.3.11 (released in April 2012): before calling strtol, the code now checks whether the number of digits left after stripping leading zeros (len) is zero. If it is, the literal is set to 0 directly and strtol is never called, so the characters that follow the token can no longer be interpreted as part of the number.
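A sketch of the guarded logic, under the same assumptions as the rest of this analysis: scan_hex_fixed is a hypothetical name, yytext and yyleng are plain parameters rather than the scanner's macros, and the real lexer's overflow branch is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the hex-literal action after the fix.
 * yytext points into the unterminated script buffer; yyleng is the
 * length of the matched token. */
long scan_hex_fixed(const char *yytext, int yyleng)
{
    assert(yyleng >= 2);          /* every hex token starts with "0x" */

    const char *hex = yytext + 2; /* skip "0x" */
    int len = yyleng - 2;

    /* Skip leading zeros, tracking how many digits remain. */
    while (*hex == '0') {
        hex++;
        len--;
    }

    /* The guard: no digits left means the literal is zero, so strtol
     * must not look at the characters that follow the token. */
    if (len == 0) {
        return 0;
    }
    return strtol(hex, NULL, 16);
}
```

With this guard, scan_hex_fixed("0x00+2, PHP_EOL;\n", 4) yields 0, and the full expression adds up to 2. When digits do remain, the call still relies on strtol stopping at the first character that is not a hexadecimal digit, which is safe here because such a character always follows the token.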
Shortly after this fix, PHP 5.4.0 introduced binary literals (e.g., 0b001010). A similar bug resurfaced (https://bugs.php.net/bug.php?id=61225) where 0b0+1 incorrectly evaluated to 2, illustrating how lexer‑related token handling can cause subtle arithmetic errors.
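The arithmetic of the binary case can be checked the same way. The sketch below assumes that the binary-literal action followed the same pattern as the hexadecimal one (skip the two-character prefix, strip leading zeros, call strtol); the function scan_prefixed_int and its guard_empty flag are hypothetical and exist only to show both behaviours side by side.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical generalisation of the literal-scanning pattern.
 * base is 16 for "0x..." tokens and 2 for "0b..." tokens.
 * guard_empty != 0 applies the "no digits left means zero" check. */
long scan_prefixed_int(const char *yytext, int yyleng, int base,
                       int guard_empty)
{
    assert(yyleng >= 2);             /* two-character prefix: 0x or 0b */

    const char *digits = yytext + 2; /* skip the prefix */
    int len = yyleng - 2;

    /* Strip leading zeros; "0b0" is left with no digits at all. */
    while (*digits == '0') {
        digits++;
        len--;
    }

    if (guard_empty && len == 0) {
        return 0;                    /* literal is zero, stop here */
    }
    /* Without the guard, strtol reads past the token into "+1...". */
    return strtol(digits, NULL, base);
}
```

Without the guard, scanning the token 0b0 in "0b0+1;" gives 1, so the expression comes out as 1 + 1 = 2, matching the reported result. With the guard enabled the token is 0 and the sum is 1.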
IT Services Circle