Understanding PHP_CodeSniffer: Tokenization, Lexical Analysis, and Custom Rule Creation
This article explains how PHP_CodeSniffer parses PHP source code into tokens using lexical analysis, demonstrates token extraction with token_get_all, and guides readers through creating a custom rule to prohibit hash‑style comments, covering rule library setup, Sniff implementation, and execution.
PHP_CodeSniffer is an open‑source tool that checks PHP code style by parsing source files into a TOKEN array through lexical analysis. The article begins by introducing the principle of static analysis in PHP_CodeSniffer.
Programming languages are commonly grouped into compiled (e.g., C/C++, Java) and interpreted (e.g., PHP, JavaScript, Python) categories. When PHP runs a script, the Zend engine performs lexical and syntax analysis, compiles the code into opcodes, executes them, and can optionally keep the opcodes in an opcode cache for later requests.
The concept of TOKEN is explained: during lexical parsing, PHP language elements are represented as constants like T_XXX. A table of token identifiers (e.g., T_ABSTRACT, T_ARRAY) is provided. PHP offers token_get_all(string $source) to obtain the token sequence of a given source string.
Example code shows how to retrieve tokens for a simple "Hello World" script:
<?php
$tokens = token_get_all('<?php echo "Hello World!"; ?>');
print_r($tokens);
?>
The output shows that most tokens are returned as three-element arrays holding the numeric token identifier, the matching source text, and the line number on which the token starts; single-character tokens such as ; are returned as plain strings. Passing the numeric identifier to token_name() reveals the human-readable token names such as T_OPEN_TAG, T_ECHO, and T_CONSTANT_ENCAPSED_STRING.
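To make the mapping from numeric identifiers to names concrete, here is a minimal sketch that pairs each token with its token_name(). The helper name describeTokens is hypothetical and not part of the original article; only token_get_all() and token_name() are PHP built-ins.

```php
<?php
// Hypothetical helper: turn the raw token_get_all() result into
// [name, text, line] triples that are easier to read.
function describeTokens(string $source): array
{
    $described = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // [0] = numeric token id, [1] = source text, [2] = line number
            $described[] = [token_name($token[0]), $token[1], $token[2]];
        } else {
            // Single-character tokens such as ";" are plain strings.
            $described[] = ['(char)', $token, null];
        }
    }
    return $described;
}

$described = describeTokens('<?php echo "Hello World!"; ?>');
foreach ($described as [$name, $text, $line]) {
    printf("%-28s %s\n", $name, json_encode($text));
}
```

The list should begin with T_OPEN_TAG and include T_ECHO and T_CONSTANT_ENCAPSED_STRING, with the semicolon appearing as a bare character rather than an array.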
After understanding tokenization, the article walks through creating a custom rule that forbids the use of # for single‑line comments. It covers:
Creating a new rule set directory (e.g., FireLine) with a ruleset.xml defining the rule library.
Implementing a Sniff class (DisallowHashCommentsSniff.php) that registers for T_COMMENT tokens and checks if the first character of the token content is #, reporting an error when found.
Running the custom rule via the command line:
php D:/git/PHP_CodeSniffer/bin/phpcs \
--standard=D:/git/PHP_CodeSniffer/src/Standards/FireLine \
D:/git/PHP_CodeSniffer/src/Standards/FireLine/Tests \
--report=xml --report-file=E:/RedlineReport/php_report01.xml
The generated XML report lists each occurrence of prohibited hash comments, confirming that the custom rule works as intended.
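The check at the heart of the sniff described above can be tried without installing PHP_CodeSniffer. The sketch below is not the article's DisallowHashCommentsSniff class; it is a standalone approximation built on token_get_all(), and the function name findHashComments and the sample source are assumptions made for illustration. It applies the same test: look at T_COMMENT tokens and flag those whose first character is #.

```php
<?php
// Hypothetical standalone version of the sniff's check:
// report every T_COMMENT token that starts with "#".
function findHashComments(string $source): array
{
    $violations = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)
            && $token[0] === T_COMMENT
            && $token[1][0] === '#'
        ) {
            $violations[] = [
                'line'    => $token[2],
                'comment' => trim($token[1]),
            ];
        }
    }
    return $violations;
}

// Illustrative input: one hash comment, one slash comment,
// and a string literal that merely contains "#".
$sample = <<<'CODE'
<?php
# legacy hash comment
// slash comment is fine
echo "# not a comment";
CODE;

$violations = findHashComments($sample);
foreach ($violations as $v) {
    printf("Line %d: hash comment found: %s\n", $v['line'], $v['comment']);
}
```

Because the check works on tokens rather than raw text, the # inside the string literal is not flagged. A real Sniff would perform the same comparison inside its process method and report through PHP_CodeSniffer instead of returning an array.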
In conclusion, PHP_CodeSniffer primarily checks coding style through lexical analysis, which limits its ability to detect more complex issues compared to tools that use full syntax trees. Nevertheless, its pattern‑matching approach enables straightforward rule creation and automatic code fixing via the PHP Code Beautifier and Fixer (phpcbf) feature.
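The link between token-level matching and automatic fixing can be illustrated with a small sketch. This is not how phpcbf is implemented; it only imitates the idea with token_get_all(), and the function name convertHashComments is hypothetical. It relies on the fact that concatenating the text of every token reproduces the original source, so changing one token's text changes only that part of the file.

```php
<?php
// Hypothetical fixer: rewrite "#" comments as "//" comments
// while leaving every other token untouched.
function convertHashComments(string $source): string
{
    $output = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character token: copy through unchanged.
            $output .= $token;
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_COMMENT && $text[0] === '#') {
            // Swap the leading "#" for "//" and keep the rest.
            $text = '//' . substr($text, 1);
        }
        $output .= $text;
    }
    return $output;
}

$before = "<?php\n# old style\necho 1; // kept\n";
$after  = convertHashComments($before);
echo $after;
```

Only the hash comment is rewritten; the existing // comment and the rest of the code pass through byte for byte.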
360 Quality & Efficiency
360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.