Analyzing PHP Core: Variables and Zend Engine Structures (Part 1)
This article introduces the PHP variable system by examining the Zend Engine's source code, covering variable naming rules, value types, reference and variable variables, file extensions in the PHP source tree, and a detailed walkthrough of the zend_types.h definitions, unions, structs, and memory‑alignment considerations.
This article, "Parsing PHP Core Source – Variables (Part 1)", opens a series that aims to help readers understand how PHP implements variables internally.
PHP's core building blocks include variables, arrays, memory management, the SAPI layer, and the virtual machine. Variables are a good starting point: although scripts see only a handful of types, the engine defines about 20 type codes for them (the IS_* constants in zend_types.h).
Four basic characteristics of PHP variables:
1. Variable naming – Following Perl, a PHP variable is written as a dollar sign ($) followed by its name. The name must start with a letter, an underscore, or a byte in the range 0x80 to 0xff (which permits UTF-8 and other multibyte names), and may continue with letters, digits, underscores, or such bytes. As a regular expression: ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$. The special variable $this cannot be assigned to.
Variables are assigned by value by default; assigning one variable to another copies the value, so changes to one do not affect the other. PHP also supports reference assignment using the & operator, making two variables aliases of the same value.
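PHP also supports variable variables (e.g., $$a), where the value of one variable is used as the name of another. Combining a variable variable with array indexing is ambiguous and must be disambiguated: ${$a[1]} uses the value of $a[1] as the variable name, whereas ${$a}[1] takes the variable named by $a and then reads its index 1. Class properties can likewise be accessed through variable property names (e.g., $foo->$bar). However, variable variables cannot be used with superglobal arrays inside functions or class methods, and $this cannot be referenced dynamically.
2. Variable types – PHP is weakly typed: a variable can hold a value of any type, and that type can change at runtime. How the Zend Engine stores these types and converts between them is explored later.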
To explore the source, the author downloaded the PHP 7.4.15 source code into a Docker container, installed gcc and related build tools, and listed the source tree:
<code>[root@2890cf458ee2 cui-php]# ls
php-7.4.15 php-7.4.15.tar.gz
[root@2890cf458ee2 cui-php]# cd php-7.4.15
[root@2890cf458ee2 php-7.4.15]# ls
CODING_STANDARDS.md Makefile.objects UPGRADING.INTERNALS buildconf configure.ac main tests
CONTRIBUTING.md NEWS Zend buildconf.bat docs modules tmp-php.ini
...</code>
The top-level directories of the source tree are:
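Zend: the Zend Engine itself (lexer, parser, opcode compiler, runtime).
TSRM: Thread-Safe Resource Manager.
build: build-system files.
ext: bundled extensions (e.g., standard, which contains the array functions, as well as PDO and SPL).
main: PHP's core code outside the Zend Engine (e.g., startup, streams, output handling), plus core macros and definitions.
pear: files for installing PEAR (the PHP Extension and Application Repository).
sapi: Server API implementations (e.g., apache2handler for mod_php, CGI/FastCGI, CLI, FPM).
tests: test suite.
scripts: helper scripts for Unix-like systems (e.g., phpize, php-config).
win32: Windows-specific code.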
The variable definitions live in Zend/zend_types.h. Listing the Zend directory:
<code>[root@2890cf458ee2 php-7.4.15]# cd Zend/
[root@2890cf458ee2 Zend]# ll
total 22404
-rwxrwxrwx 1 root root 2803 Feb 2 14:20 LICENSE
-rwxrwxrwx 1 root root 2008 Feb 2 14:20 Makefile.frag
-rwxrwxrwx 1 root root 4607 Feb 2 14:20 README.md
... (many source files listed) ...
</code>
Only the .h and .c files are needed to understand the code; files with extensions such as .o, .lo, and .so are build artifacts. The common file types in a C project are:
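.c: C source files.
.h: header files containing declarations, struct definitions, and macros.
.o: object files, produced by compiling a source file but not yet linked.
.so: shared objects (dynamic libraries), in ELF format on Linux and built from position-independent code.
.lo: libtool object files, small text files through which libtool tracks the object files used to build libraries.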
Opening zend_types.h shows the license header followed by the actual definitions. The first relevant part includes the portability headers, defines the endianness helper macros (ZEND_ENDIAN_LOHI*), and declares the zend_bool and zend_uchar typedefs.
<code>#ifndef ZEND_TYPES_H
#define ZEND_TYPES_H
#include "zend_portability.h"
#include "zend_long.h"
#ifdef __SSE2__
# include <mmintrin.h>
# include <emmintrin.h>
#endif
#ifdef WORDS_BIGENDIAN
# define ZEND_ENDIAN_LOHI(lo, hi) hi; lo;
# define ZEND_ENDIAN_LOHI_3(lo, mi, hi) hi; mi; lo;
# define ZEND_ENDIAN_LOHI_4(a, b, c, d) d; c; b; a;
# define ZEND_ENDIAN_LOHI_C(lo, hi) hi, lo
# define ZEND_ENDIAN_LOHI_C_3(lo, mi, hi) hi, mi, lo,
# define ZEND_ENDIAN_LOHI_C_4(a, b, c, d) d, c, b, a
#else
# define ZEND_ENDIAN_LOHI(lo, hi) lo; hi;
# define ZEND_ENDIAN_LOHI_3(lo, mi, hi) lo; mi; hi;
# define ZEND_ENDIAN_LOHI_4(a, b, c, d) a; b; c; d;
# define ZEND_ENDIAN_LOHI_C(lo, hi) lo, hi
# define ZEND_ENDIAN_LOHI_C_3(lo, mi, hi) lo, mi, hi,
# define ZEND_ENDIAN_LOHI_C_4(a, b, c, d) a, b, c, d
#endif
typedef unsigned char zend_bool;
typedef unsigned char zend_uchar;
</code>
The core of the variable representation is struct _zval_struct (typedef'd as zval), defined further down in the same file:
<code>struct _zval_struct {
    zend_value value;                 /* value */
    union {
        struct {
            ZEND_ENDIAN_LOHI_3(
                zend_uchar type,      /* active type */
                zend_uchar type_flags,
                union { uint16_t extra; } u)
        } v;
        uint32_t type_info;
    } u1;
    union {
        uint32_t next;                /* hash collision chain */
        uint32_t cache_slot;          /* cache slot (for RECV_INIT) */
        uint32_t opline_num;          /* opline number (for FAST_CALL) */
        uint32_t lineno;              /* line number (for ast nodes) */
        uint32_t num_args;            /* arguments number for EX(This) */
        uint32_t fe_pos;              /* foreach position */
        uint32_t fe_iter_idx;         /* foreach iterator index */
        uint32_t access_flags;        /* class constant access flags */
        uint32_t property_guard;      /* single property guard */
        uint32_t constant_flags;      /* constant flags */
        uint32_t extra;               /* not further specified */
    } u2;
};
</code>
The zend_value union, which is the type of the value member, holds the actual data and is defined earlier in the file:
<code>typedef union _zend_value {
    zend_long         lval;           /* long value */
    double            dval;           /* double value */
    zend_refcounted  *counted;
    zend_string      *str;
    zend_array       *arr;
    zend_object      *obj;
    zend_resource    *res;
    zend_reference   *ref;
    zend_ast_ref     *ast;
    zval             *zv;
    void             *ptr;
    zend_class_entry *ce;
    zend_function    *func;
    struct {
        uint32_t w1;
        uint32_t w2;
    } ww;
} zend_value;
</code>
Key points about _zval_struct (on a 64-bit build):
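value is 8 bytes: the size of the largest members of zend_value (a zend_long, a double, or a pointer on 64-bit systems).
u1 is 4 bytes and stores the variable's type and type flags; because it is a union, the same bits can also be read in one go as the 32-bit type_info.
u2 is a 4-byte union whose meaning depends on context: the hash collision chain (next), a line number (lineno), a foreach position (fe_pos), and so on.
The whole zval is therefore 16 bytes (8 + 4 + 4). Since the 8-byte alignment of value would round the struct up to 16 bytes anyway, u2 occupies space that would otherwise be padding, and the compact size helps cache performance.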
The article also briefly explains C concepts such as structs, unions, pointers, and memory alignment, illustrating how the Zend Engine uses these mechanisms to implement PHP's flexible variable system.
Overall, the walkthrough provides a foundation for deeper exploration of PHP's internals, including reference counting, copy‑on‑write, and the interaction between the Zend Engine and the PHP runtime.
php中文网 Courses
php中文网's platform for the latest courses and technical articles, helping PHP learners advance quickly.