Understanding PHP Garbage Collection and Reference Counting
This article explains PHP's garbage collection mechanism: how reference counting works, the role of zval structures, how garbage is identified and reclaimed, and what the relevant engine source code looks like, with a brief comparison to Java's garbage collection.
1. PHP Reference Counting
1.1 How Garbage Is Determined
Each reference-counted value has a counter that is incremented when a new reference to the value is created and decremented when a reference is destroyed. When a decrement brings the counter to zero, the value is released immediately and never counts as garbage; when a decrement leaves the counter above zero, the value may be part of an unreachable cycle and is examined later by the collector.
If refcount becomes 0 after decrement, the value is released.
If refcount remains greater than 0 after the decrement, the value is potential garbage.
Potential garbage is accumulated in a buffer until a threshold is reached, which triggers the garbage identification routine; that routine finally releases the true garbage.
Drawback: Maintaining the counter adds overhead, and reference counting alone cannot free values that reference each other in a cycle.
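The cyclic-reference drawback can be seen in a small sketch. It assumes PHP 7.0 or later with the cycle collector available; the Node class and the variable names are hypothetical. The exact number returned by gc_collect_cycles() depends on the PHP version, so the sketch only checks that it is positive.

```php
<?php
// Hypothetical class used only for this illustration.
class Node
{
    public $peer = null;
}

gc_enable();          // make sure possible roots are recorded
gc_collect_cycles();  // start from a clean root buffer

$a = new Node();
$b = new Node();
$a->peer = $b;        // $a's object now references $b's object
$b->peer = $a;        // and vice versa: a reference cycle

// After unset() each object is still referenced by its peer, so its
// refcount stays above 0 and reference counting alone never frees it.
unset($a, $b);

// The cycle collector identifies and frees the unreachable pair.
$collected = gc_collect_cycles();
echo "cycle reclaimed: ", ($collected > 0 ? "yes" : "no"), PHP_EOL;
```

Without the final call the two objects would stay in memory until the root buffer fills up or the script ends.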
1.2 PHP Variable Internals
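Every PHP variable is stored in a zval container, which holds the type, the value, and two additional pieces of information: is_ref (whether the variable belongs to a reference set) and refcount (the number of symbols pointing to this zval). This layout describes PHP 5; since PHP 7 the refcount is stored in the value structure itself rather than in the zval, and references are a separate type.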
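A rough way to watch the refcount change from userland is to parse the output of debug_zval_dump(). The sketch below is only an approximation: observed_refcount() is a hypothetical helper, and passing a value into it adds temporary references of its own, so only the difference between two readings is meaningful.

```php
<?php
// Hypothetical helper: returns the refcount that debug_zval_dump() reports.
// The call itself adds temporary references, so compare readings with each
// other rather than trusting the absolute number.
function observed_refcount($value): int
{
    ob_start();
    debug_zval_dump($value);
    $dump = ob_get_clean();
    preg_match('/refcount\((\d+)\)/', $dump, $m);
    return (int) $m[1];
}

$obj = new stdClass();
$base = observed_refcount($obj);

$alias = $obj;                          // a second symbol points at the same object
$withAlias = observed_refcount($obj);

unset($alias);                          // the symbol is removed, the count drops back
$afterUnset = observed_refcount($obj);

echo "added by alias: ", $withAlias - $base, PHP_EOL;
echo "left after unset: ", $afterUnset - $base, PHP_EOL;
```

An object is used here because literal strings and arrays may be interned or shared by the engine, which makes their reported counts vary between PHP versions.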
1.3 Types Using Reference Counting
The following types are reference-counted: string, array, object, resource, and reference. Only values whose type flags include IS_TYPE_REFCOUNTED use this mechanism; scalars such as integers, floats, and booleans are stored directly in the zval and are not counted.
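The split between counted and uncounted types can be observed indirectly. This sketch assumes PHP 7.0 or later, where debug_zval_dump() prints a refcount only for refcounted values; reports_refcount() is a hypothetical helper, not part of PHP.

```php
<?php
// Hypothetical helper: true when debug_zval_dump() prints a refcount for
// the value, which on PHP 7+ happens only for reference-counted values.
function reports_refcount($value): bool
{
    ob_start();
    debug_zval_dump($value);
    return strpos(ob_get_clean(), 'refcount(') !== false;
}

$checks = [
    'int'    => reports_refcount(42),
    'float'  => reports_refcount(1.5),
    'bool'   => reports_refcount(true),
    'null'   => reports_refcount(null),
    'object' => reports_refcount(new stdClass()),
];

foreach ($checks as $type => $counted) {
    echo str_pad($type, 8), $counted ? 'refcounted' : 'stored inline', PHP_EOL;
}
```

Strings and arrays are left out on purpose: literal values of those types may be interned or immutable, and how the dump reports them differs between PHP versions.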
2. Collection Principles
2.1 Collection Timing
Automatic collection: When a zval's refcount drops to 0, the value is immediately released.
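Manual collection: Calling unset() removes a symbol and decrements the refcount of its value, which is released if the count reaches 0. To force a cycle collection run explicitly, call gc_collect_cycles(), which is the closer analogue of Java's System.gc().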
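The difference in timing can be made visible with destructors. The sketch below assumes PHP 7.0 or later; the Tracked class is hypothetical and simply logs the order in which objects are destroyed.

```php
<?php
// Hypothetical class that records when its destructor runs.
class Tracked
{
    public static $log = [];
    public $name;
    public $peer = null;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function __destruct()
    {
        self::$log[] = $this->name;
    }
}

gc_enable();

// Automatic release: the object is destroyed as soon as its refcount hits 0.
$plain = new Tracked('plain');
$alias = $plain;
unset($plain);                       // one symbol left, nothing is freed yet
$afterFirstUnset = Tracked::$log;
unset($alias);                       // last symbol gone, destructor runs now
$afterSecondUnset = Tracked::$log;

// Cycle: unset() alone cannot bring the counts to 0.
$a = new Tracked('a');
$b = new Tracked('b');
$a->peer = $b;
$b->peer = $a;
unset($a, $b);
$beforeCollect = Tracked::$log;      // 'a' and 'b' have not been destroyed

gc_collect_cycles();                 // force a collector run
$afterCollect = Tracked::$log;       // 'a' and 'b' are destroyed by the run

print_r($afterCollect);
```

In a short script like this the automatic threshold is never reached, so the cycle is reclaimed only because of the explicit gc_collect_cycles() call.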
2.2 Garbage Identification
The collector gathers possible garbage until a threshold is met, then runs a multi‑step identification process:
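Starting from each root in the buffer, mark the value gray and decrement the refcount of each of its members by 1.
If a gray value's refcount is now 0, it is garbage and is marked white; otherwise it is marked black and its members' counts are restored.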
Remove non‑white nodes from the buffer.
Release the remaining garbage values.
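The identification steps can be sketched as a toy simulation in PHP. This is not engine code: the node layout ("rc", "color", "children") and all function names are assumptions made for illustration, and the real collector works on internal C structures. Nodes a and b form a cycle with no outside reference; c and d form a cycle that a live variable still points into.

```php
<?php
// Toy model, not engine code: every node has a simulated refcount ("rc"),
// a color, and a list of the node ids it references ("children").

function mark_gray(array &$nodes, string $id): void
{
    if ($nodes[$id]['color'] === 'gray') {
        return;                              // already visited
    }
    $nodes[$id]['color'] = 'gray';
    foreach ($nodes[$id]['children'] as $child) {
        $nodes[$child]['rc']--;              // simulate removing the internal reference
        mark_gray($nodes, $child);
    }
}

function scan_black(array &$nodes, string $id): void
{
    $nodes[$id]['color'] = 'black';          // still reachable from outside
    foreach ($nodes[$id]['children'] as $child) {
        $nodes[$child]['rc']++;              // restore the count taken in mark_gray
        if ($nodes[$child]['color'] !== 'black') {
            scan_black($nodes, $child);
        }
    }
}

function scan(array &$nodes, string $id): void
{
    if ($nodes[$id]['color'] !== 'gray') {
        return;
    }
    if ($nodes[$id]['rc'] > 0) {
        scan_black($nodes, $id);             // externally referenced: keep it
        return;
    }
    $nodes[$id]['color'] = 'white';          // only internal references: garbage
    foreach ($nodes[$id]['children'] as $child) {
        scan($nodes, $child);
    }
}

function identify_garbage(array $nodes, array $roots): array
{
    foreach ($roots as $root) {
        mark_gray($nodes, $root);
    }
    foreach ($roots as $root) {
        scan($nodes, $root);
    }
    $white = array_filter($nodes, function ($node) {
        return $node['color'] === 'white';
    });
    return array_keys($white);
}

$nodes = [
    'a' => ['rc' => 1, 'color' => 'black', 'children' => ['b']],
    'b' => ['rc' => 1, 'color' => 'black', 'children' => ['a']],
    'c' => ['rc' => 2, 'color' => 'black', 'children' => ['d']], // also held by a variable
    'd' => ['rc' => 1, 'color' => 'black', 'children' => ['c']],
];
$garbage = identify_garbage($nodes, ['a', 'c']);
print_r($garbage);
```

Only a and b end up white. The count of c stays above 0 after the simulated decrement, so c and d are turned black again and their counts are restored.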
3. Source Code Overview
3.1 Garbage Caretaker
The _zend_gc_globals structure acts as the garbage caretaker, storing possible garbage in a buffer (buf).
3.2 Initialization
During PHP startup, php.ini is parsed; gc_init() then allocates the buffer and calls gc_reset() to initialize the collector's internal state.
3.3 Determining Need for Collection
When a variable is destroyed, i_zval_ptr_dtor() decrements and checks the refcount. If it becomes 0, the value is released; if it is still greater than 0, the value is recorded as possible garbage for later collection.
3.4 Collection Process
The gc_possible_root() routine takes an unused slot from the buffer and records the value there as a possible root. If the buffer is full, it first triggers the identification and reclamation phases.
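The decision made on destruction and the buffer-full trigger can be modelled with a toy class. Everything here is hypothetical (ToyRootBuffer, its properties, and a capacity of 2); the real logic lives in C inside the engine and does considerably more.

```php
<?php
// Toy model with hypothetical names; the real logic is C code in the engine.
class ToyRootBuffer
{
    public $capacity;
    public $roots = [];      // ids recorded as possible garbage
    public $freed = [];      // ids released immediately
    public $runs = 0;        // how many times a full buffer forced a run

    public function __construct(int $capacity)
    {
        $this->capacity = $capacity;
    }

    // Mirrors the decision described for i_zval_ptr_dtor().
    public function release(string $id, int &$refcount): void
    {
        $refcount--;
        if ($refcount === 0) {
            $this->freed[] = $id;            // nothing points here: free right away
        } else {
            $this->possibleRoot($id);        // may be part of a cycle: remember it
        }
    }

    // Mirrors the behaviour described for gc_possible_root().
    private function possibleRoot(string $id): void
    {
        if (in_array($id, $this->roots, true)) {
            return;                          // already buffered, do not insert twice
        }
        if (count($this->roots) >= $this->capacity) {
            $this->runs++;                   // buffer full: identification runs first
            $this->roots = [];               // the toy simply empties the buffer
        }
        $this->roots[] = $id;
    }
}

$buffer = new ToyRootBuffer(2);
$counts = ['x' => 1, 'y' => 2, 'z' => 3, 'w' => 2];
foreach (array_keys($counts) as $id) {
    $buffer->release($id, $counts[$id]);
}

print_r(['freed' => $buffer->freed, 'roots' => $buffer->roots, 'runs' => $buffer->runs]);
```

Here x is freed at once, y and z fill the two-slot buffer, and releasing w forces one simulated run before w is recorded.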
3.5 Releasing Garbage
The core function zend_gc_collect_cycles() performs the following steps:
Scan root nodes
Collect root nodes
Invoke the collector
Clean up variables
Finish collection
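The effect of a collection run can be observed from userland with gc_status(). This sketch assumes PHP 7.3 or later, where gc_status() exists and reports the keys runs, collected, threshold, and roots; the Pair class is hypothetical.

```php
<?php
// Assumes PHP 7.3 or later, where gc_status() is available.
// Hypothetical class used to build small reference cycles.
class Pair
{
    public $other = null;
}

gc_enable();
gc_collect_cycles();                 // drain anything left over
$before = gc_status();

for ($i = 0; $i < 100; $i++) {
    $left = new Pair();
    $right = new Pair();
    $left->other = $right;
    $right->other = $left;
    unset($left, $right);            // leaves an unreachable cycle behind
}
$pending = gc_status();              // the cycles sit in the buffer as possible roots

$freed = gc_collect_cycles();        // run the collector over the buffered roots
$after = gc_status();

printf("roots buffered: %d\n", $pending['roots'] - $before['roots']);
printf("reclaimed by this run: %d\n", $freed);
printf("runs: %d -> %d\n", $before['runs'], $after['runs']);
```

The loop stays well below the default threshold, so nothing is collected until the explicit call.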
Example Code
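$x = array();   // one array value, referenced by $x
$y = $x;        // $y shares the same value; no copy is made
$z = $y;        // $z shares it as well
unset($y);      // removes the symbol $y; $x and $z keep the value alive
4. Conclusion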
PHP uses reference counting, unlike Java's reachability analysis.
PHP adds a cycle-collection algorithm on top of reference counting: it temporarily decrements reference counts and uses color marking (purple, gray, white, black) to identify true garbage.
The colors mean: purple, a possible root already in the buffer (prevents duplicate insertion); gray, a value being examined; white, garbage; black, live data.
Is PHP the best language? The discussion is open.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.