How to Find and Understand PHP Internal Function Definitions (e.g., strpos)
This guide walks you through locating PHP internal function definitions in the source tree, using strpos as an example, and explains the surrounding C code structure, parameter parsing, error handling, and macro usage for deeper insight into PHP's backend implementation.
In the previous article, ircmaxell explained where to find the PHP source code, its basic directory layout, and gave a brief introduction to C since PHP is written in C. If you missed that article, you might want to read it before continuing.
This article discusses how to locate the definition of a PHP internal function and understand its inner workings.
How to Find the Function Definition
We start by trying to find the definition of the strpos function.
The first step is to go to the PHP 5.4 root directory and use the search box at the top of the page to search for strpos. The result is a long list of places where strpos appears in the source.
Because this result is not very helpful, we use a trick: search for "PHP_FUNCTION strpos" (including the double quotes) instead of just strpos.
Now we get only two results:
/PHP_5_4/ext/standard/
php_string.h 48 PHP_FUNCTION(strpos);
string.c 1789 PHP_FUNCTION(strpos)
Both locations are inside the ext/standard folder, which is where the strpos function resides because it is part of the standard extension.
Opening the first link leads to php_string.h, which contains a simple list of function declarations such as:
// ...
PHP_FUNCTION(strpos);
PHP_FUNCTION(stripos);
PHP_FUNCTION(strrpos);
PHP_FUNCTION(strripos);
PHP_FUNCTION(strrchr);
PHP_FUNCTION(substr);
// ...
This is a typical header file: it only declares functions; the actual implementations are elsewhere.
The second link points to string.c, which holds the real source code of the function.
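The same PHP_FUNCTION(strpos) text works as a declaration in the header and as the start of a definition in the .c file because it is a macro that expands to a function signature. The following standalone sketch illustrates the idea only; MY_FUNCTION, the myf_ prefix, and the single const char * parameter are hypothetical stand-ins, not the real macro or its real signature.

```c
#include <string.h>

/* Hypothetical stand-in for PHP_FUNCTION: expands to a function signature
   whose name carries a fixed prefix. */
#define MY_FUNCTION(name) long myf_##name(const char *arg)

/* Header-style use: with a trailing semicolon it is only a declaration. */
MY_FUNCTION(length);

/* Source-style use: followed by a body it becomes the definition. */
MY_FUNCTION(length)
{
    return (long) strlen(arg);
}
```

After preprocessing, both uses refer to the same function, myf_length, which is why searching for the macro text finds the header and the implementation together.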
PHP Function Skeleton
All PHP functions share a common structure. At the top, variables are defined, then zend_parse_parameters is called, followed by the main logic, which typically uses the RETURN_*** macros and calls to php_error_docref. strpos begins with its variable definitions:
zval *needle;
char *haystack;
char *found = NULL;
char needle_char[2];
long offset = 0;
int haystack_len;
The first line defines a pointer needle to a zval, the structure that represents any PHP variable internally.
The second line defines haystack as a pointer to a char. In C a string is an array of characters that is passed around as a pointer to its first element, so haystack points to the first character of the string passed from PHP.
PHP also stores the explicit length of the string in haystack_len. A C string normally ends at the first NUL byte, but PHP strings may contain NUL bytes, so the length has to be tracked separately to know where the string really ends.
The offset variable holds the third argument of the function, the starting position for the search, and is stored as a long.
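To make the point about the explicit length concrete, here is a small standalone sketch. The struct counted_string type and the helper names are hypothetical and only mirror the haystack / haystack_len pair; they are not taken from the PHP source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pair mirroring haystack / haystack_len. */
struct counted_string {
    const char *val;
    size_t len;
};

/* Five bytes of data with a NUL byte in the middle. */
static const char raw_bytes[] = "ab\0cd";

/* Build the pair; sizeof counts the terminating NUL, so subtract one. */
struct counted_string make_example(void)
{
    struct counted_string s = { raw_bytes, sizeof(raw_bytes) - 1 };
    return s;
}

/* Length as a NUL-terminated C string function sees it. */
size_t c_length(struct counted_string s)
{
    return strlen(s.val);
}
```

For this data the stored length covers all five bytes, while strlen stops at the embedded NUL byte and sees only the first two.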
Parameter parsing is performed with:
if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "sz|l", &haystack, &haystack_len, &needle, &offset) == FAILURE) {
    return;
}
This extracts the passed arguments and stores them in the variables declared above. ZEND_NUM_ARGS() provides the number of arguments, and the macro TSRMLS_CC is part of PHP's thread-safe resource manager (TSRM); it can be ignored for understanding the code.
The format string "sz|l" describes the expected parameters:
s // first parameter is a string
z // second parameter is a zval (any variable)
| // following parameters are optional
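l // third parameter is a long (integer)
The s specifier fills two C variables, the character pointer and the length, which is why four addresses are passed for three PHP parameters.
After parsing, the main logic checks the offset bounds: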
if (offset < 0 || offset > haystack_len) {
    php_error_docref(NULL TSRMLS_CC, E_WARNING, "Offset not contained in string");
    RETURN_FALSE;
}
If the offset is out of range, a warning is emitted via php_error_docref and the function returns FALSE.
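Returning briefly to the "sz|l" type specification: the position of the | character determines how many arguments are required and how many are accepted at most. The sketch below shows that counting for the four characters discussed in this article only. spec_arg_range is a hypothetical helper, and the real zend_parse_parameters understands many more specifiers and does far more than count.

```c
#include <stddef.h>

/* Count required and total parameters in a simplified type spec.
   Only s, z, l and | are understood; anything else is rejected.
   Returns 0 on success and -1 on an invalid spec. */
int spec_arg_range(const char *spec, int *min_args, int *max_args)
{
    const char *p;
    int count = 0;
    int required = -1; /* -1 means no '|' has been seen yet */

    for (p = spec; *p != '\0'; p++) {
        switch (*p) {
        case 's':
        case 'z':
        case 'l':
            count++;
            break;
        case '|':
            if (required >= 0) {
                return -1; /* a second '|' is not allowed */
            }
            required = count;
            break;
        default:
            return -1;
        }
    }

    *min_args = (required >= 0) ? required : count;
    *max_args = count;
    return 0;
}
```

For "sz|l" this reports two required and three accepted parameters, matching strpos($haystack, $needle) with an optional third $offset argument.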
When the needle is a string, the code calls php_memnstr to locate the first occurrence:
if (Z_TYPE_P(needle) == IS_STRING) {
    if (!Z_STRLEN_P(needle)) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Empty delimiter");
        RETURN_FALSE;
    }
    found = php_memnstr(haystack + offset,
                        Z_STRVAL_P(needle),
                        Z_STRLEN_P(needle),
                        haystack + haystack_len);
}
php_memnstr returns a pointer to the first occurrence of needle in haystack, or NULL if there is none. Because found and haystack both point into the same string, their difference is the position of the match, so the function ends by returning it with RETURN_LONG(found - haystack); when found is NULL, it returns FALSE instead.
If the needle is not a string, it is treated as a character code:
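The string branch can be reproduced outside of PHP as a rough sketch. sketch_memnstr and sketch_strpos are hypothetical names, the search loop is a simple memchr/memcmp scan rather than PHP's actual implementation, and -1 stands in for the FALSE return value; warnings are omitted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for php_memnstr: find needle in [start, end).
   Returns a pointer to the first match or NULL. */
static const char *sketch_memnstr(const char *start, const char *needle,
                                  size_t needle_len, const char *end)
{
    const char *p = start;
    const char *last;

    if (needle_len == 0 || (size_t)(end - start) < needle_len) {
        return NULL;
    }
    last = end - needle_len; /* last position where the needle still fits */

    while (p <= last) {
        /* Jump to the next candidate for the needle's first byte. */
        p = memchr(p, needle[0], (size_t)(last - p) + 1);
        if (p == NULL) {
            return NULL;
        }
        if (memcmp(p, needle, needle_len) == 0) {
            return p;
        }
        p++;
    }
    return NULL;
}

/* Returns the match position, or -1 where PHP would return FALSE. */
long sketch_strpos(const char *haystack, size_t haystack_len,
                   const char *needle, size_t needle_len, long offset)
{
    const char *found;

    if (offset < 0 || (size_t) offset > haystack_len) {
        return -1; /* "Offset not contained in string" */
    }
    if (needle_len == 0) {
        return -1; /* "Empty delimiter" */
    }
    found = sketch_memnstr(haystack + offset, needle, needle_len,
                           haystack + haystack_len);
    /* Pointer difference gives the position relative to the string start. */
    return found ? (long)(found - haystack) : -1;
}
```

Note that the position is measured from haystack, not from haystack + offset, so a search that starts at offset 5 still reports positions counted from the beginning of the string.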
else {
    if (php_needle_char(needle, needle_char TSRMLS_CC) != SUCCESS) {
        RETURN_FALSE;
    }
    needle_char[1] = 0;
    found = php_memnstr(haystack + offset,
                        needle_char, 1,
                        haystack + haystack_len);
}
Thus, in this version of PHP, strpos($str, 'A') and strpos($str, 65) are equivalent because the character 'A' has ASCII code 65.
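The character-code branch can be sketched in the same standalone fashion. sketch_strpos_code is a hypothetical name, the plain cast from long to char is an assumption about what php_needle_char does for integer needles, and -1 again stands in for FALSE.

```c
#include <stddef.h>
#include <string.h>

/* Returns the position of the byte with the given code, or -1 if absent. */
long sketch_strpos_code(const char *haystack, size_t haystack_len,
                        long code, long offset)
{
    char needle_char[2];
    const char *found;

    if (offset < 0 || (size_t) offset > haystack_len) {
        return -1;
    }

    /* Assumption: the integer is simply narrowed to a single byte. */
    needle_char[0] = (char) code;
    needle_char[1] = 0;

    /* A one-byte needle makes the search a plain byte scan. */
    found = memchr(haystack + offset, needle_char[0],
                   haystack_len - (size_t) offset);
    return found ? (long)(found - haystack) : -1;
}
```

Passing 65 or the character constant 'A' gives the same result on an ASCII system, which mirrors the equivalence described above.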
Zend Functions
Looking for strlen the same way, with "PHP_FUNCTION strlen", turns up nothing. strlen is part of the Zend Engine rather than a PHP extension, so it is defined with ZEND_FUNCTION(strlen), and the search has to use "ZEND_FUNCTION strlen" instead:
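ZEND_FUNCTION(strlen)
{
    char *s1;
    int s1_len;
    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &s1, &s1_len) == FAILURE) {
        return;
    }
    RETVAL_LONG(s1_len);
}
This implementation is straightforward: the length was already stored in s1_len during parameter parsing, so the function only has to hand it back with RETVAL_LONG.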
Methods
To find class methods, you can search for ClassName::methodName, e.g., SplFixedArray::getSize.
Next Part
The next article will discuss what a zval is, how it works, and how the Z_* macros are used throughout the source.
Source: PHP Developer
