Using PHP FFI to Call the Cjieba Chinese Word Segmentation Library
This article demonstrates how to use PHP 7.4's FFI to directly call the Cjieba Chinese word‑segmentation library, explains common pitfalls such as uninitialized variables and pointer handling, shows code examples for compiling and running the library, and compares PHP's performance with native C.
This guide explains how to use PHP 7.4's Foreign Function Interface (FFI) to call the cjieba Chinese word-segmentation library directly from PHP, with no custom PHP extension to write.
The author chose cjieba because FFI follows the C calling convention; calling a C++ library directly would require additional extern "C" wrappers. Several issues came up along the way: C variables that were never initialized, C functions called before the FFI object was initialized, null checks that have to go through FFI::isNull($x), and pointer-based arrays that cannot be iterated with foreach.
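To iterate over the pointer-based array returned by Cut, the author switched to pointer arithmetic, which works correctly in FFI. The array is allocated inside cjieba's Cut function and ends with a sentinel entry whose word is NULL: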
    CJiebaWord* Cut(Jieba handle, const char* sentence, size_t len)
    {
        cppjieba::Jieba* x = (cppjieba::Jieba*)handle;
        vector<string> words;
        string s(sentence, len);
        x->Cut(s, words);
        // One extra slot for the NULL sentinel that terminates the array.
        CJiebaWord* res = (CJiebaWord*)malloc(sizeof(CJiebaWord) * (words.size() + 1));
        size_t offset = 0;
        for (size_t i = 0; i < words.size(); i++) {
            // Each entry points into the caller's sentence; nothing is copied.
            res[i].word = sentence + offset;
            res[i].len = words[i].size();
            offset += res[i].len;
        }
        // The word lengths must add up to the input length exactly.
        if (offset != len) {
            free(res);
            return NULL;
        }
        res[words.size()].word = NULL;
        res[words.size()].len = 0;
        return res;
    }
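The layout that Cut produces can be tried out without cppjieba. The following is a minimal sketch, not cjieba's code: the Word type and the make_words helper are hypothetical names, and the token lengths are passed in instead of being computed by a segmenter.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for CJiebaWord: a slice of the input sentence. */
typedef struct {
    const char *word; /* points into the caller's sentence, not NUL-terminated */
    size_t len;       /* slice length in bytes */
} Word;

/* Build a sentinel-terminated slice array from precomputed token lengths.
 * Returns NULL if allocation fails or if the lengths do not add up to
 * exactly `len` bytes. On success the caller must free() the result. */
Word *make_words(const char *sentence, size_t len,
                 const size_t *lens, size_t n)
{
    Word *res = malloc(sizeof(Word) * (n + 1)); /* +1 for the sentinel */
    if (res == NULL) {
        return NULL;
    }
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        res[i].word = sentence + offset; /* no copy, just a pointer */
        res[i].len = lens[i];
        offset += lens[i];
    }
    if (offset != len) { /* tokens must cover the whole input */
        free(res);
        return NULL;
    }
    res[n].word = NULL; /* sentinel entry marks the end of the array */
    res[n].len = 0;
    return res;
}
```

Because every entry borrows from the input buffer, the sentence has to stay alive for as long as the returned array is in use.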
The segmentation result is accessed via the word pointer and len fields. Each word points into the original sentence and is not NUL-terminated, so in PHP the string must be sliced with substr($x->word, 0, $x->len). The equivalent C loop that prints each word:
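    /* Walk the array until the NULL sentinel; the width and precision
       arguments of printf must be int, hence the casts. */
    for (x = words; x->word; x++) {
        printf("%*.*s\n", (int)x->len, (int)x->len, x->word);
    }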
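The same walk can be packaged into small helpers. This is a sketch under assumptions: Word is a hypothetical stand-in for CJiebaWord, and count_words and copy_word are illustrative names that are not part of cjieba. copy_word does in C what the substr call does on the PHP side: it turns one slice into a standalone NUL-terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CJiebaWord. */
typedef struct {
    const char *word; /* slice start, not NUL-terminated */
    size_t len;       /* slice length in bytes */
} Word;

/* Count entries by advancing the pointer until the NULL sentinel. */
size_t count_words(const Word *words)
{
    size_t n = 0;
    for (const Word *x = words; x->word != NULL; x++) {
        n++;
    }
    return n;
}

/* Copy one slice into buf as a NUL-terminated string, truncating if
 * buf is too small. Returns the number of bytes copied. */
size_t copy_word(const Word *x, char *buf, size_t bufsize)
{
    if (bufsize == 0) {
        return 0;
    }
    size_t n = x->len < bufsize - 1 ? x->len : bufsize - 1;
    memcpy(buf, x->word, n);
    buf[n] = '\0';
    return n;
}
```

Counting first is useful when the caller wants to preallocate storage, since the array itself carries no length field.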
Compilation and execution are straightforward. Build the shared library and time the PHP demo, then build and time the native C demo:
make libjieba.so
time php demo.php
make demo
time ./demo
Performance results show that the PHP version using FFI runs almost as fast as the native C version: about 1m59.6s of wall-clock time against 1m54.7s, roughly 4% slower, with CPU usage around 12% for both:
PHP
load: 0.00025701522827148
real 1m59.619s
user 1m56.093s
sys 0m3.517s
C
real 1m54.738s
user 1m50.382s
sys 0m4.323s
CPU usage was roughly 12% in both runs.
Thus, for CPU‑intensive tasks, using FFI to call libraries written in C, C++, Go, Rust, etc., can provide near‑native performance without the overhead of writing a full PHP extension.
Before FFI, calling a C library from PHP meant writing a PHP extension, which required deep knowledge of both C and the PHP internals. FFI simplifies this by allowing direct calls into standard C dynamic libraries.
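Additionally, the author notes that headers which rely on macros (e.g., from SDKs) can be expanded first with gcc -E -P HCNetSDK.h -o HCNetSDK_unfold.h, and the expanded header can then be used with FFI. This step is needed because FFI's C parser does not support preprocessor directives such as #define and #include.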
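To see what the expansion step buys, consider a small SDK-style header fragment. Everything in this sketch is hypothetical (NAME_LEN, SDK_API, DeviceInfo and device_channel are illustrative names, not taken from HCNetSDK.h); it only shows the kind of macro use that FFI's parser cannot handle and what gcc -E -P turns it into.

```c
/* Hypothetical SDK-style header fragment; all names are illustrative. */
#define NAME_LEN 32
#define SDK_API /* often an export attribute in real SDKs; empty here */

typedef struct {
    char name[NAME_LEN]; /* array size comes from a macro */
    int  channel;
} DeviceInfo;

SDK_API int device_channel(const DeviceInfo *info);

/* After `gcc -E -P`, the declarations above become roughly:
 *
 *   typedef struct {
 *       char name[32];
 *       int channel;
 *   } DeviceInfo;
 *   int device_channel(const DeviceInfo *info);
 *
 * which is plain C that can be handed to FFI as-is. */

/* Trivial definition so the fragment compiles on its own. */
int device_channel(const DeviceInfo *info)
{
    return info->channel;
}
```

Note that -E also expands any #include lines, so a header that pulls in system headers will produce much larger output that may need trimming by hand.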
Source code and further details are available at the GitHub repository: https://github.com/dwdcth/phpjieba_ffi