Using PHP FFI to Call the Cjieba Chinese Word Segmentation Library

This article demonstrates how to use PHP 7.4's FFI to directly call the Cjieba Chinese word‑segmentation library, explains common pitfalls such as uninitialized variables and pointer handling, shows code examples for compiling and running the library, and compares PHP's performance with native C.

php Courses

This guide explains how to use PHP 7.4's Foreign Function Interface (FFI) to call the CJieba Chinese word‑segmentation library directly from PHP, without writing a custom PHP extension.

The author chose CJieba, the C interface to cppjieba, because FFI follows the C calling convention; calling the C++ library directly would require additional extern "C" wrappers. Several pitfalls came up along the way: C variables that had not been initialized, calls to C functions made before the FFI object was initialized, null checks that must go through FFI::isNull($x), and pointer‑based arrays that cannot be iterated with foreach.

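Because foreach cannot traverse the pointer‑based array returned by Cut, the author walks it with pointer arithmetic instead, which works correctly in FFI. The layout of that array follows from the library's Cut wrapper, which slices the input sentence and terminates the array with an entry whose word is NULL: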

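CJiebaWord* Cut(Jieba handle, const char* sentence, size_t len)
{
    // The opaque handle is really a cppjieba::Jieba instance.
    cppjieba::Jieba* x = (cppjieba::Jieba*)handle;
    vector<string> words;
    string s(sentence, len);
    x->Cut(s, words);
    // One extra slot for the terminating sentinel entry.
    CJiebaWord* res = (CJiebaWord*)malloc(sizeof(CJiebaWord) * (words.size() + 1));
    size_t offset = 0;
    for (size_t i = 0; i < words.size(); i++) {
        // Each word is a slice of the caller's sentence, not a copy.
        res[i].word = sentence + offset;
        res[i].len = words[i].size();
        offset += res[i].len;
    }
    // The words must cover the input exactly; otherwise report failure.
    if (offset != len) {
        free(res);
        return NULL;
    }
    // Sentinel: word == NULL marks the end of the array.
    res[words.size()].word = NULL;
    res[words.size()].len = 0;
    return res;
}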
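The array layout that Cut produces can be reproduced without cppjieba. The following is a minimal sketch in plain C, assuming the byte length of each word is already known; Slice and slices_from_lengths are hypothetical names used for illustration and are not part of CJieba.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CJiebaWord: a slice into a caller-owned buffer. */
typedef struct {
    const char *word; /* points into the sentence; no NUL after each word */
    size_t len;       /* length in bytes */
} Slice;

/*
 * Build a sentinel-terminated slice array from precomputed byte lengths.
 * Returns NULL if the lengths do not cover the sentence exactly or if
 * allocation fails. The caller releases the result with free().
 */
Slice *slices_from_lengths(const char *sentence, size_t len,
                           const size_t *lengths, size_t count)
{
    assert(sentence != NULL);
    assert(lengths != NULL || count == 0);

    Slice *res = malloc(sizeof(Slice) * (count + 1));
    if (res == NULL) {
        return NULL;
    }

    size_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        /* Refuse lengths that would run past the end of the sentence. */
        if (lengths[i] > len - offset) {
            free(res);
            return NULL;
        }
        res[i].word = sentence + offset;
        res[i].len = lengths[i];
        offset += lengths[i];
    }

    /* The slices must cover the input exactly. */
    if (offset != len) {
        free(res);
        return NULL;
    }

    /* Sentinel entry marks the end of the array. */
    res[count].word = NULL;
    res[count].len = 0;
    return res;
}
```

Since every slice borrows from the sentence buffer, that buffer has to stay alive for as long as the returned array is in use; only the array itself is freed.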

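Each entry exposes a segment through its word pointer and len field. Because word points into the original sentence and is not NUL‑terminated at the word boundary, the PHP side must slice the string with substr($x->word, 0, $x->len). In C, a loop that prints each word looks like this: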

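/* Walk the array until the sentinel entry (word == NULL). */
for (x = words; x->word; x++) {
    /* The * width and precision arguments must be int, so cast len. */
    printf("%*.*s\n", (int)x->len, (int)x->len, x->word);
}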
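As a rough C counterpart to the substr call on the PHP side, the sketch below walks a sentinel‑terminated array and copies one slice into its own NUL‑terminated string. WordView, view_count and view_to_cstring are hypothetical names for illustration, not part of CJieba.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slice type mirroring the word/len pair described above. */
typedef struct {
    const char *word; /* borrowed pointer into a larger buffer */
    size_t len;       /* length in bytes */
} WordView;

/* Count entries by advancing the pointer until the sentinel (word == NULL). */
size_t view_count(const WordView *words)
{
    assert(words != NULL);
    size_t n = 0;
    for (const WordView *x = words; x->word != NULL; x++) {
        n++;
    }
    return n;
}

/*
 * Copy one slice into a freshly allocated NUL-terminated string.
 * Returns NULL if allocation fails. The caller releases it with free().
 */
char *view_to_cstring(const WordView *x)
{
    assert(x != NULL && x->word != NULL);
    char *out = malloc(x->len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, x->word, x->len);
    out[x->len] = '\0';
    return out;
}
```

Copying is only needed when a word must outlive the sentence buffer or be passed to code that expects a terminated string; for printing, the length‑limited printf format avoids the allocation.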

To compare the two, build the shared library and time the PHP demo, then build and time the native C demo:

make libjieba.so
time php demo.php
make demo
time ./demo

The results show that the PHP version using FFI runs almost as fast as the native C version: about 4% slower in wall‑clock time (119.6 s versus 114.7 s), with CPU usage around 12% for both:

PHP
load: 0.00025701522827148
real    1m59.619s
user    1m56.093s
sys     0m3.517s
C
real    1m54.738s
user    1m50.382s
sys     0m4.323s
CPU usage: roughly 12% in both cases
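The gap between the two runs can be worked out from the real times reported above. This is a small sketch of that arithmetic; time_to_seconds and overhead_percent are hypothetical helpers, and the parser assumes the "NmS.SSSs" form that time prints here.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Convert a value such as "1m59.619s" to seconds.
 * Returns -1.0 if the text does not match the expected form.
 */
double time_to_seconds(const char *text)
{
    assert(text != NULL);
    int minutes = 0;
    double seconds = 0.0;
    if (sscanf(text, "%dm%lfs", &minutes, &seconds) != 2) {
        return -1.0;
    }
    return minutes * 60.0 + seconds;
}

/* Relative slowdown of `slow` compared with `fast`, in percent. */
double overhead_percent(double slow, double fast)
{
    assert(fast > 0.0);
    return (slow - fast) / fast * 100.0;
}
```

Applied to the figures above, 1m59.619s is 119.619 s and 1m54.738s is 114.738 s, so the FFI run takes (119.619 - 114.738) / 114.738, or roughly 4.3%, longer in wall‑clock time.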

Thus, for CPU‑intensive tasks, using FFI to call libraries written in C, C++, Go, Rust, etc. can provide near‑native performance without the effort of writing a full PHP extension, provided the library exposes a C‑compatible interface.

Before FFI, calling a C library from PHP meant writing a PHP extension, which required deep knowledge of both C and PHP internals. FFI removes that step by allowing direct calls into standard C dynamic libraries.

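The author also notes that FFI does not process C preprocessor directives, so headers that rely on macros (for example, those shipped with SDKs) must be expanded first. Running gcc -E -P HCNetSDK.h -o HCNetSDK_unfold.h produces a macro‑expanded header that can then be used safely.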

Source code and further details are available in the GitHub repository: https://github.com/dwdcth/phpjieba_ffi

Written by

php Courses
