Backend Development 6 min read

Using PHP FFI to Call the Cjieba Chinese Word Segmentation Library

This article demonstrates how to use PHP 7.4's FFI to directly call the Cjieba Chinese word‑segmentation library, explains common pitfalls such as uninitialized variables and pointer handling, shows code examples for compiling and running the library, and compares PHP's performance with native C.

php中文网 Courses

This guide explains how to use PHP 7.4's Foreign Function Interface (FFI) to call the cjieba Chinese word‑segmentation library directly from PHP, without writing a custom PHP extension.

The author chose cjieba because PHP FFI can only bind functions with C linkage. Using cppjieba, the C++ library underneath it, would first require writing extern "C" wrappers. Along the way the author ran into several pitfalls:

- C variables that were left uninitialized.
- Calling C functions through an FFI object that had not been initialized.
- Null pointers, which have to be detected with FFI::isNull($x).
- Pointer‑based arrays, which cannot be iterated with foreach.
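PHP FFI can only work with what a plain C declaration describes. That is why cjieba hides the C++ segmenter behind an opaque handle plus free functions; the Cut implementation below casts that handle back to cppjieba::Jieba*. The sketch shows the same shape in pure C. Counter, NewCounter, CounterAdd, CounterValue and FreeCounter are hypothetical names used only for illustration. In a C++ library, these function definitions would also have to sit inside an extern "C" block so their names are not mangled.

```c
#include <stdlib.h>

/* Opaque handle: callers (including PHP FFI) only ever see a void pointer. */
typedef void *Counter;

/* Hypothetical internal state hidden behind the handle. */
struct counter_state {
    long total;
};

/* Returns NULL if allocation fails. calloc zero-fills the state,
 * so the counter never starts from an uninitialized value. */
Counter NewCounter(void) {
    return calloc(1, sizeof(struct counter_state));
}

void CounterAdd(Counter handle, long n) {
    struct counter_state *s = (struct counter_state *)handle;
    s->total += n;
}

long CounterValue(Counter handle) {
    const struct counter_state *s = (const struct counter_state *)handle;
    return s->total;
}

void FreeCounter(Counter handle) {
    free(handle);
}
```

On the PHP side, only the typedef and the four prototypes would go into the FFI::cdef() declaration string. The layout of struct counter_state never needs to be described to FFI.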

Because a pointer‑based array cannot be iterated with foreach, the author walks the array returned by Cut with pointer arithmetic instead, and this works correctly through FFI. The array is built by cjieba's Cut function:

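```cpp
#include <cstdlib>
#include <string>
#include <vector>
// Also needs cppjieba's Jieba class and, from cjieba's C header, the opaque
// Jieba handle typedef and CJiebaWord { const char* word; size_t len; }.
// The definition must have C linkage (extern "C") so FFI can find it.
CJiebaWord* Cut(Jieba handle, const char* sentence, size_t len)
{
    // The opaque handle is a pointer to the C++ segmenter.
    cppjieba::Jieba* x = (cppjieba::Jieba*)handle;
    std::vector<std::string> words;
    std::string s(sentence, len);
    x->Cut(s, words);

    // One entry per word plus a {NULL, 0} sentinel; the caller must free it.
    CJiebaWord* res = (CJiebaWord*)malloc(sizeof(CJiebaWord) * (words.size() + 1));
    if (res == NULL) {
        return NULL;
    }
    size_t offset = 0;
    for (size_t i = 0; i < words.size(); i++) {
        // Points into the caller's buffer: nothing is copied and the word
        // is not NUL-terminated at the end of the token.
        res[i].word = sentence + offset;
        res[i].len = words[i].size();
        offset += res[i].len;
    }
    // The words must cover the input exactly, or the offsets are wrong.
    if (offset != len) {
        free(res);
        return NULL;
    }
    res[words.size()].word = NULL;
    res[words.size()].len = 0;
    return res;
}
```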
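The contract Cut establishes is easy to miss. Each CJiebaWord points into the caller's sentence buffer, with no copy and no NUL terminator. The array ends with a {NULL, 0} sentinel. Cut returns NULL when the words do not add up to the input length.

The self-contained C sketch below reproduces that contract with a hypothetical stand-in segmenter, split_runs. It splits the input into maximal runs of spaces and non-spaces, so its tokens always concatenate back to the input (the invariant Cut checks with offset != len). Here NULL is returned only on allocation failure. CJiebaWord is redeclared with the two fields the article uses; the real definition lives in cjieba's header.

```c
#include <stdlib.h>

/* Same two fields the article uses: a pointer into the input and a byte length. */
typedef struct {
    const char *word;
    size_t len;
} CJiebaWord;

/* Hypothetical stand-in for Cut(): splits into runs of spaces / non-spaces.
 * The caller frees the returned array with free(). */
CJiebaWord *split_runs(const char *sentence, size_t len) {
    /* Worst case: one token per byte, plus one sentinel entry. */
    CJiebaWord *res = malloc(sizeof(CJiebaWord) * (len + 1));
    if (res == NULL) {
        return NULL;
    }
    size_t n = 0, offset = 0;
    while (offset < len) {
        int is_space = sentence[offset] == ' ';
        size_t start = offset;
        /* Extend the run while the character class stays the same. */
        while (offset < len && (sentence[offset] == ' ') == is_space) {
            offset++;
        }
        res[n].word = sentence + start; /* points into the caller's buffer */
        res[n].len = offset - start;
        n++;
    }
    /* Sentinel entry: consumers stop when word == NULL. */
    res[n].word = NULL;
    res[n].len = 0;
    return res;
}
```

For "hello  world" (12 bytes), this yields three entries of lengths 5, 2 and 5, followed by the sentinel. All three point into the original string.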

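Each result entry consists of a word pointer and a byte length, len. The word pointer points into the original sentence and is not NUL‑terminated at the end of the token, so treating it as a string would run on to the end of the sentence. In PHP the word therefore has to be sliced with substr($x->word, 0, $x->len). The equivalent loop in C prints each word: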

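```c
// x walks the array until the {NULL, 0} sentinel. The * width and precision
// arguments must be int, so len (a size_t) is cast before being passed.
for (const CJiebaWord* x = words; x->word; x++) {
    printf("%*.*s\n", (int)x->len, (int)x->len, x->word);
}
```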
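Because word is not NUL‑terminated at the token boundary, every consumer, in C or in PHP, has to honor len. The sketch below shows the two operations performed on the result:

- walking the array with pointer arithmetic until the sentinel;
- copying one token into its own NUL‑terminated buffer, the C counterpart of substr($x->word, 0, $x->len).

count_words and copy_word are hypothetical helper names, and CJiebaWord is again redeclared locally.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *word;
    size_t len;
} CJiebaWord;

/* Walk the array with pointer arithmetic until the {NULL, 0} sentinel. */
size_t count_words(const CJiebaWord *words) {
    size_t n = 0;
    for (const CJiebaWord *x = words; x->word != NULL; x++) {
        n++;
    }
    return n;
}

/* Copy one token into buf as a NUL-terminated string.
 * Returns 0 on success, -1 if buf (of size cap) is too small. */
int copy_word(const CJiebaWord *x, char *buf, size_t cap) {
    if (x->len + 1 > cap) {
        return -1;
    }
    memcpy(buf, x->word, x->len);
    buf[x->len] = '\0';
    return 0;
}
```

With the input "abcdef" split into entries of lengths 2 and 4, count_words reports 2 and copy_word on the second entry produces "cdef".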

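Building and running both versions is straightforward. The first two commands build the shared library and time the PHP demo; the last two build and time the native C demo for comparison:

```sh
make libjieba.so   # shared library loaded by PHP through FFI
time php demo.php  # PHP version, calling the library via FFI
make demo          # native C version
time ./demo
```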

The timings show the FFI‑based PHP version running almost as fast as the native C version. It was about 4.9 s slower over a run of roughly two minutes, and CPU usage was around 12% in both cases:

| Version | real | user | sys | CPU usage |
|---|---|---|---|---|
| PHP (FFI) | 1m59.619s | 1m56.093s | 0m3.517s | ~12% |
| C | 1m54.738s | 1m50.382s | 0m4.323s | ~12% |

The PHP run also printed load: 0.00025701522827148.
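The size of the gap follows from the real times above:

- PHP: 1m59.619s is 119.619 s.
- C: 1m54.738s is 114.738 s.

So the PHP run took about 4.3% longer in wall‑clock time (user time was about 5.2% higher). The small C helper below does that conversion and comparison. parse_time_output and overhead_percent are hypothetical names, and the parser only handles the XmY.YYYs format that time prints here.

```c
#include <stdio.h>

/* Convert `time` output such as "1m59.619s" to seconds.
 * Returns a negative value if the string is not in that format. */
double parse_time_output(const char *s) {
    int minutes;
    double seconds;
    if (sscanf(s, "%dm%lfs", &minutes, &seconds) != 2) {
        return -1.0;
    }
    return minutes * 60.0 + seconds;
}

/* Relative overhead of `measured` over `baseline`, in percent. */
double overhead_percent(double measured, double baseline) {
    return (measured / baseline - 1.0) * 100.0;
}
```

Feeding it the two real times gives an overhead of roughly 4.25%.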

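For CPU‑intensive work, then, calling a native library through FFI can deliver near‑native performance, at least in this benchmark, without the effort of writing a full PHP extension. The same approach applies to libraries written in C, C++, Go, Rust and other languages, as long as they export a C‑compatible interface (for example, C++ functions declared extern "C").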

Before FFI, calling native code from PHP meant writing a PHP extension, which requires solid knowledge of both C and PHP internals. FFI removes that step: PHP code can call functions in an ordinary C shared library directly.

Headers that rely heavily on macros, such as those shipped with vendor SDKs, can be macro‑expanded first with gcc -E -P HCNetSDK.h -o HCNetSDK_unfold.h. The author reports that the resulting expanded header can then be used with FFI without problems.
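PHP's FFI declaration parser does not run the C preprocessor, so a vendor header's #include lines, #define lines and macro uses cannot be passed to it directly. gcc -E substitutes every macro, and -P suppresses the linemarker lines, leaving plain declarations.

The hypothetical header fragment below illustrates the transformation. SDK_NAME_LEN, SDK_MAX_CHANNELS and DEVICE_INFO are made‑up names, not taken from HCNetSDK.h. The comment shows roughly what the expanded declaration looks like.

```c
/* Before expansion: FFI cannot parse these preprocessor lines. */
#define SDK_NAME_LEN 32
#define SDK_MAX_CHANNELS 4

typedef struct {
    char name[SDK_NAME_LEN];
    int channels[SDK_MAX_CHANNELS];
} DEVICE_INFO;

/* After `gcc -E -P`, the expanded header contains only the plain
 * declaration, roughly:
 *
 *   typedef struct {
 *       char name[32];
 *       int channels[4];
 *   } DEVICE_INFO;
 */
```

One consequence of this approach: constants defined with #define disappear from the expanded header. PHP code that needs their values has to redefine them itself.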

Source code and further details are available in the GitHub repository https://github.com/dwdcth/phpjieba_ffi
