How to Use PHP to Read and Write Apache ORC Files with Swoole Phpy and PyORC
This guide explains how to install and use the workbunny/php-orc library, which leverages the Swoole Phpy extension to call Python's PyORC module from PHP, enabling efficient reading, writing, and schema handling of Apache ORC columnar files with practical code examples and troubleshooting tips.
Overview
The workbunny/php-orc library provides PHP bindings for reading and writing Apache ORC files. It uses the swoole/phpy extension to call the Python PyORC module, allowing PHP code to operate on ORC data through the Python ecosystem.
Apache ORC
Apache ORC is a column‑oriented storage format designed for the Hadoop ecosystem. It delivers high compression ratios and fast query performance by storing data column‑wise, which reduces I/O for large‑scale analytics workloads. ORC is widely supported by Hive, Spark and other big‑data processing engines.
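To make the column-wise idea concrete, the following toy sketch pivots a few in-memory records from a row layout into a column layout and then reads a single field from each. It is only an illustration in plain PHP arrays with made-up sample data; it does not touch ORC files or reflect how ORC encodes data on disk.

```php
<?php
declare(strict_types=1);

// Three sample records in a row-oriented layout (illustrative data).
$rows = [
    ['id' => 1, 'name' => 'alice', 'email' => 'alice@example.com'],
    ['id' => 2, 'name' => 'bob',   'email' => 'bob@example.com'],
    ['id' => 3, 'name' => 'carol', 'email' => 'carol@example.com'],
];

// Pivot into a column-oriented layout: one array per column.
$columns = [];
foreach ($rows as $row) {
    foreach ($row as $name => $value) {
        $columns[$name][] = $value;
    }
}

// Row layout: every record has to be visited to collect one field.
$idsFromRows = array_column($rows, 'id');

// Column layout: the requested column is already stored together.
$idsFromColumns = $columns['id'];

var_dump($idsFromRows === $idsFromColumns); // bool(true)
```

A query that needs only id never has to look at name or email in the second layout, which is the effect a columnar file format exploits at the storage level.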
Key Features
Read ORC files : PyORC can read rows or selected columns, enabling efficient column‑pruned scans.
Write ORC files : PHP arrays (indexed or associative) can be written to ORC files once a schema (column names and types) is defined.
Schema handling : Retrieve an existing file’s schema or construct a custom schema for new files.
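As a rough illustration of the schema step, the sketch below builds an ORC type string of the form struct<name:type,...> from a PHP array of column names and types. The helper name buildOrcSchema is hypothetical and not part of workbunny/php-orc; it only assumes the struct<...> notation that the reader example in this guide prints, and it deliberately accepts plain identifier-style column names only.

```php
<?php
declare(strict_types=1);

/**
 * Build an ORC struct type string from a column => type map.
 * Hypothetical helper for illustration; not part of workbunny/php-orc.
 *
 * @param array<string, string> $columns e.g. ['id' => 'int', 'name' => 'string']
 */
function buildOrcSchema(array $columns): string
{
    if ($columns === []) {
        throw new InvalidArgumentException('Refusing to build an empty schema.');
    }
    $fields = [];
    foreach ($columns as $name => $type) {
        // Plain identifiers only; quoting of unusual names is out of scope here.
        if (!is_string($name) || preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $name) !== 1) {
            throw new InvalidArgumentException("Unsupported column name: {$name}");
        }
        $fields[] = $name . ':' . $type;
    }
    return 'struct<' . implode(',', $fields) . '>';
}

echo buildOrcSchema([
    'id'    => 'int',
    'group' => 'string',
    'name'  => 'string',
    'email' => 'string',
]), PHP_EOL;
// prints: struct<id:int,group:string,name:string,email:string>
```

Keeping the schema in a PHP array makes it easy to review column order and types before any file is created; whether the library accepts such a string directly should be checked against its own documentation.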
Swoole Phpy
Phpy is an open‑source PHP extension from the Swoole project that embeds a Python interpreter into PHP. It automatically maps PHP arrays to Python lists or dicts and converts scalar types between the two languages, providing low‑overhead cross‑language calls.
Phpy Capabilities
Seamless Python calls : Invoke any Python function, class or module directly from PHP scripts.
Automatic type mapping : PHP arrays ↔ Python lists/dicts; scalar types are converted transparently.
High performance : Optimised native implementation minimises the overhead of PHP‑Python interop.
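To make the type mapping concrete, here is a minimal sketch that passes a PHP array to Python's json.dumps and reads the result back as a PHP string. It assumes the phpy extension is loaded and relies only on PyCore::import, which the os.php example in this guide also uses. The function name roundTripThroughPython is hypothetical, and the guard lets the script exit cleanly on a machine without phpy.

```php
<?php
declare(strict_types=1);

/**
 * Serialise a PHP array with Python's json module via phpy.
 * Returns null when the phpy extension is not available.
 * Hypothetical helper for illustration only.
 */
function roundTripThroughPython(array $data): ?string
{
    if (!extension_loaded('phpy')) {
        return null;
    }
    $json = PyCore::import('json');
    // The PHP array is handed to Python, where phpy maps it to a dict or list.
    $encoded = $json->dumps($data);
    // Cast in case the result arrives as a Python object wrapper
    // rather than a native PHP string.
    return (string) $encoded;
}

$result = roundTripThroughPython(['id' => 1, 'tags' => ['a', 'b']]);
echo $result ?? 'phpy extension not loaded', PHP_EOL;
```

The explicit cast is a defensive choice: it is harmless if the value is already a PHP string and avoids surprises if it is not.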
Installation
Composer
composer require workbunny/php-orc
Helper commands
After Composer installation, the package provides a CLI tool php-orc under vendor/bin. Use it to install runtime dependencies and to view help.
./vendor/bin/php-orc --help
Python runtime
The helper installs a suitable Python version (≥3.10). Run:
./vendor/bin/php-orc install:python
Verify the version:
# python -V
Python 3.12.8
Phpy extension
Install the Phpy extension with the same helper:
./vendor/bin/php-orc install:phpy
Typical interactive output (directory defaults to .runtime):
[?] Please specify the installation directory (default: .runtime):
[>] Downloading the latest source code ...
Cloning into '/path/to/.runtime/swoole_phpy_latest' ...
[√] PHPy installation completed successfully.
Confirm the extension is loaded:
# php -m | grep phpy
phpy
# php --ri phpy
phpy support => enabled
Extension Version => 1.0.10
Python Version => 3.12.8 (main, Dec 7 2024, 05:56:13) [GCC 14.2.0]
PyORC Python package
Install the PyORC module via the helper script:
./vendor/bin/php-orc install:pyorc
Sample output (includes pip upgrade and tzdata installation):
[>] Checking and installing Python PyORC ...
[>] Upgrading pip ...
[>] Installing TZData ...
[>] Installing PyORC-latest ...
[√] Python and PyORC installation complete.
If pip is missing, install it first (Alpine example):
apk update
apk add py3-pip
Usage Examples
Calling Python from PHP (Phpy)
File os.php demonstrates importing the Python os module and printing system information.
<?php
/**
 * @desc Print operating system information
 */
declare(strict_types=1);
function main(): void {
    $m = PyCore::import("os");
    var_dump($m instanceof PyObject);
    $rs = $m->uname();
    echo $rs . PHP_EOL;
    echo $rs->version . PHP_EOL;
}
main();
Run the script:
# php os.php
bool(true)
posix.uname_result(sysname='Linux', nodename='b6fb15fcfed9', release='5.10.102.1-microsoft-standard-WSL2', version='#1 SMP Wed Mar 2 00:30:59 UTC 2022', machine='x86_64')
#1 SMP Wed Mar 2 00:30:59 UTC 2022
Reading an ORC file
File reader.php opens an example ORC file, prints metadata, and iterates over rows.
<?php
/**
 * @desc File read example
 */
declare(strict_types=1);
use function Workbunny\PhpOrc\open;
require_once __DIR__ . '/vendor/autoload.php';
$reader = new \Workbunny\PhpOrc\Reader(
    open(__DIR__ . '/vendor/workbunny/php-orc/examples/example-php.orc', 'rb')
);
var_dump(
    $reader->count(),           // total rows
    $reader->schema(),          // schema string
    $reader->compression(),     // compression codec
    $reader->userMetadata(),    // custom metadata
    $reader->writerId(),        // writer identifier
    $reader->writerVersion(),   // writer version
    $reader->softwareVersion(), // ORC library version
    $reader->formatVersion()    // file format version
);
$reader->iteration(function ($i, $row) {
    var_dump($i, $row);
});
Typical output (truncated for brevity):
# php reader.php
int(3)
string(52) "struct<id:int,group:string,name:string,email:string>"
int(0)
array(...metadata...)
"ORC_CPP_WRITER"
int(6)
"ORC C++ 2.1.0"
... rows ...
Common Issues
Missing pip : The installer may abort with sh: pip: not found. Install pip (e.g., apk add py3-pip on Alpine) before re‑running the PyORC install step.
Permission warnings : Running the helper as root can trigger pip warnings about permissions. Using a virtual environment or passing --root-user-action=ignore suppresses the warning.
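Before digging into a specific failure, a short preflight script can narrow down which layer is missing. The sketch below checks that the phpy extension is loaded and that Python can import the pyorc module. The function name checkOrcRuntime is hypothetical, and the sketch assumes that a failed Python import surfaces in PHP as a catchable exception.

```php
<?php
declare(strict_types=1);

/**
 * Collect human-readable problems with the local php-orc setup.
 * Hypothetical helper; returns an empty array when nothing looks wrong.
 *
 * @return string[]
 */
function checkOrcRuntime(): array
{
    if (!extension_loaded('phpy')) {
        // Without phpy there is no PyCore class, so stop here.
        return ['phpy extension is not loaded (check `php -m | grep phpy`)'];
    }
    $problems = [];
    try {
        // Assumption: an import failure is raised as a PHP exception.
        PyCore::import('pyorc');
    } catch (Throwable $e) {
        $problems[] = 'Python module pyorc could not be imported: ' . $e->getMessage();
    }
    return $problems;
}

$problems = checkOrcRuntime();
if ($problems === []) {
    echo 'Runtime looks ready for php-orc.', PHP_EOL;
} else {
    foreach ($problems as $problem) {
        echo '[!] ', $problem, PHP_EOL;
    }
}
```

Running this after each install step shows whether the problem lies in the PHP extension or in the Python package, which in turn indicates which helper command to re-run.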
With the library, Phpy extension, and PyORC installed, PHP developers can read, write, and manipulate ORC files directly from PHP code while leveraging the full Python data‑processing ecosystem.
Open Source Tech Hub
