Run Vision Transformer in PHP with phpy: A Complete Step‑by‑Step Guide

This article explains how to implement and run a Vision Transformer (ViT) model in PHP using the phpy extension, covering ViT fundamentals, installation of Python dependencies, full PHP and Python code examples, and practical application scenarios for PHP developers.

Open Source Tech Hub

Background

Vision Transformer (ViT) has become popular in deep learning for its strong performance on image classification tasks. While most implementations are written in Python with PyTorch, PHP developers can also run ViT by leveraging the phpy extension, which enables PHP to call Python modules directly.

ViT Model Characteristics

Input images are split into patches, and each patch is embedded into a 1‑D vector via Patch Embedding.

The core of the model is a Transformer Encoder block built around Multi‑head Attention. Compared with the original Transformer, ViT moves the normalization layer in front of the attention and feed‑forward sub‑layers.

After stacking several encoder blocks, a fully‑connected head produces class predictions. The encoder part is referred to as the backbone.
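
To make the patch step concrete, here is a minimal pure-Python sketch, assuming the 28×28 single-channel input and 4×4 patch size used later in this article. The helper name `split_into_patches` is hypothetical, and the article's own code performs this step with a strided convolution rather than explicit slicing.

```python
def split_into_patches(image, patch_size):
    """Split a 2-D list (H x W) into flattened, non-overlapping square patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block row by row.
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches

# Dummy 28x28 image whose pixel value encodes its position.
image = [[row * 28 + col for col in range(28)] for row in range(28)]
patches = split_into_patches(image, patch_size=4)

print(len(patches))     # 49 patches (7 x 7 grid)
print(len(patches[0]))  # 16 values per patch (4 x 4 pixels)
```

Each of the 49 flattened patches is what the embedding layer then maps to a vector of the embedding size.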

What is phpy?

phpy is a PHP extension that allows PHP code to import and use Python modules. By using PyCore::import(), PHP can access libraries such as torch and torch.nn, making it possible to run complex deep‑learning models without leaving the PHP environment.

Installation

First install the required Python packages, for example:

pip install torch

A typical installation log (truncated) looks like:

Collecting torch
  Downloading torch-2.4.0-cp39-cp39-manylinux1_x86_64.whl (797.2 MB)
Collecting nvidia-cufft-cu12==11.0.2.54
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
... (additional dependencies) ...
Successfully installed torch-2.4.0 filelock-3.15.4 fsspec-2024.6.1 ...
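
Before running the PHP code, it can help to confirm that the Python interpreter can actually locate the package. The following is a small sketch using only the standard library. The helper name `is_installed` is hypothetical, and the check is only meaningful if phpy is linked against the same Python environment that pip installed into, which is an assumption here.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None

for name in ("torch",):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```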

PHP Implementation

<?php
declare(strict_types=1);
/**
 * ViT class defines the Vision Transformer structure.
 */
class Vit {
    private int $emb_size;
    private int $patch_size;
    private int $patch_count;
    private $conv;
    private $patch_emb;
    private $cls_token;
    private $pos_emb;
    private $tranformer_enc;
    private $cls_linear;
    private $torch; // imported torch module
    private $nn;    // imported torch.nn module

    /**
     * Constructor initializes model parameters and layers.
     * @param int $emb_size Embedding size, default 16.
     */
    public function __construct(int $emb_size = 16) {
        $this->torch = PyCore::import('torch');
        $this->nn    = PyCore::import('torch.nn');
        $this->emb_size   = $emb_size;
        $this->patch_size = 4;
        // A 28x28 input with 4x4 patches gives 7 patches per side.
        $this->patch_count = intdiv(28, $this->patch_size);
        // A strided convolution cuts the image into non-overlapping patches.
        $this->conv = $this->nn->Conv2d(
            in_channels: 1,
            out_channels: pow($this->patch_size, 2),
            kernel_size: $this->patch_size,
            padding: 0,
            stride: $this->patch_size,
        );
        // Project each 16-value patch vector to the embedding size.
        $this->patch_emb = $this->nn->Linear(pow($this->patch_size, 2), $this->emb_size);
        // Classification token and position embedding. These are plain random
        // tensors here; the Python reference wraps them in nn.Parameter.
        $this->cls_token = $this->torch->randn([1, 1, $this->emb_size]);
        $this->pos_emb = $this->torch->randn([1, pow($this->patch_count, 2) + 1, $this->emb_size]);
        // Three stacked encoder layers, each with two attention heads.
        $encoder_layer = $this->nn->TransformerEncoderLayer(
            $this->emb_size, 2,
            dim_feedforward: 2 * $this->emb_size,
            dropout: 0.1,
            activation: 'relu',
            layer_norm_eps: 1e-5,
            batch_first: true
        );
        $this->tranformer_enc = $this->nn->TransformerEncoder($encoder_layer, 3);
        // Classification head for 10 classes.
        $this->cls_linear = $this->nn->Linear($this->emb_size, 10);
    }

    /**
     * Forward pass of the model.
     * @param mixed $x Input tensor.
     * @return mixed Model output.
     */
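    public function forward($x) {
        // Shape comments assume the default 28x28 input and 4x4 patches.
        $operator = \PyCore::import('operator');
        $x = $this->conv->forward($x);                               // (batch, 16, 7, 7)
        $batch_size = $x->size(0);
        $out_channels = $x->size(1);
        $height = $x->size(2);
        $width = $x->size(3);
        $x = $x->view($batch_size, $out_channels, $height * $width); // (batch, 16, 49)
        $x = $x->permute([0, 2, 1]);                                 // (batch, 49, 16)
        $x = $this->patch_emb->forward($x);                          // (batch, 49, emb_size)
        // Prepend the classification token to every sequence.
        $cls_token = $this->cls_token->expand([$x->size(0), 1, $x->size(2)]);
        $x = $this->torch->cat([$cls_token, $x], 1);                 // (batch, 50, emb_size)
        // Add the position embedding, broadcast over the batch dimension.
        $x = $operator->__add__($x, $this->pos_emb);
        $x = $this->tranformer_enc->forward($x);
        // Classify from the output at the classification-token position.
        return $this->cls_linear->forward($x->select(1, 0));
    }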
}

// Import torch library
$torch = PyCore::import('torch');
// Initialize ViT model
$vit = new Vit();
// Create a random input tensor (5, 1, 28, 28)
$x = $torch->rand(5, 1, 28, 28);
// Forward pass
$y = $vit->forward($x);
// Print result
PyCore::print($y);
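
The forward method adds the position embedding through Python's operator module instead of using an infix `+`. As a rough Python-side illustration, the sketch below uses plain numbers in place of tensors and treats `importlib.import_module` as a stand-in for what `PyCore::import` does, which is an assumption made for illustration rather than a description of phpy's internals.

```python
import importlib

# Load a module by its dotted name, similar in spirit to PyCore::import('operator').
operator = importlib.import_module("operator")

a, b = 2.5, 4.0
# operator.__add__(a, b) is the function form of the expression a + b.
result = operator.__add__(a, b)
print(result == a + b)
```

With tensors the same call performs element-wise addition, so the line in `forward` corresponds to `x + pos_emb` in Python.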

Running the PHP Code

# php ViT.php
tensor([[ 1.4124e-01, -2.2445e-01, -4.8343e-02,  1.0453e+00,  2.6407e-01,
         -1.0721e+00, -4.5355e-01,  9.3695e-01,  2.0814e-01, -6.9242e-01],
        [ 1.3197e-01, -1.7860e-01, -3.5619e-02,  1.0052e+00,  3.5701e-01,
         -1.0619e+00, -5.5952e-01,  8.9957e-01,  2.2079e-01, -7.3373e-01],
        ...])
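
Each row of the output holds 10 raw scores (logits), one per class. As a sketch of how they could be read, the snippet below applies softmax and argmax in plain Python to the first row printed above. The model here has random, untrained weights, so the resulting "prediction" carries no meaning; the helper name `softmax` is introduced for illustration only.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# First row of the tensor printed above.
row = [1.4124e-01, -2.2445e-01, -4.8343e-02, 1.0453e+00, 2.6407e-01,
       -1.0721e+00, -4.5355e-01, 9.3695e-01, 2.0814e-01, -6.9242e-01]

probs = softmax(row)
# Index of the highest probability, i.e. the predicted class.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)  # 3, the index of the largest logit (1.0453)
```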

Python Reference Implementation

from torch import nn
import torch

class ViT(nn.Module):
    def __init__(self, emb_size=16):
        super().__init__()
        self.patch_size = 4
        self.patch_count = 28 // self.patch_size
        self.conv = nn.Conv2d(in_channels=1, out_channels=self.patch_size**2,
                              kernel_size=self.patch_size, padding=0, stride=self.patch_size)
        self.patch_emb = nn.Linear(in_features=self.patch_size**2, out_features=emb_size)
        self.cls_token = nn.Parameter(torch.rand(1, 1, emb_size))
        self.pos_emb = nn.Parameter(torch.rand(1, self.patch_count**2 + 1, emb_size))
        self.tranformer_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=emb_size, nhead=2, batch_first=True),
            num_layers=3)
        self.cls_linear = nn.Linear(in_features=emb_size, out_features=10)

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), x.size(1), self.patch_count**2)
        x = x.permute(0, 2, 1)
        x = self.patch_emb(x)
        cls_token = self.cls_token.expand(x.size(0), 1, x.size(2))
        x = torch.cat((cls_token, x), dim=1)
        x = self.pos_emb + x
        y = self.tranformer_enc(x)
        return self.cls_linear(y[:, 0, :])

if __name__ == '__main__':
    vit = ViT()
    x = torch.rand(5, 1, 28, 28)
    y = vit(x)
    print(y.shape)
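
The tensor shapes at each step of `forward` can be checked by hand. The sketch below does that arithmetic in plain Python, without PyTorch, for the batch of 5 images used above; the function name `trace_shapes` is hypothetical.

```python
def trace_shapes(batch, image_size=28, patch_size=4, emb_size=16, num_classes=10):
    """Return (step, shape) pairs for the forward pass, computed by arithmetic only."""
    grid = image_size // patch_size   # patches per side: 7
    channels = patch_size ** 2        # conv output channels: 16
    tokens = grid * grid              # patches per image: 49
    return [
        ("input", (batch, 1, image_size, image_size)),
        ("conv", (batch, channels, grid, grid)),
        ("view", (batch, channels, tokens)),
        ("permute", (batch, tokens, channels)),
        ("patch_emb", (batch, tokens, emb_size)),
        ("cat cls_token", (batch, tokens + 1, emb_size)),
        ("encoder", (batch, tokens + 1, emb_size)),
        ("cls_linear", (batch, num_classes)),
    ]

for step, shape in trace_shapes(batch=5):
    print(f"{step:14s} {shape}")
```

The sequence length of 50 after concatenation matches the second dimension of `pos_emb` (`patch_count**2 + 1`), which is why the addition in `forward` broadcasts cleanly over the batch.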

Application Scenarios and Significance

PHP is widely used for web development but lacks native deep‑learning support. By using phpy, developers can directly call Python frameworks such as PyTorch or TensorFlow and integrate AI algorithms into PHP applications, for example in real‑time prediction services or complex data‑processing pipelines.

Conclusion

With the phpy extension, PHP developers can run Python‑based deep‑learning models such as Vision Transformer directly from PHP code, which extends what PHP applications can do and makes AI features available to web applications. As AI continues to evolve, the combination of PHP and Python is expected to foster further innovation.

Source: https://segmentfault.com/a/1190000045240156