Route Easy Requests to Cheap Models with a PHP LLM Classifier

This article explains how to use the neuron-core/llm-classifier PHP package to score the difficulty of incoming prompts, calibrate that score offline, and route simple queries to inexpensive LLMs while sending hard queries to powerful models, with microsecond-level scoring overhead and no per-request scoring cost.

Open Source Tech Hub

Why routing based on difficulty matters

Developers often try to route difficult requests to a strong model and simple requests to a cheap model. Common hacks use prompt length or keyword lists, which either mismeasure difficulty or require constant manual updates.

LLM‑based difficulty scoring

The neuron-core/llm-classifier package builds a small classifier that reads an incoming prompt and returns a score between 0 (easy) and 1 (hard). The score is learned from the actual models registered in the fleet, so it reflects the difficulty as perceived by those models.

The classifier runs in pure PHP, requiring only ext‑mbstring. No Python side‑car, GPU, or separate inference server is needed. Scoring occurs in microseconds before any provider socket is opened and incurs no per‑request cost.

Two‑stage workflow

Calibration: Run the classifier offline once (via a script or console command) to teach it what is easy or hard for the target task. The output is a single model.bin file that can be versioned with the code.

Scoring: Load model.bin once at application bootstrap (or once per worker under Octane, RoadRunner, or FrankenPHP). Each request then calls overall() to obtain a difficulty number. The implementation takes the maximum of several capability scores rather than their average, so a prompt that is hard in any one respect is treated as hard overall.
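To see why the maximum is the more cautious choice, here is a minimal sketch comparing the two aggregations. The capability names and score values are made up for illustration, and the helper functions are hypothetical; they are not the package's API.

```php
<?php
declare(strict_types=1);

/**
 * Aggregate per-capability scores (0 = easy, 1 = hard) by taking the maximum.
 * Hypothetical helper for illustration only.
 */
function overallByMax(array $scores): float
{
    return $scores === [] ? 0.0 : (float) max($scores);
}

/** Aggregate the same scores by averaging, for comparison. */
function overallByMean(array $scores): float
{
    return $scores === [] ? 0.0 : array_sum($scores) / count($scores);
}

// Illustrative scores: the prompt is hard on one axis and easy on the others.
$scores = ['reasoning' => 0.9, 'coding' => 0.1, 'knowledge' => 0.2];

printf("max:  %.2f\n", overallByMax($scores));   // prints "max:  0.90"
printf("mean: %.2f\n", overallByMean($scores));  // prints "mean: 0.40"
```

With thresholds such as 0.33 and 0.70, the averaged score would send this prompt to a mid-tier model, while the maximum sends it to the strongest one.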

Training data and fastText vectors

The package ships with a ready‑to‑use dataset derived from the public RouterBench benchmark (≈1,845 prompts with pre‑computed difficulty labels). Training uses a free fastText word‑vector dictionary that maps each token to a 300‑dimensional vector; the prompt is reduced to the average of its token vectors, which becomes the classifier’s sole input.

Install the package with Composer:

composer require neuron-core/llm-classifier

To train the first classifier:

# 1) Download fastText vectors
curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.vec.gz
gunzip cc.en.300.vec.gz
mv cc.en.300.vec storage/

# 2) Run calibration script
php script/routerbench.php

After calibration, storage/model.bin contains the trained model.
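The averaging step described above can be sketched in a few lines of plain PHP. This is an illustration under stated assumptions, not the package's implementation: the function name, the naive tokenizer, the decision to skip out‑of‑vocabulary tokens, and the 3‑dimensional toy dictionary (standing in for the 300‑dimensional fastText file) are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Average the vectors of all known tokens in a prompt.
 * $vectors maps a lowercase token to a list of floats of equal length.
 * Returns null when no token of the prompt is in the dictionary.
 */
function averageEmbedding(string $prompt, array $vectors): ?array
{
    // Naive tokenizer (assumption): lowercase, split on anything that is not a letter or digit.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($prompt), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $sum = null;
    $known = 0;
    foreach ($tokens as $token) {
        if (!isset($vectors[$token])) {
            continue; // out-of-vocabulary tokens are skipped
        }
        $vec = $vectors[$token];
        $sum ??= array_fill(0, count($vec), 0.0);
        foreach ($vec as $i => $value) {
            $sum[$i] += $value;
        }
        $known++;
    }

    if ($known === 0) {
        return null;
    }

    return array_map(static fn (float $x): float => $x / $known, $sum);
}

// Toy dictionary with made-up values.
$vectors = [
    'sort'  => [1.0, 0.0, 2.0],
    'array' => [3.0, 2.0, 0.0],
];

// "this" is not in the toy dictionary, so two vectors are averaged: [2.0, 1.0, 1.0]
$embedding = averageEmbedding('Sort this array!', $vectors);
```

Averaging discards word order, so "sort this array" and "array this sort" produce the same input vector. That is the price of a representation cheap enough to compute in microseconds.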

Using the classifier in a router

Load the classifier and create a DifficultyRule that wraps the score and maps thresholds to providers:

use NeuronAI\Router\Rules\DifficultyRule;
use NeuronCore\Classifier\Classifier;

class MyAgent extends Agent {
    protected function provider(): AIProviderInterface {
        // Load classifier once at bootstrap
        $scorer = Classifier::load('storage/model.bin');
        return RouterProvider::make()
            // 'OPENAI_API_KEY' is a placeholder; pass the real key, e.g. read from an environment variable
            ->addProvider('mini', new OpenAI(key: 'OPENAI_API_KEY', model: 'gpt-4o-mini'))
            ->addProvider('4o',   new OpenAI(key: 'OPENAI_API_KEY', model: 'gpt-4o'))
            ->addProvider('o1',   new OpenAI(key: 'OPENAI_API_KEY', model: 'o1'))
            ->setRule(
                (new DifficultyRule($scorer))
                    ->outOfDomain('o1', coverage: 0.4)   // unfamiliar prompts → strongest
                    ->easy('mini', maxScore: 0.33)      // <0.33 → cheap fast model
                    ->medium('4o', maxScore: 0.70)      // <0.70 → balanced model
                    ->hard('o1')                         // otherwise → most capable
            );
    }
}
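Read top to bottom, the rule chain above amounts to a short decision ladder. The standalone function below is a hypothetical restatement of that ladder, not the DifficultyRule source. In particular, it assumes that a coverage value below the cut‑off marks a prompt as out of domain and that this check runs before the difficulty thresholds.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the routing decision.
 * $difficulty: overall difficulty score, 0 (easy) to 1 (hard).
 * $coverage:   how familiar the prompt looks to the classifier, 0 to 1 (assumed meaning).
 * Returns the provider name registered with addProvider().
 */
function pickProvider(float $difficulty, float $coverage): string
{
    if ($coverage < 0.4) {
        return 'o1';   // out of domain: the score is not trusted, use the strongest model
    }
    if ($difficulty < 0.33) {
        return 'mini'; // easy
    }
    if ($difficulty < 0.70) {
        return '4o';   // medium
    }
    return 'o1';       // hard
}

echo pickProvider(0.10, 0.9), "\n"; // mini
echo pickProvider(0.50, 0.9), "\n"; // 4o
echo pickProvider(0.10, 0.2), "\n"; // o1: low coverage overrides a low score
```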

Tuning knobs

Two kinds of threshold control routing: the difficulty cut‑offs (e.g., 0.33 and 0.70) and the coverage cut‑off (e.g., 0.4). Tune them against real traffic: log the difficulty score, the coverage decision, and the selected provider for each request, then adjust until the balance between cost and correctness meets expectations. Lower the easy cut‑off if the cheap model starts failing, so that fewer prompts reach it; raise the coverage cut‑off if out‑of‑domain prompts leak to cheap providers.
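Once scores are being logged, candidate cut‑offs can be replayed against them before anything changes in production. The sketch below is hypothetical: the function name and the logged values are invented, and it only estimates how traffic would split across tiers, not whether answers stay correct.

```php
<?php
declare(strict_types=1);

/**
 * Replay logged difficulty scores against candidate cut-offs and
 * return the fraction of requests that would land in each tier.
 */
function trafficShare(array $difficulties, float $easyMax, float $mediumMax): array
{
    $counts = ['easy' => 0, 'medium' => 0, 'hard' => 0];
    foreach ($difficulties as $d) {
        if ($d < $easyMax) {
            $counts['easy']++;
        } elseif ($d < $mediumMax) {
            $counts['medium']++;
        } else {
            $counts['hard']++;
        }
    }

    $total = max(1, count($difficulties)); // avoid division by zero on an empty log

    return array_map(static fn (int $n): float => $n / $total, $counts);
}

// Invented sample of logged scores.
$logged = [0.05, 0.20, 0.31, 0.45, 0.66, 0.72, 0.90, 0.12];

$current  = trafficShare($logged, 0.33, 0.70); // easy 0.5,   medium 0.25,  hard 0.25
$stricter = trafficShare($logged, 0.25, 0.70); // easy 0.375, medium 0.375, hard 0.25
```

Comparing the two results shows how much traffic a stricter easy cut‑off would move from the cheap model to the mid‑tier one.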

Conclusion

Previously, PHP applications chose a model via static selection or brittle string matching. A measured difficulty classifier that scores prompts in microseconds replaces those guesses with a data‑driven decision: it keeps quality where it matters, reduces cost elsewhere, and adds negligible runtime latency.

The package neuron-core/llm-classifier is MIT‑licensed, includes the RouterBench dataset, and can be up and running in minutes: https://github.com/neuron-core/llm-classifier
