Route Easy Requests to Cheap Models with a PHP LLM Classifier
The article explains how to use the neuron-core/llm-classifier PHP package to define a difficulty score for prompts, calibrate it offline, and then route simple queries to inexpensive LLMs while sending hard queries to powerful models, with microsecond-level scoring overhead and no per-request cost.
Why routing based on difficulty matters
Developers often try to route difficult requests to a strong model and simple requests to a cheap model. Common hacks use prompt length or keyword lists, which either mismeasure difficulty or require constant manual updates.
LLM‑based difficulty scoring
The neuron-core/llm-classifier package builds a small classifier that reads an incoming prompt and returns a score between 0 (easy) and 1 (hard). The score is learned from the actual models registered in the fleet, so it reflects the difficulty as perceived by those models.
The classifier runs in pure PHP, requiring only ext‑mbstring. No Python side‑car, GPU, or separate inference server is needed. Scoring occurs in microseconds before any provider socket is opened and incurs no per‑request cost.
Two‑stage workflow
Calibration: Run the classifier offline once (via a script or console command) to teach it what is easy or hard for the target task. The output is a single model.bin file that can be versioned with the code.
Scoring: Load model.bin at application bootstrap (or once per worker in long-running runtimes such as Octane, RoadRunner, or FrankenPHP). Each request calls overall() to obtain a difficulty number. The implementation uses the maximum of several capability scores rather than an average, so a prompt that is hard in any one respect is treated as hard overall.
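The difference between taking the maximum and taking the average can be seen in a minimal sketch. The function names and the capability names (reasoning, coding, knowledge) are hypothetical and chosen only for illustration; this is not the package's overall() implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: collapse per-capability scores into one number
 * by taking the maximum, so a single hard aspect dominates.
 *
 * @param array<string, float> $scores capability name => score in [0, 1]
 */
function overallByMax(array $scores): float
{
    if ($scores === []) {
        throw new InvalidArgumentException('At least one capability score is required.');
    }

    return max($scores);
}

/**
 * Hypothetical alternative for comparison: the arithmetic mean,
 * which lets easy aspects dilute a hard one.
 *
 * @param array<string, float> $scores capability name => score in [0, 1]
 */
function overallByMean(array $scores): float
{
    if ($scores === []) {
        throw new InvalidArgumentException('At least one capability score is required.');
    }

    return array_sum($scores) / count($scores);
}

// Illustrative scores: hard on one capability, easy on the others.
$scores = ['reasoning' => 0.9, 'coding' => 0.1, 'knowledge' => 0.2];

printf("max:  %.2f\n", overallByMax($scores));  // prints "max:  0.90"
printf("mean: %.2f\n", overallByMean($scores)); // prints "mean: 0.40"
```

With the thresholds used later in the article, the mean (0.40) would land in the medium band while the maximum (0.90) lands in the hard band, which is the conservative behaviour the article describes.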
Training data and fastText vectors
The package ships with a ready‑to‑use dataset derived from the public RouterBench benchmark (≈1,845 prompts with pre‑computed difficulty labels). Training uses a free fastText word‑vector dictionary that maps each token to a 300‑dimensional vector; the prompt is reduced to the average of its token vectors, which becomes the classifier’s sole input.
Install the package with Composer:
composer require neuron-core/llm-classifier
To train the first classifier:
# 1) Download fastText vectors
curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.vec.gz
gunzip cc.en.300.vec.gz
mv cc.en.300.vec storage/
# 2) Run calibration script
php script/routerbench.php
After calibration, storage/model.bin contains the trained model.
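The reduction of a prompt to the average of its token vectors, described earlier in this section, can be sketched in plain PHP. This is an illustration under stated assumptions, not the package's code: the function name averagePromptVector, the naive tokenizer, and the 3-dimensional toy vectors are all hypothetical (real fastText vectors have 300 dimensions), and how the package treats tokens missing from the dictionary is not stated in the article, so here they are simply skipped.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: average the word vectors of a prompt's tokens.
 * Tokens missing from the dictionary are skipped (an assumption).
 *
 * @param array<string, list<float>> $vectors token => vector of length $dim
 * @return array{vector: list<float>, known: int, total: int}
 */
function averagePromptVector(string $prompt, array $vectors, int $dim): array
{
    // Naive tokenizer (assumption): lowercase, split on anything
    // that is not a Unicode letter or digit.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($prompt), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $sum = array_fill(0, $dim, 0.0);
    $known = 0;

    foreach ($tokens as $token) {
        if (!isset($vectors[$token])) {
            continue; // token not in the dictionary
        }
        foreach ($vectors[$token] as $i => $value) {
            $sum[$i] += $value;
        }
        $known++;
    }

    if ($known > 0) {
        // Divide the component-wise sum by the number of known tokens.
        $sum = array_map(fn (float $v): float => $v / $known, $sum);
    }

    return ['vector' => $sum, 'known' => $known, 'total' => count($tokens)];
}

// Toy dictionary with 3-dimensional vectors for illustration only.
$vectors = [
    'sort'  => [1.0, 0.0, 2.0],
    'array' => [3.0, 2.0, 0.0],
];

$result = averagePromptVector('Sort this array!', $vectors, 3);
print_r($result); // vector [2, 1, 1], known 2, total 3
```

The known and total counts are returned only to show how many tokens the dictionary recognised; whether the package derives its coverage value this way is not stated in the article.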
Using the classifier in a router
Load the classifier and create a DifficultyRule that wraps the score and maps thresholds to providers:
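// The four imports below are not in the original snippet; their namespaces
// are assumptions, so verify them against your installed packages.
use NeuronAI\Agent;
use NeuronAI\Providers\AIProviderInterface;
use NeuronAI\Providers\OpenAI\OpenAI;
use NeuronAI\Router\RouterProvider;
use NeuronAI\Router\Rules\DifficultyRule;
use NeuronCore\Classifier\Classifier;

class MyAgent extends Agent {
    protected function provider(): AIProviderInterface {
        // Load the calibrated model (ideally once, at bootstrap, and reuse it)
        $scorer = Classifier::load('storage/model.bin');
        // Read the real API key from the environment rather than hard-coding it
        $key = (string) getenv('OPENAI_API_KEY');

        return RouterProvider::make()
            ->addProvider('mini', new OpenAI(key: $key, model: 'gpt-4o-mini'))
            ->addProvider('4o', new OpenAI(key: $key, model: 'gpt-4o'))
            ->addProvider('o1', new OpenAI(key: $key, model: 'o1'))
            ->setRule(
                (new DifficultyRule($scorer))
                    ->outOfDomain('o1', coverage: 0.4) // unfamiliar prompts → strongest
                    ->easy('mini', maxScore: 0.33)     // <0.33 → cheap fast model
                    ->medium('4o', maxScore: 0.70)     // <0.70 → balanced model
                    ->hard('o1')                       // otherwise → most capable
            );
    }
}
Tuning knobs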
Two kinds of threshold control routing: the difficulty cut-offs (e.g., 0.33 and 0.70) and the coverage cut-off (e.g., 0.4). Tune them against real traffic: log the difficulty score, the coverage decision, and the selected provider for each request, then adjust until the balance between cost and correctness meets expectations. Lower the difficulty thresholds if cheap models start failing; raise the coverage threshold if out-of-domain prompts leak to cheap providers.
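To make the thresholds and the suggested logging concrete, here is a minimal sketch that mirrors the cut-offs from the example above in plain PHP. It is a hypothetical stand-in, not the package's DifficultyRule: the function names, the log field names, and the assumption that a coverage value below the cut-off marks a prompt as out of domain are all illustrative.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical routing decision using the article's example thresholds.
 * Assumption: coverage below $minCoverage means "out of domain".
 */
function pickProvider(
    float $score,
    float $coverage,
    float $minCoverage = 0.4,
    float $easyMax = 0.33,
    float $mediumMax = 0.70
): string {
    if ($coverage < $minCoverage) {
        return 'o1';   // unfamiliar prompt: send to the strongest model
    }
    if ($score < $easyMax) {
        return 'mini'; // easy: cheap, fast model
    }
    if ($score < $mediumMax) {
        return '4o';   // medium: balanced model
    }

    return 'o1';       // hard: most capable model
}

/**
 * Build one JSON log line per request so thresholds can be tuned later.
 * Field names are illustrative.
 */
function routingLogLine(float $score, float $coverage, float $minCoverage = 0.4): string
{
    return json_encode([
        'difficulty'    => round($score, 3),
        'coverage'      => round($coverage, 3),
        'out_of_domain' => $coverage < $minCoverage,
        'provider'      => pickProvider($score, $coverage, $minCoverage),
    ], JSON_THROW_ON_ERROR);
}

// Example: an easy, well-covered prompt and an unfamiliar one.
echo routingLogLine(0.12, 0.9), PHP_EOL;
echo routingLogLine(0.12, 0.2), PHP_EOL;
```

Aggregating such log lines by provider and comparing them with observed failures gives a basis for moving a cut-off, rather than guessing.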
Conclusion
Previously, PHP applications chose a model via static selection or brittle string matching. A measured difficulty classifier that scores prompts in microseconds gives a data-driven answer instead: it keeps quality where it matters, reduces cost elsewhere, and adds negligible runtime latency.
The package neuron-core/llm-classifier is MIT‑licensed, includes the RouterBench dataset, and can be up and running in minutes: https://github.com/neuron-core/llm-classifier