Timber: The “Ollama” for Traditional Machine Learning Models
Timber is a multi‑pass compiler that turns classic ML models such as XGBoost and LightGBM into zero‑dependency C99 binaries. It offers microsecond‑level inference latency, Ollama‑compatible HTTP serving, and large speedups over Python runtimes, making it well suited to high‑throughput, low‑latency production workloads.
Overview
Timber is a multi‑pass optimization compiler for classic machine‑learning models such as XGBoost, LightGBM, scikit‑learn, CatBoost, and ONNX. It reads a model file, applies IR optimizations (dead‑leaf elimination, quantization, branch sorting), generates zero‑dependency C99 inference code, compiles it into a shared library, and serves the model through a built‑in HTTP server compatible with the Ollama API.
Core workflow
Read the model file.
Apply IR optimizations (dead‑leaf elimination, quantization, branch sorting).
Generate dependency‑free C99 inference code.
Compile the code into a shared library.
Expose the model via the built‑in HTTP server (Ollama‑compatible API).
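To make the code‑generation step concrete, here is a hand‑written Python analogue of what compiled tree inference looks like. This is illustrative only: Timber emits C99, not Python, and the tree structure, feature indices, and thresholds below are invented for the example.

```python
# Illustrative analogue of compiled tree inference: the tree becomes
# nested comparisons with thresholds baked in as constants, so there is
# no tree data structure to traverse at runtime. Indices, thresholds,
# and leaf values here are hypothetical.
def predict_tree(f):
    if f[2] < 3.0:
        if f[0] < 1.5:
            return 0.12   # leaf value
        return -0.07
    if f[1] < 0.5:
        return 0.33
    return -0.21

score = predict_tree([1.2, 0.4, 3.1, 0.9])
print(score)
```

An ensemble is simply a sum of many such functions, which is why the generated code has no runtime dependencies beyond libc.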
Model size example
A 50‑tree XGBoost model compiles to a 47.9 KB binary with no runtime dependencies, orders of magnitude smaller than a typical Python environment.
Benchmark (Apple M2 Pro, XGBoost binary classification, 50 trees)
Timber (native C): ~2 µs latency per sample, ~500,000 predictions/sec, ~336× speedup over Python XGBoost.
ONNX Runtime: ~80–150 µs latency, ~10,000 predictions/sec, ~5× speedup.
Treelite: ~10–30 µs latency, ~50,000 predictions/sec, ~20× speedup.
Python XGBoost: ~670 µs latency, ~1,500 predictions/sec, baseline (1×).
Two‑microsecond inference translates to roughly 500,000 predictions per second, suitable for sub‑millisecond decision‑making workloads.
Installation
pip install timber-compiler

The system must have gcc or clang installed; a recent Python version is recommended.
Usage
Timber provides two serving modes.
Mode 1: Serve a remote model directly
timber serve https://yourhost.com/models/fraud_model.json

This single command downloads, compiles, and starts the service without storing the model locally.
Mode 2: Load locally then serve
# Load and compile
timber load fraud_model.json --name fraud-detector
# Start the service
timber serve fraud-detector

The service listens on http://localhost:11434. Example request:
curl -s http://localhost:11434/api/predict \
-H "Content-Type: application/json" \
-d '{
"model": "fraud-detector",
"inputs": [[1.2, 0.4, 3.1, 0.9]]
}'

Sample response:
{
"model": "fraud-detector",
"outputs": [[0.031]],
"latency_us": 1.8
}

Additional commands: timber list, timber inspect, timber bench, timber validate.
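The same request can be made from Python with only the standard library. A minimal client sketch, assuming the endpoint and payload shape shown in the curl example above and a server already running via `timber serve` (the `build_request`/`predict` helpers are ours, not part of Timber):

```python
# Minimal stdlib client for the Ollama-compatible /api/predict endpoint.
import json
import urllib.request

def build_request(model, inputs, host="http://localhost:11434"):
    # Payload shape mirrors the curl example: {"model": ..., "inputs": [[...]]}
    payload = json.dumps({"model": model, "inputs": inputs}).encode()
    return urllib.request.Request(
        f"{host}/api/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def predict(model, inputs):
    with urllib.request.urlopen(build_request(model, inputs)) as resp:
        return json.loads(resp.read())

# Example (requires a running server):
# result = predict("fraud-detector", [[1.2, 0.4, 3.1, 0.9]])
# print(result["outputs"], result["latency_us"])
```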
Supported model formats
XGBoost : .json (all objectives, multi‑class, binary, regression).
LightGBM : .txt, .model, .lgb (including multi‑class).
scikit‑learn : .pkl, .pickle (GradientBoosting, RandomForest, DecisionTree, Pipeline).
ONNX : .onnx (TreeEnsemble, Linear, SVM, Normalizer, Scaler).
CatBoost : .json (requires JSON export).
Performance details
The benchmark uses the sklearn breast‑cancer dataset (XGBoost binary, 50 trees, 30 features) on an Apple M2 Pro. Pure inference latency is ~2 µs, a 336× speedup over Python XGBoost. End‑to‑end latency adds HTTP round‑trip overhead (≈50–200 µs). The benchmark is an in‑process test; real‑world latency may vary with network stack and concurrency.
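For readers who want to reproduce this style of measurement, here is a minimal in‑process timing harness in the spirit of `timber bench`. It assumes nothing about Timber's internals; `fake_predict` is a stand‑in for whatever predict call you want to time:

```python
# Time repeated back-to-back calls and report mean per-call latency.
import time

def fake_predict(features):
    # Placeholder for a real inference call; substitute your own.
    return sum(features) * 0.01

def bench(fn, args, n=100_000):
    start = time.perf_counter()
    for _ in range(n):
        fn(args)
    elapsed = time.perf_counter() - start
    return elapsed / n  # mean seconds per call

latency = bench(fake_predict, [1.2, 0.4, 3.1, 0.9])
print(f"{latency * 1e6:.2f} us/call")
```

As the article notes, an in‑process number like this excludes HTTP round‑trip overhead, which adds roughly 50–200 µs end to end.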
Applicable scenarios
Fraud & risk control – sub‑millisecond response for real‑time transaction decisions.
Edge / IoT deployment – run models on gateways, micro‑controllers, or ARM Cortex‑M devices.
Highly regulated industries – finance, healthcare, and automotive, where deterministic, auditable inference is required.
Infrastructure teams – eliminate Python from critical inference paths.
Limitations and considerations
ONNX support is limited to tree, linear, and SVM models; neural‑network layers are not supported.
CatBoost requires JSON export; native binary format is unsupported.
XGBoost accepts only JSON format; older binary boosters are not supported.
Generating LLVM IR requires a local LLVM installation.
Deep‑learning models (PyTorch, TensorFlow) are out of scope; Timber focuses on traditional ML.
Project repository
https://github.com/kossisoroyce/timber
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.