Timber: The “Ollama” for Traditional Machine Learning Models
Timber is a multi‑pass compiler that turns classic ML models such as XGBoost and LightGBM into zero‑dependency C99 binaries. It offers microsecond‑level inference latency, Ollama‑compatible HTTP serving, and large speedups over Python runtimes, making it well suited to high‑throughput, low‑latency production workloads.
Overview
Timber is a multi‑pass optimization compiler for classic machine‑learning models such as XGBoost, LightGBM, scikit‑learn, CatBoost, and ONNX. It reads a model file, applies IR optimizations (dead‑leaf elimination, quantization, branch sorting), generates zero‑dependency C99 inference code, compiles it into a shared library, and serves the model through a built‑in HTTP server compatible with the Ollama API.
Core workflow
Read the model file.
Apply IR optimizations (dead‑leaf elimination, quantization, branch sorting).
Generate dependency‑free C99 inference code.
Compile the code into a shared library.
Expose the model via the built‑in HTTP server (Ollama‑compatible API).
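To make the code‑generation step concrete, here is a hand‑written Python analogue of what compiled tree inference looks like. This is illustrative only: Timber emits C99, not Python, and the tree structure, feature indices, and thresholds below are invented for the example.

```python
# Illustrative analogue of compiled tree inference: the tree becomes
# nested comparisons with thresholds baked in as constants, so there is
# no tree data structure to traverse at runtime. Indices, thresholds,
# and leaf values here are hypothetical.
def predict_tree(f):
    if f[2] < 3.0:
        if f[0] < 1.5:
            return 0.12   # leaf value
        return -0.07
    if f[1] < 0.5:
        return 0.33
    return -0.21

score = predict_tree([1.2, 0.4, 3.1, 0.9])
print(score)
```

An ensemble is simply a sum of many such functions, which is why the generated code has no runtime dependencies beyond libc.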
Model size example
A 50‑tree XGBoost model compiles to a 47.9 KB binary with no runtime dependencies, orders of magnitude smaller than a typical Python environment.
Benchmark (Apple M2 Pro, XGBoost binary classification, 50 trees)
Timber (native C): ~2 µs latency per sample, ~500,000 predictions/sec, ~336× speedup over Python XGBoost.
ONNX Runtime: ~80–150 µs latency, ~10,000 predictions/sec, ~5× speedup.
Treelite: ~10–30 µs latency, ~50,000 predictions/sec, ~20× speedup.
Python XGBoost: ~670 µs latency, ~1,500 predictions/sec, baseline (1×).
Two‑microsecond inference translates to roughly 500,000 predictions per second, suitable for sub‑millisecond decision‑making workloads.
Installation
pip install timber-compiler

The system must have gcc or clang installed; a recent Python version is recommended.
Usage
Timber provides two serving modes.
Mode 1: Serve a remote model directly
timber serve https://yourhost.com/models/fraud_model.json

This single command downloads, compiles, and starts the service without storing the model locally.
Mode 2: Load locally then serve
# Load and compile
timber load fraud_model.json --name fraud-detector
# Start the service
timber serve fraud-detector

The service listens on http://localhost:11434. Example request:
curl -s http://localhost:11434/api/predict \
-H "Content-Type: application/json" \
-d '{
"model": "fraud-detector",
"inputs": [[1.2, 0.4, 3.1, 0.9]]
}'

Sample response:
{
"model": "fraud-detector",
"outputs": [[0.031]],
"latency_us": 1.8
}

Additional commands: timber list, timber inspect, timber bench, timber validate.
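The same request can be made from Python with only the standard library. A minimal client sketch, assuming the endpoint and payload shape shown in the curl example above and a server already running via `timber serve` (the `build_request`/`predict` helpers are ours, not part of Timber):

```python
# Minimal stdlib client for the Ollama-compatible /api/predict endpoint.
import json
import urllib.request

def build_request(model, inputs, host="http://localhost:11434"):
    # Payload shape mirrors the curl example: {"model": ..., "inputs": [[...]]}
    payload = json.dumps({"model": model, "inputs": inputs}).encode()
    return urllib.request.Request(
        f"{host}/api/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def predict(model, inputs):
    with urllib.request.urlopen(build_request(model, inputs)) as resp:
        return json.loads(resp.read())

# Example (requires a running server):
# result = predict("fraud-detector", [[1.2, 0.4, 3.1, 0.9]])
# print(result["outputs"], result["latency_us"])
```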
Supported model formats
XGBoost : .json (all objectives, multi‑class, binary, regression).
LightGBM : .txt, .model, .lgb (including multi‑class).
scikit‑learn : .pkl, .pickle (GradientBoosting, RandomForest, DecisionTree, Pipeline).
ONNX : .onnx (TreeEnsemble, Linear, SVM, Normalizer, Scaler).
CatBoost : .json (requires JSON export).
Performance details
The benchmark uses the sklearn breast‑cancer dataset (XGBoost binary, 50 trees, 30 features) on an Apple M2 Pro. Pure inference latency is ~2 µs, a 336× speedup over Python XGBoost. End‑to‑end latency adds HTTP round‑trip overhead (≈50–200 µs). The benchmark is an in‑process test; real‑world latency may vary with network stack and concurrency.
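For readers who want to reproduce this style of measurement, here is a minimal in‑process timing harness in the spirit of `timber bench`. It assumes nothing about Timber's internals; `fake_predict` is a stand‑in for whatever predict call you want to time:

```python
# Time repeated back-to-back calls and report mean per-call latency.
import time

def fake_predict(features):
    # Placeholder for a real inference call; substitute your own.
    return sum(features) * 0.01

def bench(fn, args, n=100_000):
    start = time.perf_counter()
    for _ in range(n):
        fn(args)
    elapsed = time.perf_counter() - start
    return elapsed / n  # mean seconds per call

latency = bench(fake_predict, [1.2, 0.4, 3.1, 0.9])
print(f"{latency * 1e6:.2f} us/call")
```

As the article notes, an in‑process number like this excludes HTTP round‑trip overhead, which adds roughly 50–200 µs end to end.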
Applicable scenarios
Fraud & risk control – sub‑millisecond response for real‑time transaction decisions.
Edge / IoT deployment – run models on gateways, micro‑controllers, or ARM Cortex‑M devices.
Highly regulated industries – finance, healthcare, and automotive, where deterministic, auditable inference is required.
Infrastructure teams – eliminate Python from critical inference paths.
Limitations and considerations
ONNX support is limited to tree, linear, and SVM models; neural‑network layers are not supported.
CatBoost requires JSON export; native binary format is unsupported.
XGBoost accepts only JSON format; older binary boosters are not supported.
Generating LLVM IR requires a local LLVM installation.
Deep‑learning models (PyTorch, TensorFlow) are out of scope; Timber focuses on traditional ML.
Project repository
https://github.com/kossisoroyce/timber
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.