Auto‑Detect Which LLMs Your PC Can Run and Launch a Coding Agent
This article shows how the HF‑agent plugin uses llmfit to analyze your hardware, recommend runnable large language models, start a llama.cpp server, and automatically launch the Pi coding agent, with step‑by‑step commands and a real‑world test on an M2 MacBook Air.
The HF‑agent plugin combines three tools—llmfit, llama.cpp, and Pi—to automatically detect a computer's hardware capabilities, recommend suitable large language models (LLMs), and launch a local coding agent.
The three referenced tools are:
llmfit: a one‑click hardware detection and local deployment utility.
llama.cpp: enables quantized model inference; a recent benchmark shows a single RTX 4090 running the Claude‑Opus‑4.6 distilled Qwen 3.5 27B model at 46 tokens per second.
Pi: an agent framework with a design philosophy distinct from Claude Code and OpenCode.
Installation is straightforward:
curl -LsSf https://hf.co/cli/install.sh | bash
hf extensions install hf-agents

Typical usage commands include:
hf agents fit recommend -n 5 # show top 5 compatible models
hf agents fit system # display hardware specs
hf agents fit search "qwen" # search for models containing "qwen"
hf agents fit recommend --use-case coding --min-fit good

On a MacBook Air with an Apple M2 CPU (8 cores) and 8 GB of RAM, the command hf agents fit system reports:
=== System Specifications ===
CPU: Apple M2 (8 cores)
Total RAM: 8.00 GB
Available RAM: 1.50 GB
Backend: Metal
GPU: Apple M2 (unified memory, 8.00 GB shared, Metal)

Running

hf agents fit recommend --use-case coding --min-fit good -n 5 --json

returns the following top recommendations:
bigcode/starcoder2-7b
Qwen/Qwen2.5-Coder-3B-Instruct
Qwen/Qwen2.5-Coder-3B

Because llmfit prefers MLX results on Apple Silicon, the M2 test indeed lists MLX‑based models first. For compatibility with llama.cpp, the plugin maps MLX quantization formats (mlx‑8bit, mlx‑4bit, mlx‑3bit) to the corresponding GGUF names (Q8_0, Q4_K_M, Q3_K_M).
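That format mapping is just a small lookup table. A minimal bash sketch of it (only the three format pairs come from the article; the variable name and script structure are illustrative assumptions):

```shell
#!/usr/bin/env bash
# Sketch of the MLX -> GGUF quantization-name mapping described above.
# The three pairs are from the article; everything else is illustrative.
declare -A mlx_to_gguf=(
  [mlx-8bit]=Q8_0
  [mlx-4bit]=Q4_K_M
  [mlx-3bit]=Q3_K_M
)
echo "${mlx_to_gguf[mlx-4bit]}"  # prints Q4_K_M
```

Resolving an MLX format this way lets the same recommendation be served by llama.cpp, which only understands GGUF quantization names.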
After detection, the recommended model can be launched with a single command:

hf agents run pi

The author also suggests trying the Pi coding agent directly, and notes that the Ollama CLI now includes built‑in support for Pi, allowing a one‑click start of the Kimi‑K2.5 cloud model without additional configuration.
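Under the hood, starting the server amounts to pointing llama.cpp's bundled llama-server binary at the chosen GGUF file. The article does not document the exact command the plugin runs, so the model filename, port, and flags below are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical launch command: the model filename, port, and -ngl value
# are illustrative assumptions, not taken from the plugin's source.
MODEL="Qwen2.5-Coder-3B-Instruct-Q4_K_M.gguf"
PORT=8080
# -ngl 99 offloads all layers to the GPU (Metal on Apple Silicon)
LAUNCH="llama-server -m $MODEL --port $PORT -ngl 99"
echo "$LAUNCH"
```

Once the server is up, Pi (or any OpenAI-compatible client) can be pointed at http://localhost:8080.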
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.