Auto‑Detect Which LLMs Your PC Can Run and Launch a Coding Agent
This article shows how the HF‑agent plugin uses llmfit to analyze your hardware, recommend runnable large language models, start a llama.cpp server, and automatically launch the Pi coding agent, with step‑by‑step commands and a real‑world test on an M2 MacBook Air.
The HF‑agent plugin combines three tools—llmfit, llama.cpp, and Pi—to automatically detect a computer's hardware capabilities, recommend suitable large language models (LLMs), and launch a local coding agent.
The three referenced tools are:
llmfit: a one‑click hardware detection and local deployment utility.
llama.cpp: enables quantized model inference; a recent benchmark shows a single RTX 4090 running the Claude‑Opus‑4.6 distilled Qwen 3.5 27B model at 46 tokens per second.
Pi: an agent framework with a design philosophy distinct from Claude Code and OpenCode.
Installation is straightforward:
curl -LsSf https://hf.co/cli/install.sh | bash
hf extensions install hf-agents

Typical usage commands include:
hf agents fit recommend -n 5 # show top 5 compatible models
hf agents fit system # display hardware specs
hf agents fit search "qwen" # search for models containing "qwen"
hf agents fit recommend --use-case coding --min-fit good

On a MacBook Air with an Apple M2 CPU (8 cores) and 8 GB of RAM, the command hf agents fit system reports:
=== System Specifications ===
CPU: Apple M2 (8 cores)
Total RAM: 8.00 GB
Available RAM: 1.50 GB
Backend: Metal
GPU: Apple M2 (unified memory, 8.00 GB shared, Metal)

Running

hf agents fit recommend --use-case coding --min-fit good -n 5 --json

returns the following top recommendations:
bigcode/starcoder2-7b
Qwen/Qwen2.5-Coder-3B-Instruct
Qwen/Qwen2.5-Coder-3B

Because llmfit prefers MLX results on Apple Silicon, the M2 test indeed lists MLX‑based models first. For compatibility with llama.cpp, the plugin maps MLX quantization formats (mlx‑8bit, mlx‑4bit, mlx‑3bit) to the corresponding GGUF names (Q8_0, Q4_K_M, Q3_K_M).
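That format mapping is just a small lookup table. A minimal bash sketch of it (only the three format pairs come from the article; the variable name and script structure are illustrative assumptions):

```shell
#!/usr/bin/env bash
# Sketch of the MLX -> GGUF quantization-name mapping described above.
# The three pairs are from the article; everything else is illustrative.
declare -A mlx_to_gguf=(
  [mlx-8bit]=Q8_0
  [mlx-4bit]=Q4_K_M
  [mlx-3bit]=Q3_K_M
)
echo "${mlx_to_gguf[mlx-4bit]}"  # prints Q4_K_M
```

Resolving an MLX format this way lets the same recommendation be served by llama.cpp, which only understands GGUF quantization names.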
After detection, the recommended model can be launched with a single command:

hf agents run pi

The author also suggests trying the Pi coding agent directly, and notes that the Ollama CLI now includes built‑in support for Pi, allowing a one‑click start of the Kimi‑K2.5 cloud model without additional configuration.
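Under the hood, starting the server amounts to pointing llama.cpp's bundled llama-server binary at the chosen GGUF file. The article does not document the exact command the plugin runs, so the model filename, port, and flags below are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical launch command: the model filename, port, and -ngl value
# are illustrative assumptions, not taken from the plugin's source.
MODEL="Qwen2.5-Coder-3B-Instruct-Q4_K_M.gguf"
PORT=8080
# -ngl 99 offloads all layers to the GPU (Metal on Apple Silicon)
LAUNCH="llama-server -m $MODEL --port $PORT -ngl 99"
echo "$LAUNCH"
```

Once the server is up, Pi (or any OpenAI-compatible client) can be pointed at http://localhost:8080.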
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.