One-Click Tool to Determine Which Large Language Models Your PC Can Run Locally
The llmfit command‑line utility scans your CPU, RAM, GPU and VRAM, scores 157 models from over 30 providers, suggests the highest‑quality quantized version that fits, integrates with Ollama, and shows real‑world test results confirming its accuracy, though its model database is limited.
Introduction
llmfit is a terminal‑based tool that detects the host’s CPU, RAM, GPU, and VRAM, then matches them against a built‑in database of 157 large language models from over 30 vendors. It reports which models can run, the expected tokens‑per‑second speed, and the appropriate quantization level.
Core Features
One‑click hardware detection and intelligent scoring: Detects platforms including NVIDIA, AMD, and Apple Silicon; evaluates memory, quality, speed, compatibility, and long‑context capability, assigning a higher score to models expected to run more smoothly.
Automatic selection of the best quantization version: Probes from the highest‑quality quantization downward and selects the highest‑quality version that fits the detected hardware.
Mixture‑of‑Experts (MoE) handling: For models such as Mixtral or DeepSeek‑V3, only a subset of experts is activated at runtime; llmfit accounts for the actual active parameters, preventing false‑negative warnings.
Direct integration with Ollama: When Ollama is installed, the UI lists its models and can trigger Ollama to pull a selected model with a single command.
Installation
For macOS or Linux, run the install script:
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
Homebrew alternative:
brew tap AlexsJones/llmfit
brew install llmfit
Usage
Running llmfit opens the terminal UI. CLI commands include:
# List the top 5 models best suited for the machine
llmfit fit --perfect -n 5
# Show detected hardware details
llmfit system
# Search for Llama‑family models that fit
llmfit search "llama 8b"
If nvidia-smi cannot report VRAM, specify the memory manually:
llmfit --memory=24G fit --perfect -n 5
The UI can also trigger Ollama to download a model directly.
MoE Model Support
For mixture‑of‑experts models such as Mixtral and DeepSeek‑V3, only a fraction of the total parameters are active during inference; llmfit incorporates this fact into its resource estimation.
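A rough sketch of why active parameters, not total parameters, drive the estimate. The split between shared weights and expert weights below is an assumed illustrative ratio, not actual model or llmfit internals:

```python
# Hedged sketch: estimate the parameters active per token in an MoE
# model. The shared_frac value is an illustrative assumption.

def moe_active_params(total_b: float, n_experts: int,
                      active_experts: int,
                      shared_frac: float = 0.25) -> float:
    """Estimate active parameters (billions): shared layers are
    always used; only active_experts of n_experts fire per token."""
    shared = total_b * shared_frac
    experts = total_b * (1 - shared_frac)
    return shared + experts * active_experts / n_experts

# Mixtral-style configuration: 8 experts, 2 active per token.
print(moe_active_params(46.7, 8, 2))
```

Sizing against this smaller active footprint is what lets llmfit avoid the false‑negative warnings a naive total‑parameter check would produce.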
Real‑World Test
On a work laptop, models ran smoothly at the quantization levels llmfit recommended, striking a balance between hardware utilization and response speed. This removed the need to repeatedly download large weight files for trial‑and‑error testing. The only noted limitation is the current catalog of 157 models, which depends on community updates to grow.
Interface Demo
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.