One-Click Tool to Determine Which Large Language Models Your PC Can Run Locally

The llmfit command‑line utility scans your CPU, RAM, GPU and VRAM, scores 157 models from over 30 providers, suggests the highest‑quality quantized version that fits, integrates with Ollama, and shows real‑world test results confirming its accuracy, though its model database is limited.


Introduction

llmfit is a terminal‑based tool that detects the host's CPU, RAM, GPU, and VRAM, then matches them against a built‑in database of 157 large language models from over 30 vendors. It reports which models can run, the expected tokens‑per‑second speed, and the appropriate quantization level.

Core Features

One‑click hardware detection and intelligent scoring: Detects platforms including NVIDIA, AMD, and Apple Silicon; evaluates memory, quality, speed, compatibility, and long‑context capability, assigning a higher score to models expected to run more smoothly.

Automatic selection of the best quantization version: Probes from the highest‑quality quantization downward and selects the highest‑quality version that fits the detected hardware (a worked example follows this list).

Mixture‑of‑Experts (MoE) handling: For models such as Mixtral or DeepSeek‑V3, only a subset of experts is activated at runtime; llmfit accounts for the actual active parameters, preventing false‑negative warnings.

Direct integration with Ollama: When Ollama is installed, the UI lists its models and can trigger Ollama to pull a selected model with a single command.
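To make the downward quantization probe concrete, here is a back‑of‑the‑envelope sizing calculation; the bits‑per‑weight figures are llama.cpp's approximate values for these formats, and the threshold logic is a sketch of the idea rather than llmfit's internal formula. Q8_0 stores roughly 8.5 bits per weight and Q4_K_M roughly 4.8, so for an 8B‑parameter model:

$$\text{Q8\_0: } \frac{8\times10^{9}\ \text{params}\times 8.5\ \text{bits}}{8\ \text{bits/byte}} \approx 8.5\ \text{GB} \qquad \text{Q4\_K\_M: } \frac{8\times10^{9}\times 4.8}{8} \approx 4.8\ \text{GB}$$

On an 8 GB GPU, the Q8_0 weights alone already exceed VRAM before any KV cache is allocated, so the probe steps down and settles on Q4_K_M as the highest‑quality version that fits.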

Installation

For macOS or Linux, run the install script:

curl -fsSL https://llmfit.axjns.dev/install.sh | sh

Homebrew alternative:

brew tap AlexsJones/llmfit
brew install llmfit
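Either route should leave the binary on your PATH; a quick sanity check using only a standard shell built‑in (no llmfit‑specific flags assumed):

# Verify the llmfit binary is installed and discoverable
command -v llmfit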

Usage

Running llmfit opens the terminal UI. CLI commands include:

# List the top 5 models best suited for the machine
llmfit fit --perfect -n 5

# Show detected hardware details
llmfit system

# Search for Llama‑family models that fit
llmfit search "llama 8b"

If nvidia-smi cannot report VRAM, specify the memory manually:

llmfit --memory=24G fit --perfect -n 5

The UI can also trigger Ollama to download a model directly.
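Putting the pieces together, a typical end‑to‑end flow might look like this; the Ollama model tag below is illustrative, so substitute whichever quantized build llmfit actually recommends:

# Find Llama variants that fit this machine
llmfit search "llama 8b"

# Pull the recommended quantized build with Ollama (tag is hypothetical)
ollama pull llama3.1:8b-instruct-q4_K_M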

MoE Model Support


For mixture‑of‑experts models such as Mixtral and DeepSeek‑V3, only a fraction of the total parameters are active during inference; llmfit incorporates this fact into its resource estimation.
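A concrete illustration using Mixtral 8x7B's published figures (how llmfit weights these numbers internally is an assumption here): the model has about 46.7B total parameters but routes each token through 2 of its 8 experts, roughly 12.9B active parameters.

$$\underbrace{46.7\text{B}}_{\text{weights that must fit in memory}} \qquad \text{vs.} \qquad \underbrace{12.9\text{B}}_{\text{parameters touched per token}}$$

Memory still has to hold every expert, but per‑token compute, and therefore the expected tokens per second, behaves more like a ~13B dense model; a speed estimate based on total parameters alone would wrongly flag the model as too slow.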

Real‑World Test

On a work laptop, the models llmfit recommended ran smoothly at the suggested quantization levels, striking a balance between hardware utilization and response speed. The approach removed the need to repeatedly download large weight files for trial‑and‑error testing. The only noted limitation is the current catalog of 157 models, which relies on community updates to grow.

Interface Demo

Main UI demonstration
Tags: Quantization · Large Language Models · Mixture of Experts · Local Deployment · Ollama · Hardware Detection · llmfit
Written by Old Zhang's AI Learning

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
