Artificial Intelligence 10 min read

Running Large Language Models Locally Is Now Surprisingly Easy

The article explains how recent advances in LLM performance now allow developers to run sophisticated AI models locally on a 2022 M2 Mac using tools like LM Studio, Pi agent, and Docker, detailing model choices, setup steps, performance observations, and remaining limitations.

Machine Heart

Jun 24, 2026

Running Large Language Models Locally Is Now Surprisingly Easy

While the community has been focused on flagship large models, recent breakthroughs have made local AI model execution practical. Over the past six months, many new models have achieved significant gains in intelligence, agent capabilities, and toolchain maturity.

The author, Vicki Boykis, a machine‑learning engineer with experience at Mozilla.ai, Duo, Tumblr, Automattic, and Comcast, runs experiments on a 2022 M2 Mac equipped with 64 GB RAM and 1 TB storage. The models she evaluates include Mistral 7B, Gemma 3, OpenAI OSS‑20B, Qwen 3 MOE and other Qwen variants.

Historically, local models were slow and inaccurate, but the release of OpenAI GPT‑OSS changed that perception. Boykis judges a model’s adequacy by whether she still needs to compare its output with an API model; GPT‑OSS was the first to reduce that need dramatically. The newer Gemma 4 series enables on‑device agent coding with roughly 75 % of the performance of cutting‑edge models.

Using gemma‑4‑12b‑qat via LM Studio as her default local model, she has refactored a Python notebook into a multi‑module repository, performed code checks for correct generic type hints, proofread blog posts, written unit tests, and built a dual‑tower recommendation‑system repository. Sample outputs from these tasks are shown in the accompanying screenshots.

She also lists the inference back‑ends she has tried: the original llama.cpp from Open WebUI, llama‑cpp‑python, Ollama, llamafiles, and LM Studio. For agent execution she uses Pi as the framework and LM Studio as the inference server, noting that a direct llama.cpp setup could be faster in future experiments.

To run agents locally, Boykis configures Pi’s models.json to point to the local inference endpoint provided by LM Studio. The JSON snippet below shows the required configuration:

{
  "lmstudio": {
    "baseUrl": "http://host.docker.internal:1234/v1",
    "api": "openai-completions",
    "apiKey": "not-needed",
    "models": [
      {
        "id": "google/gemma-4-12b-qat",
        "input": ["text", "image"]
      }
    ]
  }
}

She then provides a Docker‑Compose file and a Bash launch script to start the Pi container with the necessary volume mounts, environment variables, and optional sandbox mode. The Compose file defines the Pi service, image tag, host‑gateway mapping, and mounts for the workspace and Pi configuration. The launch script resolves the workspace path, constructs a sanitized container name, and executes docker compose with the appropriate arguments.

#!/usr/bin/env bash
# Pi — Start the containerized Pi agent.
SCRIPT_DIR="$(cd -- "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
WORKSPACE_DIR="${WORKSPACE:-$(pwd)}"
# ... (rest of the script as in the source) ...
exec "${cmd[@]}"

Boykis notes that local models still have drawbacks: inference can be slower, context windows are limited by hardware, and early versions suffered from mismatched prompt templates. However, these issues are typically resolved quickly, and she remains uncertain whether the setup is ready for production‑grade software development.

Despite the limitations, the local‑model ecosystem offers significant advantages. Developers can observe token flow in real time, adjust context windows, modify system prompts and quantization settings, compare different models, and experiment with test frameworks. The tools continue to improve, making the possibilities virtually endless.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Docker ai-development model inference local LLM LM Studio Gemma 4 Pi agent

Written by

Machine Heart

Professional AI media and industry service platform

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.