
Run AI Models Locally with Docker Model Runner and Java Integration

This article explains how Docker Model Runner enables effortless local execution of AI models, details platform support, provides a full command reference, shows how to use the REST endpoint, and demonstrates integration with Java via LangChain4j, including code examples and a feature comparison with Ollama.

Java Architecture Diary

Docker introduced the Model Runner feature in Docker Desktop 4.40, making it simple to run AI models locally without complex environment setup.

Current platform support: Docker Model Runner is available on Apple Silicon (M‑series) Macs, with Windows support planned for future releases.

The feature marks a significant step for Docker into AI development, allowing developers to manage and run large language models locally and avoid reliance on external cloud services.

Available Commands

Check Model Runner Status

Check whether Docker Model Runner is active:

<code>docker model status</code>

List All Commands

Show help information and available sub‑commands:

<code>docker model help</code>

Output:

<code>Usage:  docker model COMMAND

Commands:
  list        List locally available models
  pull        Download a model from Docker Hub
  rm          Remove a downloaded model
  run         Run a model interactively or with a prompt
  status      Check if the model runner is running
  version     Show the current version</code>

Pull a Model


Pull a model from Docker Hub to the local environment:

<code>docker model pull &lt;model&gt;</code>

Example:

<code>docker model pull ai/deepseek-r1-distill-llama</code>

Output:

<code>Downloaded: 257.71 MB
Model ai/deepseek-r1-distill-llama pulled successfully</code>

List Available Models

List all models currently pulled to the local environment:

<code>docker model list</code>

Sample output:

<code>MODEL       PARAMETERS  QUANTIZATION   ARCHITECTURE  MODEL ID       CREATED    SIZE
ai/deepseek-r1-distill-llama  361.82 M   IQ2_XXS/Q4_K_M  llama   354bf30d0aa3  1 days ago  256.35 MiB</code>

Run a Model

Run a model with a single prompt or in interactive chat mode.

Single Prompt

<code>docker model run ai/deepseek-r1-distill-llama "Hi"</code>

Output:

<code>Hello! How can I assist you today?</code>

Interactive Chat

<code>docker model run ai/deepseek-r1-distill-llama</code>

Output:

<code>Interactive chat mode started. Type '/bye' to exit.
> Hi
Hi there! It's SmolLM, AI assistant. How can I help you today?
> /bye
Chat session ended.</code>
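The same one-shot invocation can also be driven from Java by shelling out to the CLI, which is handy for scripting quick smoke tests. Below is a minimal sketch using the JDK's <code>ProcessBuilder</code> (it assumes <code>docker</code> is on the PATH and the model has already been pulled; <code>ModelRunnerCli</code> and <code>buildCommand</code> are illustrative names, not part of any Docker API):

```java
import java.io.IOException;
import java.util.List;

public class ModelRunnerCli {

    // Assembles the CLI invocation for a one-shot prompt:
    // docker model run <model> "<prompt>"
    static List<String> buildCommand(String model, String prompt) {
        return List.of("docker", "model", "run", model, prompt);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Requires Docker Desktop with Model Runner enabled and the model pulled.
        Process process = new ProcessBuilder(
                buildCommand("ai/deepseek-r1-distill-llama", "Hi"))
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        String output = new String(process.getInputStream().readAllBytes());
        process.waitFor();
        System.out.println(output);
    }
}
```

Keeping the command assembly in its own method makes the invocation easy to unit-test without actually launching Docker.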

Delete a Model

<code>docker model rm &lt;model&gt;</code>

Output:

<code>Model &lt;model&gt; removed successfully</code>

Using the REST Endpoint

Enable host-side TCP support in the Docker Desktop GUI, or via the CLI:

<code>docker desktop enable model-runner --tcp &lt;port&gt;</code>

Then interact via the chosen port, for example:

<code>curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/deepseek-r1-distill-llama",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Please write a summary about Docker."}
        ]
    }'</code>
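The same request can be issued from plain Java using the JDK's built-in <code>HttpClient</code>, with no extra dependencies. This is a minimal sketch under the assumptions above (TCP enabled on port 12434); <code>ChatCompletionClient</code> and <code>buildRequestBody</code> are illustrative names:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChatCompletionClient {

    // Builds the OpenAI-compatible JSON body by hand to keep the example
    // dependency-free; a real application would use a JSON library.
    static String buildRequestBody(String model, String userPrompt) {
        return """
                {
                  "model": "%s",
                  "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": "%s"}
                  ]
                }""".formatted(model, userPrompt);
    }

    public static void main(String[] args) throws Exception {
        String body = buildRequestBody("ai/deepseek-r1-distill-llama",
                "Please write a summary about Docker.");

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(
                        "http://localhost:12434/engines/llama.cpp/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // Requires Docker Model Runner to be running with TCP enabled on 12434.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```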

LangChain4j Integration

LangChain4j is a Java framework for building applications powered by large language models (LLMs), offering a simple way for Java developers to interact with various LLMs.

Setup Steps

1. Ensure Docker Model Runner Is Enabled

Make sure the Model Runner feature is turned on in Docker Desktop.

2. Add LangChain4j Dependency

Add the following dependencies to your <code>pom.xml</code>:

<code>&lt;dependencies&gt;
    &lt;dependency&gt;
        &lt;groupId&gt;dev.langchain4j&lt;/groupId&gt;
        &lt;artifactId&gt;langchain4j&lt;/artifactId&gt;
        &lt;version&gt;1.0.0-beta2&lt;/version&gt;
    &lt;/dependency&gt;
    &lt;dependency&gt;
        &lt;groupId&gt;dev.langchain4j&lt;/groupId&gt;
        &lt;artifactId&gt;langchain4j-open-ai&lt;/artifactId&gt;
        &lt;version&gt;1.0.0-beta2&lt;/version&gt;
    &lt;/dependency&gt;
&lt;/dependencies&gt;</code>

3. Pull and Run the Desired Model

<code>docker model pull ai/deepseek-r1-distill-llama</code>

4. Configure LangChain4j to Connect to the Local Model

<code>import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;

public class ModelConfig {
    public ChatLanguageModel chatLanguageModel() {
        return OpenAiChatModel.builder()
                .baseUrl("http://localhost:12434/engines/llama.cpp/v1")
                .modelName("ai/deepseek-r1-distill-llama")
                .temperature(0.7)
                .build();
    }
}</code>

Sample Application

<code>import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.service.AiServices;

public class DockerModelExample {

    interface Assistant {
        String chat(String message);
    }

    public static void main(String[] args) {
        ModelConfig config = new ModelConfig();
        ChatLanguageModel model = config.chatLanguageModel();

        // Wrap the model in a typed assistant via LangChain4j's AiServices
        Assistant assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(model)
                .build();

        String response = assistant.chat("Write a simple Hello World program in Java");
        System.out.println(response);
    }
}</code>

Summary

Docker Model Runner and Ollama both aim to simplify local AI model execution, but Docker Model Runner is tightly integrated with the Docker ecosystem, while Ollama is a standalone, cross‑platform tool with broader language support and more flexible model customization.

Tags: Java, Docker, AI, REST API, LangChain4j, Model Runner
Written by Java Architecture Diary

Committed to sharing original, high‑quality technical articles; no fluff or promotional content.
