
Run AI Models Locally with Docker Model Runner and Java Integration

This article explains how Docker Model Runner enables effortless local execution of AI models, details platform support, provides a full command reference, shows how to use the REST endpoint, and demonstrates integration with Java via LangChain4j, including code examples and a feature comparison with Ollama.

Java Architecture Diary

Docker introduced the Model Runner feature in Docker Desktop 4.40, making it simple to run AI models locally without complex environment setup.

Current platform support: Docker Model Runner is available on Apple Silicon (M‑series) Macs, with Windows support planned for future releases.

The feature marks a significant step for Docker into AI development, allowing developers to manage and run large language models locally and avoid reliance on external cloud services.

Available Commands

Check Model Runner Status

Check whether Docker Model Runner is active:

<code>docker model status</code>

List All Commands

Show help information and available sub‑commands:

<code>docker model help</code>

Output:

<code>Usage:  docker model COMMAND

Commands:
  list        List locally available models
  pull        Download a model from Docker Hub
  rm          Remove a downloaded model
  run         Run a model interactively or with a prompt
  status      Check if the model runner is running
  version     Show the current version</code>

Pull a Model


Pull a model from Docker Hub to the local environment:

<code>docker model pull &lt;model&gt;</code>

Example:

<code>docker model pull ai/deepseek-r1-distill-llama</code>

Output:

<code>Downloaded: 257.71 MB
Model ai/deepseek-r1-distill-llama pulled successfully</code>

List Available Models

List all models currently pulled to the local environment:

<code>docker model list</code>

Sample output:

<code>MODEL       PARAMETERS  QUANTIZATION   ARCHITECTURE  MODEL ID       CREATED    SIZE
ai/deepseek-r1-distill-llama  361.82 M   IQ2_XXS/Q4_K_M  llama   354bf30d0aa3  1 days ago  256.35 MiB</code>

Run a Model

Run a model with a single prompt or in interactive chat mode.

Single Prompt

<code>docker model run ai/deepseek-r1-distill-llama "Hi"</code>

Output:

<code>Hello! How can I assist you today?</code>

Interactive Chat

<code>docker model run ai/deepseek-r1-distill-llama</code>

Output:

<code>Interactive chat mode started. Type '/bye' to exit.
> Hi
Hi there! It's SmolLM, AI assistant. How can I help you today?
> /bye
Chat session ended.</code>
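The same one-shot invocation can also be driven from Java by shelling out to the CLI, which is handy for scripting quick smoke tests. Below is a minimal sketch using the JDK's <code>ProcessBuilder</code> (it assumes <code>docker</code> is on the PATH and the model has already been pulled; <code>ModelRunnerCli</code> and <code>buildCommand</code> are illustrative names, not part of any Docker API):

```java
import java.io.IOException;
import java.util.List;

public class ModelRunnerCli {

    // Assembles the CLI invocation for a one-shot prompt:
    // docker model run <model> "<prompt>"
    static List<String> buildCommand(String model, String prompt) {
        return List.of("docker", "model", "run", model, prompt);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Requires Docker Desktop with Model Runner enabled and the model pulled.
        Process process = new ProcessBuilder(
                buildCommand("ai/deepseek-r1-distill-llama", "Hi"))
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        String output = new String(process.getInputStream().readAllBytes());
        process.waitFor();
        System.out.println(output);
    }
}
```

Keeping the command assembly in its own method makes the invocation easy to unit-test without actually launching Docker.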

Delete a Model

<code>docker model rm &lt;model&gt;</code>

Output:

<code>Model &lt;model&gt; removed successfully</code>

Using the REST Endpoint

Enable host-side TCP support in the Docker Desktop GUI, or via the CLI:

<code>docker desktop enable model-runner --tcp &lt;port&gt;</code>

Then interact via the chosen port, for example:

<code>curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/deepseek-r1-distill-llama",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Please write a summary about Docker."}
        ]
    }'</code>
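The same request can be issued from plain Java using the JDK's built-in <code>HttpClient</code>, with no extra dependencies. This is a minimal sketch under the assumptions above (TCP enabled on port 12434); <code>ChatCompletionClient</code> and <code>buildRequestBody</code> are illustrative names:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChatCompletionClient {

    // Builds the OpenAI-compatible JSON body by hand to keep the example
    // dependency-free; a real application would use a JSON library.
    static String buildRequestBody(String model, String userPrompt) {
        return """
                {
                  "model": "%s",
                  "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": "%s"}
                  ]
                }""".formatted(model, userPrompt);
    }

    public static void main(String[] args) throws Exception {
        String body = buildRequestBody("ai/deepseek-r1-distill-llama",
                "Please write a summary about Docker.");

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(
                        "http://localhost:12434/engines/llama.cpp/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // Requires Docker Model Runner to be running with TCP enabled on 12434.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```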

LangChain4j Integration

LangChain4j is a Java framework for building applications powered by large language models (LLMs), offering a simple way for Java developers to interact with various LLMs.

Setup Steps

1. Ensure Docker Model Runner Is Enabled

Make sure the Model Runner feature is turned on in Docker Desktop.

2. Add LangChain4j Dependency

Add the following dependencies to your <code>pom.xml</code>:

<code>&lt;dependencies&gt;
    &lt;dependency&gt;
        &lt;groupId&gt;dev.langchain4j&lt;/groupId&gt;
        &lt;artifactId&gt;langchain4j&lt;/artifactId&gt;
        &lt;version&gt;1.0.0-beta2&lt;/version&gt;
    &lt;/dependency&gt;
    &lt;dependency&gt;
        &lt;groupId&gt;dev.langchain4j&lt;/groupId&gt;
        &lt;artifactId&gt;langchain4j-open-ai&lt;/artifactId&gt;
        &lt;version&gt;1.0.0-beta2&lt;/version&gt;
    &lt;/dependency&gt;
&lt;/dependencies&gt;</code>

3. Pull and Run the Desired Model

<code>docker model pull ai/deepseek-r1-distill-llama</code>

4. Configure LangChain4j to Connect to the Local Model

<code>import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;

public class ModelConfig {
    public ChatLanguageModel chatLanguageModel() {
        return OpenAiChatModel.builder()
                .baseUrl("http://localhost:12434/engines/llama.cpp/v1")
                .modelName("ai/deepseek-r1-distill-llama")
                .temperature(0.7)
                .build();
    }
}</code>

Sample Application

<code>import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.service.AiServices;

public class DockerModelExample {

    interface Assistant {
        String chat(String message);
    }

    public static void main(String[] args) {
        ModelConfig config = new ModelConfig();
        ChatLanguageModel model = config.chatLanguageModel();

        // Wrap the model in a typed assistant via LangChain4j's AiServices
        Assistant assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(model)
                .build();

        String response = assistant.chat("Write a simple Hello World program in Java");
        System.out.println(response);
    }
}</code>

Summary

Docker Model Runner and Ollama both aim to simplify local AI model execution, but Docker Model Runner is tightly integrated with the Docker ecosystem, while Ollama is a standalone, cross‑platform tool with broader language support and more flexible model customization.

Tags: Java, Docker, AI, REST API, LangChain4j, Model Runner
Written by Java Architecture Diary

Committed to sharing original, high‑quality technical articles; no fluff or promotional content.
