
Run Large Language Models Directly in Java with Jlama – Quick Start Guide

This article introduces Jlama, an open-source Java LLM inference engine. It outlines Jlama's key features, walks through CLI and Maven integration step by step, and includes code examples, run logs, and setup notes for running large language models efficiently inside Java applications.

Java Architecture Diary

What is Jlama?

Jlama is an open-source large language model inference engine built specifically for the Java ecosystem, allowing developers to run LLM inference directly inside Java applications without external services.

Key Features

Supports popular models such as Gemma, Llama, Mistral, Mixtral, Qwen2 and others.

Built on Java 20+ and leverages the Vector API for high-performance inference.

Quick Start

1. Command-line usage

<code># Install jbang
curl -Ls https://sh.jbang.dev | bash -s - app setup

# Install Jlama CLI
jbang app install --force jlama@tjake

# Run a model (Web UI enabled)
jlama restapi tjake/Llama-3.2-1B-Instruct-JQ4 --auto-download</code>

2. Java project integration

Add dependencies to pom.xml

<code>&lt;dependency&gt;
  &lt;groupId&gt;com.github.tjake&lt;/groupId&gt;
  &lt;artifactId&gt;jlama-core&lt;/artifactId&gt;
  &lt;version&gt;${jlama.version}&lt;/version&gt;
&lt;/dependency&gt;

&lt;dependency&gt;
  &lt;groupId&gt;com.github.tjake&lt;/groupId&gt;
  &lt;artifactId&gt;jlama-native&lt;/artifactId&gt;
  &lt;classifier&gt;${os.detected.name}-${os.detected.arch}&lt;/classifier&gt;
  &lt;version&gt;${jlama.version}&lt;/version&gt;
&lt;/dependency&gt;

&lt;!-- LangChain4j integration --&gt;
&lt;dependency&gt;
  &lt;groupId&gt;dev.langchain4j&lt;/groupId&gt;
  &lt;artifactId&gt;langchain4j&lt;/artifactId&gt;
  &lt;version&gt;1.0.0-beta1&lt;/version&gt;
&lt;/dependency&gt;

&lt;dependency&gt;
  &lt;groupId&gt;dev.langchain4j&lt;/groupId&gt;
  &lt;artifactId&gt;langchain4j-jlama&lt;/artifactId&gt;
  &lt;version&gt;1.0.0-beta1&lt;/version&gt;
&lt;/dependency&gt;</code>
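Two details in this POM are easy to miss: ${jlama.version} must be defined as a property, and the ${os.detected.name}/${os.detected.arch} classifier properties are not built into Maven — they are typically supplied by the os-maven-plugin build extension. A sketch of both (the version numbers are assumptions; check the latest releases):

```xml
<properties>
  <!-- assumption: check the Jlama releases page for the current version -->
  <jlama.version>0.8.4</jlama.version>
</properties>

<build>
  <extensions>
    <!-- populates ${os.detected.name} and ${os.detected.arch} -->
    <extension>
      <groupId>kr.motd.maven</groupId>
      <artifactId>os-maven-plugin</artifactId>
      <version>1.7.1</version>
    </extension>
  </extensions>
</build>
```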

Code example

<code>import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.jlama.JlamaChatModel;

// Build a Jlama-backed chat model; the model is downloaded on first use
ChatLanguageModel model = JlamaChatModel.builder()
        .modelName("tjake/Llama-3.2-1B-Instruct-JQ4")
        .temperature(0.7f)
        .build();

// Send a single user message and print the reply
ChatResponse response = model.chat(
        UserMessage.from("Help me write a Java version of a palindrome-checking algorithm")
);

System.out.println("\n" + response.aiMessage().text());</code>

Run log

<code>INFO  c.g.tjake.jlama.util.HttpSupport - Downloaded file: /Users/.../config.json
INFO  c.g.tjake.jlama.util.HttpSupport - Downloading file: /Users/.../model.safetensors
WARNING: Using incubator modules: jdk.incubator.vector
INFO  c.g.tjake.jlama.model.AbstractModel - Model type = Q4, Working memory type = F32, Quantized memory type = I8</code>
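The last log line is worth unpacking: the weights ship in Q4, are held in memory as I8 (signed bytes), and are expanded to F32 for arithmetic. The sketch below illustrates only the I8 idea and is not Jlama's actual implementation: quantize floats to bytes with a scale factor, then dequantize back to floats at compute time.

```java
import java.util.Arrays;

// Conceptual sketch of I8 quantization: store signed bytes plus a scale,
// and reconstruct F32 values when they are needed for computation.
public class QuantizationSketch {

    // Map floats onto [-127, 127] using a single scale factor
    static byte[] quantizeI8(float[] values, float[] scaleOut) {
        float maxAbs = 0f;
        for (float v : values) maxAbs = Math.max(maxAbs, Math.abs(v));
        float scale = maxAbs == 0f ? 1f : maxAbs / 127f;
        scaleOut[0] = scale;
        byte[] q = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            q[i] = (byte) Math.round(values[i] / scale);
        }
        return q;
    }

    // Expand the bytes back to working-memory floats (F32)
    static float[] dequantize(byte[] q, float scale) {
        float[] out = new float[q.length];
        for (int i = 0; i < q.length; i++) {
            out[i] = q[i] * scale;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] weights = { 0.5f, -1.0f, 0.25f, 0.75f };
        float[] scale = new float[1];
        byte[] q = quantizeI8(weights, scale);
        System.out.println(Arrays.toString(dequantize(q, scale[0])));
    }
}
```

The byte array needs a quarter of the memory of the float array; real engines refine this with per-block scales and 4-bit packing (Q4), but the trade-off is the same — less memory per weight in exchange for a small rounding error.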

Special Notes

The engine downloads model files from Hugging Face at runtime (requires network access). If automatic download fails, manually place the files under ~/.jlama/models.
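For the manual fallback, a quick sanity check helps: the run log above shows a model needs at least a config.json and a .safetensors file. This hypothetical helper (the subdirectory layout under ~/.jlama/models is an assumption — check what the auto-download actually produces) tests for their presence:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper: does a directory look like a usable local model?
public class ModelCheck {

    static boolean looksLikeModelDir(Path dir) {
        // config.json and *.safetensors appear in the download log,
        // so treat their presence as "model available locally"
        try (Stream<Path> files = Files.list(dir)) {
            return Files.exists(dir.resolve("config.json"))
                    && files.anyMatch(p -> p.toString().endsWith(".safetensors"));
        } catch (IOException e) {
            return false; // missing or unreadable directory
        }
    }

    public static void main(String[] args) {
        // The model-specific subdirectory name is an assumption
        Path models = Paths.get(System.getProperty("user.home"),
                ".jlama", "models", "tjake_Llama-3.2-1B-Instruct-JQ4");
        System.out.println("Model present locally: " + looksLikeModelDir(models));
    }
}
```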

Because Jlama relies on the jdk.incubator.vector module, you must use JDK 20 or newer and enable the module with the following JVM options:

<code>--add-modules=jdk.incubator.vector
--enable-native-access=ALL-UNNAMED
--enable-preview</code>
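When running through Maven, these flags can be passed via the exec-maven-plugin's exec goal. A sketch (the main class name is a placeholder; note that --enable-preview must also be passed to the compiler, e.g. via maven-compiler-plugin's compilerArgs):

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>3.1.0</version>
  <configuration>
    <executable>java</executable>
    <arguments>
      <argument>--add-modules=jdk.incubator.vector</argument>
      <argument>--enable-native-access=ALL-UNNAMED</argument>
      <argument>--enable-preview</argument>
      <argument>-classpath</argument>
      <classpath/>
      <!-- placeholder: replace with your application's main class -->
      <argument>com.example.JlamaDemo</argument>
    </arguments>
  </configuration>
</plugin>
```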

Conclusion

Although Jlama is currently best suited to smaller models aimed at edge-device scenarios, it greatly simplifies and speeds up LLM usage within the Java ecosystem, making it a compelling choice for both enterprise applications and experimental projects.

Tags: Java · AI · LLM · Tutorial · Inference · Jlama
Written by Java Architecture Diary

Committed to sharing original, high-quality technical articles; no fluff or promotional content.