Run Large Language Models Directly in Java with Jlama – Quick Start Guide
This article introduces Jlama, an open-source LLM inference engine for Java. It outlines the engine's key features, walks through CLI usage and Maven integration step by step, and closes with code examples, run logs, and setup notes for running large language models efficiently inside Java applications.
What is Jlama?
Jlama is an open-source large language model inference engine built specifically for the Java ecosystem, allowing developers to run LLM inference directly inside Java applications without external services.
Key Features
Supports popular models such as Gemma, Llama, Mistral, Mixtral, and Qwen2.
Built on Java 20+, leveraging the incubating Vector API for high-performance inference.
Quick Start
1. Command-line usage
<code># Install jbang
curl -Ls https://sh.jbang.dev | bash -s - app setup
# Install Jlama CLI
jbang app install --force jlama@tjake
# Run a model (Web UI enabled)
jlama restapi tjake/Llama-3.2-1B-Instruct-JQ4 --auto-download</code>
2. Java project integration
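Before moving on to project integration, note that the restapi server started in step 1 can also be called over plain HTTP, since Jlama's REST API follows the OpenAI chat-completions format. A request body like the sketch below could be POSTed to the server's chat completions endpoint; the default port and exact path (e.g. http://localhost:8080/chat/completions) are assumptions here, so check the server's startup log for your version.

```json
{
  "model": "tjake/Llama-3.2-1B-Instruct-JQ4",
  "messages": [
    { "role": "user", "content": "Say hello from Jlama" }
  ],
  "temperature": 0.7
}
```

Because the API is OpenAI-compatible, existing OpenAI client libraries can usually be pointed at the local server by overriding their base URL.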
Add dependencies to pom.xml
<code><dependency>
<groupId>com.github.tjake</groupId>
<artifactId>jlama-core</artifactId>
<version>${jlama.version}</version>
</dependency>
<dependency>
<groupId>com.github.tjake</groupId>
<artifactId>jlama-native</artifactId>
<classifier>${os.detected.name}-${os.detected.arch}</classifier>
<version>${jlama.version}</version>
</dependency>
<!-- LangChain4j integration -->
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j</artifactId>
<version>1.0.0-beta1</version>
</dependency>
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-jlama</artifactId>
<version>1.0.0-beta1</version>
</dependency></code>
Code example
<code>ChatLanguageModel model = JlamaChatModel.builder()
.modelName("tjake/Llama-3.2-1B-Instruct-JQ4")
.temperature(0.7f)
.build();
ChatResponse response = model.chat(
UserMessage.from("Help me write a java version of the palindromic algorithm")
);
System.out.println("\n" + response.aiMessage().text());</code>
Run log
<code>INFO c.g.tjake.jlama.util.HttpSupport - Downloaded file: /Users/.../config.json
INFO c.g.tjake.jlama.util.HttpSupport - Downloading file: /Users/.../model.safetensors
WARNING: Using incubator modules: jdk.incubator.vector
INFO c.g.tjake.jlama.model.AbstractModel - Model type = Q4, Working memory type = F32, Quantized memory type = I8</code>
Special Notes
The engine downloads model files from Hugging Face at runtime (network access required). If the automatic download fails, place the files manually under ~/.jlama/models.
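For example, the cache directory can be created ahead of time and a manually downloaded model copied into it. This is a sketch: the per-model subdirectory naming is an assumption, so check Jlama's download log for the exact layout it expects.

```shell
# Create Jlama's local model cache if it does not exist yet
mkdir -p "$HOME/.jlama/models"

# Copy a manually downloaded model (config.json, tokenizer files,
# *.safetensors) into its own subdirectory, e.g.:
# cp -r ~/Downloads/Llama-3.2-1B-Instruct-JQ4 "$HOME/.jlama/models/"

# Confirm the cache directory is in place
ls -d "$HOME/.jlama/models"
```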
Because Jlama relies on the jdk.incubator.vector module, you must use JDK 20 or newer and enable the module with the following JVM options:
<code>--add-modules=jdk.incubator.vector
--enable-native-access=ALL-UNNAMED
--enable-preview</code>
Conclusion
Although Jlama is currently practical only for smaller models, such as those suited to edge-device scenarios, it greatly simplifies and speeds up LLM usage within the Java ecosystem, making it a compelling choice for both enterprise applications and innovative projects.
Java Architecture Diary