
Building a Simple Local AI Question‑Answer System with Java, LangChain, Ollama, and ChromaDB

This article explains how to set up a lightweight local AI Q&A system using Java, LangChain (and LangChain4J), Ollama for LLM inference, embedding techniques, and a vector database (ChromaDB), covering core concepts, environment preparation, Maven dependencies, and sample code.

JD Tech Talk

Introduction

Hello everyone. I have been studying large AI models, and this article shares a step‑by‑step guide to building a simple local AI question‑answer system, written primarily in Java with a small amount of Python for tooling. Because ChatGPT is hard to access in China, we use open‑source models such as LLaMA and Qwen instead.

Key Concepts

1. Large Language Models (LLM)

LLMs are massive transformer‑based models with billions to trillions of parameters that excel at natural language understanding and generation. They are typically trained on huge text corpora, require GPU clusters, and can be applied to tasks like text generation, translation, summarization, and dialogue.

2. Embedding

Embedding converts text into numerical vectors that capture semantic similarity. Common methods include Word2Vec, GloVe, FastText, BERT, ELMo, and Sentence‑Transformers. These vectors enable downstream NLP tasks such as classification, sentiment analysis, and retrieval.
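To make "semantic similarity" concrete, the sketch below compares toy embedding vectors with cosine similarity. The three‑dimensional vectors are invented for illustration; real embedding models produce hundreds to thousands of dimensions, but the comparison works the same way.

```java
public class CosineSimilarityDemo {
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Values near 1.0 mean the vectors (and the texts they embed) are similar.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings" (hypothetical values).
        double[] cat    = {0.9, 0.1, 0.0};
        double[] kitten = {0.85, 0.15, 0.05};
        double[] car    = {0.0, 0.2, 0.95};
        System.out.printf("cat vs kitten: %.3f%n", cosine(cat, kitten)); // close to 1
        System.out.printf("cat vs car:    %.3f%n", cosine(cat, car));    // close to 0
    }
}
```

This pairwise score is the primitive that classification, retrieval, and deduplication all build on.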

3. Vector Database

Vector databases store high‑dimensional vectors and provide efficient similarity search (ANN). They support indexing structures, hybrid queries, scalability, real‑time updates, and cloud‑native deployment. Popular projects include FAISS, Pinecone, Weaviate, Qdrant, and Milvus.
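The essence of what a vector store does can be sketched as a brute‑force nearest‑neighbor search over an in‑memory list. This is a deliberately naive stand‑in: production systems like ChromaDB or Milvus use ANN index structures (e.g. HNSW) precisely to avoid scanning every vector. All texts and vectors here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** A minimal in-memory vector store with brute-force similarity search. */
public class MiniVectorStore {
    static class Entry {
        final String text;
        final double[] vector;
        Entry(String text, double[] vector) { this.text = text; this.vector = vector; }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String text, double[] vector) {
        entries.add(new Entry(text, vector));
    }

    /** Returns the texts of the k entries most similar to the query vector. */
    List<String> search(double[] query, int k) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator
                .comparingDouble((Entry e) -> cosine(query, e.vector))
                .reversed());
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, sorted.size()); i++) {
            top.add(sorted.get(i).text);
        }
        return top;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
    }

    public static void main(String[] args) {
        MiniVectorStore store = new MiniVectorStore();
        store.add("polar bears live in the Arctic",  new double[]{1.0, 0.1, 0.0});
        store.add("penguins live in Antarctica",     new double[]{0.8, 0.3, 0.1});
        store.add("Java is a programming language",  new double[]{0.0, 0.1, 1.0});
        // The query vector is closest to the first entry.
        System.out.println(store.search(new double[]{0.95, 0.15, 0.0}, 1));
    }
}
```

Swapping the linear scan for an ANN index is what turns this O(n) toy into a database that scales to millions of vectors.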

4. Retrieval‑Augmented Generation (RAG)

RAG combines retrieval of relevant documents with generation by an LLM, improving factuality, reducing hallucinations, and enabling domain‑specific knowledge. The workflow involves retrieving context from a vector store, feeding it to the LLM, and producing a grounded answer.
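The "feed retrieved context to the LLM" step of that workflow is just prompt assembly. The sketch below shows one plausible way to splice retrieved segments into a grounded prompt; the template wording and sample strings are illustrative, not a fixed LangChain4J API.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of the RAG prompt-assembly step: retrieved segments are
 *  spliced into a template before being sent to the LLM. */
public class RagPromptDemo {
    static String buildPrompt(List<String> retrieved, String question) {
        StringBuilder context = new StringBuilder();
        for (String segment : retrieved) {
            context.append("- ").append(segment).append('\n');
        }
        return "Answer using only the context below.\n"
             + "Context:\n" + context
             + "Question: " + question;
    }

    public static void main(String[] args) {
        List<String> retrieved = Arrays.asList(
                "The polar bear pulled off its own fur one strand at a time.");
        System.out.println(buildPrompt(retrieved, "What did the polar bear do?"));
    }
}
```

Constraining the model to the supplied context is what reduces hallucinations: the answer is grounded in documents you control rather than in the model's training data alone.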

AI Application Frameworks

LangChain

LangChain is a framework for building LLM‑powered applications. It provides chains, agents, memory, loaders, prompt engineering, a hub of reusable components, external integrations, and monitoring tools.

LangChain4J

LangChain4J brings similar capabilities to the Java ecosystem, offering modular design, multi‑model support, memory mechanisms, tool integration, and chain execution for building chatbots, text generators, and other NLP services.

Local Environment Setup

1. Run a Local LLM with Ollama

Download and install Ollama (https://ollama.com/). Start the service (default port 11434) and pull models such as llama3 and qwen using ollama pull modelName. Verify with ollama list.

2. Start a Local Vector Database (ChromaDB)

Install ChromaDB via pip install chromadb and launch it with chroma run. The service runs on http://localhost:8000.

Java Implementation

(1) Maven Dependencies

<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <langchain4j.version>0.31.0</langchain4j.version>
</properties>

<dependencies>
    <!-- LangChain4J core -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-core</artifactId>
        <version>${langchain4j.version}</version>
    </dependency>
    ... (other LangChain4J modules) ...
    <!-- Ollama integration -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-ollama</artifactId>
        <version>${langchain4j.version}</version>
    </dependency>
    <!-- ChromaDB client -->
    <dependency>
        <groupId>io.github.amikos-tech</groupId>
        <artifactId>chromadb-java-client</artifactId>
        <version>0.1.5</version>
    </dependency>
</dependencies>

(2) Core Code

Loading a local text file as a knowledge base:

public static void main(String[] args) throws ApiException {
    // Load document
    Document document = getDocument("joke.txt");
    // ... further processing ...
}

private static Document getDocument(String fileName) {
    URL docUrl = LangChainMainTest.class.getClassLoader().getResource(fileName);
    if (docUrl == null) {
        log.error("File not found: {}", fileName);
        return null; // avoid the NullPointerException on docUrl.toURI() below
    }
    try {
        Path path = Paths.get(docUrl.toURI());
        return FileSystemDocumentLoader.loadDocument(path);
    } catch (URISyntaxException e) {
        log.error("Error loading file", e);
        return null;
    }
}

Splitting the document into segments:

DocumentByLineSplitter lineSplitter = new DocumentByLineSplitter(200, 0, new OpenAiTokenizer());
List<TextSegment> segments = lineSplitter.split(document);
log.info("Number of segments: {}", segments.size());
segments.forEach(s -> log.info("Segment: {}", s.text()));

Embedding the segments and storing them in ChromaDB:

private static final String CHROMA_DB_DEFAULT_COLLECTION_NAME = "java-langchain-demo";
private static final String CHROMA_URL = "http://localhost:8000";

OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3")
        .build();

Client client = new Client(CHROMA_URL);
EmbeddingStore<TextSegment> embeddingStore = ChromaEmbeddingStore.builder()
        .baseUrl(CHROMA_URL)
        .collectionName(CHROMA_DB_DEFAULT_COLLECTION_NAME)
        .build();

segments.forEach(segment -> {
    Embedding e = embeddingModel.embed(segment).content();
    embeddingStore.add(e, segment);
});

Retrieving relevant vectors:

String query = "polar bear";
Embedding queryEmbedding = embeddingModel.embed(query).content();
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .maxResults(1)
        .build();
EmbeddingSearchResult<TextSegment> result = embeddingStore.search(request);
TextSegment retrieved = result.matches().get(0).embedded();
log.info("Query result: {}", retrieved.text());

Generating an answer with the LLM:

PromptTemplate template = PromptTemplate.from(
        "Based on the following context answer in Chinese:\n{{context}}\nQuestion:\n{{question}}");
Map<String, Object> vars = new HashMap<>();
vars.put("context", retrieved.text());
vars.put("question", "What did the polar bear do?");
Prompt prompt = template.apply(vars);

OllamaChatModel chatModel = OllamaChatModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3")
        .build();
UserMessage userMsg = prompt.toUserMessage();
Response<AiMessage> resp = chatModel.generate(userMsg);
AiMessage answer = resp.content();
log.info("LLM answer: {}", answer.text());

(3) Test Run

The sample text file contains a short joke about a polar bear and a penguin. Querying "What did the polar bear do?" yields the answer: "The polar bear pulled off its own fur one strand at a time."

Conclusion

This guide demonstrates a minimal end‑to‑end AI Q&A pipeline using Java, LangChain4J, Ollama, and ChromaDB. The same approach can be wrapped in a Spring Boot web service for production use, and LangChain offers many additional features such as advanced prompting, tool calling, and memory management.

Tags: Java, AI, LLM, LangChain, RAG, vector database
Written by JD Tech Talk, the official JD Tech public account delivering best practices and technology innovation.
