Designing a JVM‑Based LLM Framework: Insights from Chocolate Factory

This article explores the design principles, architectural decisions, and practical code examples behind Chocolate Factory, a JVM‑centric framework for building LLM applications that draws inspiration from LangChain, LlamaIndex, Spring AI, and PromptFlow. It covers SDK construction, RAG workflows, and the challenges of prompt engineering.

phodal

Overview

Chocolate Factory is a JVM‑based framework for building LLM‑enabled applications. It incorporates ideas from LangChain4j, LangChain, LlamaIndex, Spring AI, Semantic Kernel and PromptFlow. The source code is hosted at https://github.com/unit-mesh/chocolate-factory.

Motivation for a JVM LLM SDK

Recent experiments integrating existing JVM infrastructure with LLMs highlighted three recurring needs:

Encapsulate internal LLM services behind a unified SDK.

Rapidly prototype Retrieval‑Augmented Generation (RAG) proof‑of‑concepts.

Provide developer‑friendly tools for prompt design and debugging.

Key Dependencies

dependencies {
    // Core library
    implementation("cc.unitmesh:cocoa-core:0.3.4")
    // Code splitter
    implementation("cc.unitmesh:code-splitter:0.3.4")
    // Elasticsearch vector store + regular search
    implementation("cc.unitmesh:store-elasticsearch:0.3.4")
    // Local CPU embedding via SentenceTransformers
    implementation("cc.unitmesh:sentence-transformers:0.3.4")
}

These modules expose simple APIs for embedding, vector storage, and code chunking, allowing seamless integration with existing AI services and MaaS platforms.

Core Functional Blocks

LLM API Interaction: a two‑part design consisting of a standard model wrapper bound to the underlying MaaS platform, plus a streaming/custom‑processing API for handling responses.

Document Embedding: multi‑format document processing and chunking, configurable vector‑model precision, and a middle layer that can query both vector and traditional databases.

Automated Workflow (Flow + Agent): an extensible area reserved for future workflow‑oriented automation.
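The two‑part LLM API design above can be sketched as a pair of entry points, one blocking and one streaming. All names below (`LlmProvider`, `EchoProvider`) are illustrative stand‑ins, not Chocolate Factory's actual API:

```kotlin
// Illustrative sketch of the two-part LLM API design: a standard
// completion wrapper (bound to whichever MaaS platform backs it) plus a
// streaming entry point for custom response processing.
// These names are assumptions for illustration, not the framework's API.
interface LlmProvider {
    fun completion(prompt: String): String
    fun streamCompletion(prompt: String, onToken: (String) -> Unit)
}

// Trivial echo provider standing in for a real MaaS-backed implementation.
class EchoProvider : LlmProvider {
    override fun completion(prompt: String) = "echo: $prompt"
    override fun streamCompletion(prompt: String, onToken: (String) -> Unit) {
        // Emit the response in word-sized chunks to simulate token streaming.
        completion(prompt).split(" ").forEach { onToken("$it ") }
    }
}

fun main() {
    println(EchoProvider().completion("hello"))  // echo: hello
}
```

Separating the two keeps simple call sites simple, while streaming consumers opt in to token‑level handling.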

LLM Application Interaction Types

Simple prompt‑based interaction.

Embedding‑based interaction (the primary focus for most RAG scenarios).

Workflow‑driven automation.

RAG Abstraction

RAG is split into two stages: Indexing (splitting data and storing vectors) and Querying (retrieving relevant chunks and feeding them to an LLM). The following RAGScript example demonstrates the full flow:

rag {
    indexing {
        val document = document("filename.txt")
        val chunks = document.split()
        store.indexing(chunks)
    }
    querying {
        val results = store.findRelevant("Hello World").lowInMiddle()
        llm.completion { "// combine results to build prompt" }
    }
}
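The lowInMiddle() call targets the "lost in the middle" effect: LLMs attend most reliably to the beginning and end of a prompt, so the least relevant chunks are pushed toward the middle. A minimal sketch of such a reordering (the framework's actual implementation may differ in detail):

```kotlin
// Sketch of a "low in the middle" reordering. Input is assumed sorted by
// descending relevance; the most relevant chunks end up at the edges of
// the prompt and the least relevant in the middle.
fun <T> lowInMiddle(sortedByRelevance: List<T>): List<T> {
    val front = ArrayDeque<T>()
    val back = ArrayDeque<T>()
    sortedByRelevance.forEachIndexed { i, item ->
        // Alternate high-relevance items between the two ends of the list.
        if (i % 2 == 0) front.addLast(item) else back.addFirst(item)
    }
    return front + back
}

fun main() {
    // Relevance scores, highest first: the two best chunks land at the
    // edges, the weakest chunk in the middle.
    println(lowInMiddle(listOf(5, 4, 3, 2, 1)))  // [5, 3, 1, 2, 4]
}
```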

Implementation Details and Use Cases

Local embedding using SentenceTransformers and ONNX Runtime to keep vectorization costs low.

Precise code splitting that respects interfaces, inheritance and custom rules, improving the relevance of code‑related retrieval.

Prototype scenarios include interactive UI code assistance, lightweight code interpreters, document‑based specification queries, semantic code search, and test‑case generation for long data inputs.
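Under the hood, retrieval over locally embedded chunks reduces to comparing the query vector against each stored vector, typically with cosine similarity. A self‑contained sketch (embeddings are assumed to come from the local SentenceTransformers/ONNX model):

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors: the core comparison
// behind semantic retrieval over locally embedded chunks.
fun cosine(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "embedding dimensions must match" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

fun main() {
    // Identical vectors score 1.0; orthogonal vectors score 0.0.
    println(cosine(floatArrayOf(1f, 0f), floatArrayOf(1f, 0f)))  // 1.0
}
```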

Post‑Processing Strategies

Real‑time Return Mode : stream partial results (e.g., generated code) back to the user for immediate feedback.

Second‑Stage Result Processing : after the LLM finishes, perform additional operations such as executing user code to generate charts or further transformations.
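The two strategies compose naturally: partial results stream to the caller as they arrive, then a second‑stage processor runs on the complete text. A sketch with a stand‑in token source (streamWithPostProcess is a hypothetical helper for illustration, not a framework API):

```kotlin
// Combine real-time return mode with second-stage processing: forward each
// token to the caller immediately, then post-process the full result.
fun streamWithPostProcess(
    tokens: Sequence<String>,
    onPartial: (String) -> Unit,
    postProcess: (String) -> String
): String {
    val full = StringBuilder()
    for (token in tokens) {
        full.append(token)
        onPartial(token)  // immediate feedback, e.g. render code as it streams
    }
    return postProcess(full.toString())  // second stage on the complete result
}

fun main() {
    val final = streamWithPostProcess(
        tokens = sequenceOf("fun ", "main", "()"),  // stand-in LLM token stream
        onPartial = { print(it) },                  // stream to the UI
        postProcess = { it.trim() }                 // e.g. clean up before execution
    )
    println("\nfinal: $final")
}
```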

Prompt Debugging and Orchestration

Prompt engineering remains the biggest challenge. Chocolate Factory recommends using Apache Velocity as the template engine and creating an isolated context for each workflow. A typical chat template (shown here in the Jinja2‑style syntax popularized by PromptFlow; Velocity would express the same loop with #foreach) looks like:

system:
You are a helpful assistant.
{% for item in chat_history %}
user:
{{item.inputs.question}}
assistant:
{{item.outputs.answer}}
{% endfor %}
user:
{{question}}

These templates can be combined with CLI or IDE visual tools for iterative exploration.
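The key point is context isolation: each workflow renders its template against its own context map, so variables cannot leak between workflows. The sketch below uses a minimal ${name} substitutor as a stand‑in for the Velocity engine (VelocityContext plays the context‑map role in the real setup):

```kotlin
// Stand-in illustration of per-workflow context isolation. `render` is a
// minimal ${name} substitutor used here in place of the Velocity engine;
// unknown keys are deliberately left untouched.
fun render(template: String, context: Map<String, String>): String =
    Regex("""\$\{(\w+)}""").replace(template) { match ->
        context[match.groupValues[1]] ?: match.value  // unknown keys stay as-is
    }

fun main() {
    // Each workflow gets its own context map; nothing is shared globally.
    val workflowContext = mapOf("question" to "What is RAG?")
    println(render("user:\n\${question}", workflowContext))
}
```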

References

Semantic code‑search example and additional documentation are available at https://framework.unitmesh.cc/docs/rag.

[Figure: Architecture diagram]
[Figure: LLM interaction categories]
[Figure: RAG abstraction diagram]
[Figure: Prototype diagram]
Written by phodal

A prolific open-source contributor who constantly starts new projects. Passionate about sharing software development insights to help developers improve their KPIs. Currently active in IDEs, graphics engines, and compiler technologies.