Designing a JVM‑Based LLM Framework: Insights from Chocolate Factory
This article explores the design principles, architectural decisions, and practical code examples behind Chocolate Factory, a JVM‑centric LLM development framework inspired by LangChain, LlamaIndex, Spring AI, and PromptFlow. It covers SDK construction, RAG workflows, and the challenges of prompt engineering.
Overview
Chocolate Factory is a JVM‑based framework for building LLM‑enabled applications. It incorporates ideas from LangChain4j, LangChain, LlamaIndex, Spring AI, Semantic Kernel and PromptFlow. The source code is hosted at https://github.com/unit-mesh/chocolate-factory.
Motivation for a JVM LLM SDK
Recent experiments integrating existing JVM infrastructure with LLMs highlighted three recurring needs:
Encapsulate internal LLM services behind a unified SDK.
Rapidly prototype Retrieval‑Augmented Generation (RAG) proof‑of‑concepts.
Provide developer‑friendly tools for prompt design and debugging.
Key Dependencies
dependencies {
// Core library
implementation("cc.unitmesh:cocoa-core:0.3.4")
// Code splitter
implementation("cc.unitmesh:code-splitter:0.3.4")
// Elasticsearch vector store + regular search
implementation("cc.unitmesh:store-elasticsearch:0.3.4")
// Local CPU embedding via SentenceTransformers
implementation("cc.unitmesh:sentence-transformers:0.3.4")
}
These modules expose simple APIs for embedding, vector storage, and code chunking, allowing seamless integration with existing AI services and MaaS platforms.
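To make the "unified SDK" idea concrete, here is a minimal sketch of how different MaaS backends could sit behind one completion interface. The `LlmProvider` and `EchoProvider` names are illustrative assumptions for this article, not part of the published cocoa-core API:

```kotlin
// Hypothetical sketch: a single completion interface that different
// backends (OpenAI-compatible services, internal MaaS platforms) implement.
interface LlmProvider {
    fun completion(prompt: String): String
}

// A stub backend for local testing without any network access.
class EchoProvider : LlmProvider {
    override fun completion(prompt: String): String = "echo: $prompt"
}

// Application code depends only on the interface, so swapping the
// underlying LLM service does not ripple through the codebase.
fun summarize(provider: LlmProvider, text: String): String =
    provider.completion("Summarize: $text")
```

Encapsulating the provider this way is what lets a team point the same application at an internal LLM service or a public API by changing one binding.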
Core Functional Blocks
LLM API Interaction: a two‑part design consisting of a standard model wrapper that depends on the MaaS platform and a streaming/custom‑processing API for handling responses.
Document Embedding: multi‑format document processing and chunking, configurable vector model precision, and a middle layer that can query both vector and traditional databases.
Automated Workflow (Flow + Agent): an extensible area for future workflow‑oriented automation.
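The two‑part API design above can be sketched without any framework machinery. This is a simplified stand‑in (the `StreamingLlm` class and its token source are assumptions for the example, not the framework's actual types):

```kotlin
// Illustrative two-part design: a blocking wrapper for callers that just
// want the final answer, plus a streaming path that hands each token to a
// caller-supplied handler for custom processing.
class StreamingLlm(private val tokens: List<String>) {
    // Standard wrapper: block until the full response is assembled.
    fun completion(): String = tokens.joinToString("")

    // Streaming API: push tokens as they "arrive" so the caller can
    // render partial output or post-process on the fly.
    fun stream(onToken: (String) -> Unit) {
        tokens.forEach(onToken)
    }
}
```

In a real integration the token list would be replaced by a server‑sent‑event or chunked HTTP response from the MaaS platform; the split between "wait for everything" and "process as it streams" stays the same.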
LLM Application Interaction Types
Simple prompt‑based interaction.
Embedding‑based interaction (the primary focus for most RAG scenarios).
Workflow‑driven automation.
RAG Abstraction
RAG is split into two stages: Indexing (splitting data and storing vectors) and Querying (retrieving relevant chunks and feeding them to an LLM). The following RAGScript example demonstrates the full flow:
rag {
indexing {
val document = document("filename.txt")
val chunks = document.split()
store.indexing(chunks)
}
querying {
val results = store.findRelevant("Hello World").lowInMiddle()
llm.completion { "// combine results to build prompt" }
}
}
Implementation Details and Use Cases
Local embedding using SentenceTransformers and ONNX Runtime to keep vectorization costs low.
Precise code splitting that respects interfaces, inheritance and custom rules, improving the relevance of code‑related retrieval.
Prototype scenarios include interactive UI code assistance, lightweight code interpreters, document‑based specification queries, semantic code search, and long‑data test‑case generation.
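The declaration‑aware splitting mentioned above can be illustrated in miniature. This sketch chunks Kotlin source at top‑level declaration boundaries instead of fixed character windows; the real code‑splitter module applies much richer rules (interfaces, inheritance, custom configuration):

```kotlin
// Minimal sketch of declaration-aware code splitting: cut a source file
// at top-level `class`/`interface`/`fun` keywords so each chunk is a
// whole declaration, which retrieves far better than arbitrary windows.
fun splitByDeclaration(source: String): List<String> {
    val starts = Regex("""(?m)^(class|interface|fun)\b""")
        .findAll(source).map { it.range.first }.toList()
    if (starts.isEmpty()) return listOf(source)
    return starts.mapIndexed { i, start ->
        val end = if (i + 1 < starts.size) starts[i + 1] else source.length
        source.substring(start, end).trim()
    }
}
```

Chunks that align with semantic units are what make code‑related retrieval relevant: an embedding of half a function body matches far fewer queries than an embedding of the whole declaration.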
Post‑Processing Strategies
Real‑time Return Mode : stream partial results (e.g., generated code) back to the user for immediate feedback.
Second‑Stage Result Processing : after the LLM finishes, perform additional operations such as executing user code to generate charts or further transformations.
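A second‑stage step often begins by pulling a fenced code block out of the completed response before executing or transforming it. This is a hedged sketch; the fence format is an assumption about the model's output, and the function name is ours:

```kotlin
// Second-stage processing sketch: once the LLM response is complete,
// extract the fenced code block so a follow-up step can execute it
// (e.g., to render a chart) or validate it.
fun extractCodeBlock(response: String): String? {
    val fence = "`".repeat(3) // build the fence to keep this snippet readable
    val regex = Regex("$fence(?:\\w+)?\\n(.*?)$fence", RegexOption.DOT_MATCHES_ALL)
    return regex.find(response)?.groupValues?.get(1)?.trim()
}
```

Returning `null` when no block is found lets the pipeline fall back to treating the whole response as plain text rather than failing.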
Prompt Debugging and Orchestration
Prompt engineering remains the biggest challenge. Chocolate Factory recommends using Apache Velocity as a template engine and creating an isolated context for each workflow. A typical chat template (shown here in Jinja‑style syntax, as popularized by PromptFlow) looks like:
system:
You are a helpful assistant.
{% for item in chat_history %}
user:
{{item.inputs.question}}
assistant:
{{item.outputs.answer}}
{% endfor %}
user:
{{question}}
These templates can be combined with CLI or IDE visual tools for iterative exploration.
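The "isolated context per workflow" idea can be shown with a deliberately tiny stand‑in for a template engine. Each render receives its own variable map, so prompts from different workflows cannot leak into one another; a real setup would delegate rendering to Velocity (or Jinja via PromptFlow):

```kotlin
// Minimal template rendering sketch: substitute {{name}} placeholders
// from a per-workflow context map. Unknown placeholders are left intact
// so missing variables are visible during prompt debugging.
fun render(template: String, context: Map<String, String>): String =
    Regex("""\{\{(\w+)\}\}""").replace(template) { m ->
        context[m.groupValues[1]] ?: m.value
    }
```

Because the context is just a value passed into `render`, two concurrent workflows can use the same template file with entirely separate variable sets.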
References
Semantic code‑search example and additional documentation are available at https://framework.unitmesh.cc/docs/rag.
phodal
A prolific open-source contributor who constantly starts new projects. Passionate about sharing software development insights to help developers improve their KPIs. Currently active in IDEs, graphics engines, and compiler technologies.