Designing an LLM‑Powered Architecture: The ArchGuard Co‑mate Reference Model
This article presents a detailed reference architecture for building LLM‑driven applications, using the ArchGuard Co‑mate project to illustrate layered design, local model integration, DSL‑based orchestration, and streaming LLM interfaces, complete with code examples and practical implementation notes.
LLM Application Reference Architecture
The architecture is organized into five logical layers that together enable a language‑model‑driven application: UI, Conversation Processing, Operation Orchestration, LLM Enhancement, and the LLM Core. Each layer has a distinct responsibility and can be implemented with open‑source components.
UI Layer – User‑Intent‑Driven Design
The UI layer is the entry point for users (web, mobile, or CLI). It guides users toward the system’s capabilities and limits direct raw LLM usage, turning user intent into structured commands.
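Turning free-form intent into a closed set of structured commands can be sketched roughly as follows. This is an illustrative sketch, not Co-mate's actual types: the `UserCommand` hierarchy and `parseIntent` names are hypothetical.

```kotlin
// Hypothetical sketch: the UI maps free-form intent onto a closed set of
// structured commands instead of forwarding raw text to the LLM.
sealed class UserCommand {
    data class IntroduceSystem(val repoUrl: String) : UserCommand()
    object ApiGovernance : UserCommand()
}

fun parseIntent(input: String): UserCommand? = when {
    input.startsWith("introduce ") ->
        UserCommand.IntroduceSystem(input.removePrefix("introduce ").trim())
    input.contains("api governance", ignoreCase = true) ->
        UserCommand.ApiGovernance
    else -> null // unrecognized intent falls through to the conversation-processing layer
}
```

Anything the UI cannot classify is handed to the next layer rather than rejected outright.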
Conversation Processing Layer – Local Small Model
A lightweight SentenceTransformer model runs locally to embed and match user utterances before the request falls back to a remote LLM. This mirrors the two‑stage approach used by GitHub Copilot and Bloop.
ONNX Runtime – cross‑platform inference accelerator for the local model.
HuggingFace Tokenizers – high‑performance tokenization library.
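The embed‑and‑match step can be sketched as below. Note this is a toy sketch: `embed` is a bag‑of‑letters stand‑in for the local SentenceTransformer, and the helper names (`cosine`, `bestCommand`) are illustrative, not Co‑mate's actual API.

```kotlin
import kotlin.math.sqrt

// Toy stand-in for the local SentenceTransformer: counts letter frequencies.
fun embed(text: String): DoubleArray {
    val vec = DoubleArray(26)
    text.lowercase().filter { it in 'a'..'z' }.forEach { vec[it - 'a'] += 1.0 }
    return vec
}

// Cosine similarity between two embedding vectors.
fun cosine(a: DoubleArray, b: DoubleArray): Double {
    val dot = a.indices.sumOf { a[it] * b[it] }
    val na = sqrt(a.sumOf { it * it })
    val nb = sqrt(b.sumOf { it * it })
    return if (na == 0.0 || nb == 0.0) 0.0 else dot / (na * nb)
}

// Pick the registered command whose example embeddings best match the input,
// or null if nothing clears the threshold (then fall back to the remote LLM).
fun bestCommand(
    input: String,
    commands: Map<String, List<DoubleArray>>,
    threshold: Double = 0.5
): String? {
    val inputVec = embed(input)
    return commands.entries
        .map { (name, vecs) -> name to vecs.maxOf { cosine(inputVec, it) } }
        .maxByOrNull { it.second }
        ?.takeIf { it.second >= threshold }
        ?.first
}
```

A miss (no command above the threshold) is the signal to escalate to the remote model.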
Example of registering semantic‑embedding commands in Kotlin:
mapOf(
ComateCommand.Intro to basicIntroCommand.map { semantic.embed(it) },
ComateCommand.LayeredStyle to archStyleCommand.map { semantic.embed(it) },
ComateCommand.ApiGovernance to apiGovernanceCommand.map { semantic.embed(it) },
ComateCommand.ApiGen to apiGenCommand.map { semantic.embed(it) },
ComateCommand.FoundationGovernance to foundationGovernanceCommand.map { semantic.embed(it) }
)

Operation Orchestration Layer – Functions as Operations
High‑level user requests are reflected into Kotlin classes, converted to snake_case function names, and exposed to the LLM as callable tools. The LLM receives a structured prompt following a “Thought‑Action‑Input” pattern.
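The class‑name‑to‑function‑name conversion might be implemented along these lines (a minimal sketch; Co‑mate's actual `toSnakeCase` helper may differ):

```kotlin
// Convert a Kotlin class name like "IntroduceSystem" into the snake_case
// tool name ("introduce_system") exposed to the LLM.
fun String.toSnakeCase(): String =
    replace(Regex("([a-z0-9])([A-Z])"), "$1_$2")          // lowercase/digit -> uppercase boundary
        .replace(Regex("([A-Z]+)([A-Z][a-z])"), "$1_$2")  // acronym -> word boundary, e.g. APIGovernance
        .lowercase()
```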
Answer the following questions as best you can.
You have access to the following tools:
introduce_system: introduce_system is a function to introduce a system.
Use the following format:
Question: ...
Thought: ...
Action: ... (one of [introduce_system])
Action Input: ... (parse from the user input, don't add other additional information)
Begin!
Question: Introduce the following system: https://github.com/archguard/ddd-monolithic-code-sample

Reflection‑based function creation example:
val defaultConstructor = clazz.declaredConstructors[0]
val dyFunction = defaultConstructor.newInstance(context) as DyFunction
clazz.name.toSnakeCase() to dyFunction

LLM Enhancement Layer – Precise Context Construction
This layer enriches raw LLM output by assembling relevant context. It may query a vector database for knowledge‑heavy queries or use the local small model for deterministic contexts. Frequently used commands are cached to reduce remote calls, and GPT can be invoked to split long documents into DSL fragments for downstream processing.
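The caching of frequently used commands could be sketched as a small LRU cache keyed by prompt. This is an assumption about the mechanism, not Co‑mate's actual code; the `ResponseCache` name is illustrative.

```kotlin
// Hypothetical sketch: cache responses for frequently used commands so that
// repeated requests skip the remote LLM call entirely.
class ResponseCache(private val maxEntries: Int = 128) {
    // accessOrder = true turns LinkedHashMap into an LRU structure.
    private val cache = object : LinkedHashMap<String, String>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<String, String>?): Boolean =
            size > maxEntries
    }

    // Return the cached answer, or invoke the remote LLM once and remember it.
    fun getOrFetch(prompt: String, remoteCall: (String) -> String): String =
        cache.getOrPut(prompt) { remoteCall(prompt) }
}
```

Knowledge‑heavy queries would bypass this cache and hit the vector database instead.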
LLM Core Layer – Streaming Proxy Interface
The bottom layer hosts the actual Transformer‑based language model and provides a streaming response interface so that the UI can display incremental output while the model generates text.
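A streaming interface can be sketched as a lazy token stream that the UI consumes as tokens arrive. In a real proxy this would forward server‑sent events (or a kotlinx.coroutines `Flow`) from the LLM backend; here the model output is faked for illustration.

```kotlin
// Hypothetical sketch: expose generation as a lazy token stream so the UI
// can render output incrementally instead of waiting for the full response.
fun streamCompletion(prompt: String): Sequence<String> = sequence {
    // Stand-in for real model output; a proxy would relay backend tokens here.
    val fakeTokens = listOf("The", " system", " has", " three", " layers", ".")
    for (token in fakeTokens) {
        yield(token) // each token is available before the next is produced
    }
}

// The consumer appends tokens as they arrive:
// streamCompletion("Introduce the system").forEach { print(it) }
```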
Runtime Initialization (DSL Execution)
Co‑mate defines a lightweight Kotlin‑based DSL to describe orchestration workflows. The runtime evaluates the DSL, binds it to a foundation specification (e.g., MVC layering), and executes the corresponding functions.
// Initialize runtime
val repl = KotlinInterpreter()
val mvcDslSpec = repl.evalCast<FoundationSpec>(InterpreterRequest(code = mvcFoundation))
// Resolve action from user input
val action = ComateToolingAction.from(action.lowercase())
// Apply default DSL spec when needed
if (action == ComateToolingAction.FOUNDATION_SPEC_GOVERNANCE) {
comateContext.spec = mvcDslSpec
}

Reference Implementation
The complete open‑source implementation is available at https://github.com/archguard/co-mate. It demonstrates the architecture using Kotlin, ONNX Runtime, HuggingFace Tokenizers, a custom LangChain‑style prompting scheme, and a streaming proxy for the LLM core.
Author: phodal