Designing an LLM‑Powered Architecture: The ArchGuard Co‑mate Reference Model

This article presents a detailed reference architecture for building LLM‑driven applications, using the ArchGuard Co‑mate project to illustrate layered design, local model integration, DSL‑based orchestration, and streaming LLM interfaces, complete with code examples and practical implementation notes.


LLM Application Reference Architecture

The architecture is organized into five logical layers that together enable a language‑model‑driven application: UI, Conversation Processing, Operation Orchestration, LLM Enhancement, and the LLM Core. Each layer has a distinct responsibility and can be implemented with open‑source components.

LLM reference architecture diagram

UI Layer – User‑Intent‑Driven Design

The UI layer is the entry point for users (web, mobile, or CLI). It guides users toward the system’s capabilities and limits direct raw LLM usage, turning user intent into structured commands.

Co‑mate UI guidance

Conversation Processing Layer – Local Small Model

A lightweight SentenceTransformer model runs locally to encode user utterances and match them against known commands before falling back to a remote LLM. This mirrors the two‑stage approach used by GitHub Copilot and Bloop.

ONNX Runtime – cross‑platform inference accelerator for the local model.

HuggingFace Tokenizers – high‑performance tokenization library.

Example of registering semantic‑embedding commands in Kotlin:

mapOf(
    ComateCommand.Intro to basicIntroCommand.map { semantic.embed(it) },
    ComateCommand.LayeredStyle to archStyleCommand.map { semantic.embed(it) },
    ComateCommand.ApiGovernance to apiGovernanceCommand.map { semantic.embed(it) },
    ComateCommand.ApiGen to apiGenCommand.map { semantic.embed(it) },
    ComateCommand.FoundationGovernance to foundationGovernanceCommand.map { semantic.embed(it) }
)
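Once the example phrases are embedded, an incoming utterance can be encoded with the same model and matched by cosine similarity. The following is a minimal sketch of that matching step; the `cosine` and `bestCommand` helpers and the threshold value are illustrative assumptions, not Co‑mate's actual API:

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors (illustrative helper).
fun cosine(a: FloatArray, b: FloatArray): Double {
    var dot = 0.0; var na = 0.0; var nb = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return dot / (sqrt(na) * sqrt(nb))
}

// Pick the command whose example embeddings are closest to the input,
// falling through (null) to the remote LLM when nothing is close enough.
fun <T> bestCommand(
    input: FloatArray,
    commands: Map<T, List<FloatArray>>,
    threshold: Double = 0.7 // assumed cut-off, tune per model
): T? = commands.entries
    .map { (cmd, examples) -> cmd to examples.maxOf { cosine(input, it) } }
    .maxByOrNull { it.second }
    ?.takeIf { it.second >= threshold }
    ?.first
```

Returning `null` below the threshold is what triggers the fallback to the remote LLM in the two‑stage scheme.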

Operation Orchestration Layer – Functions as Operations

Operations are implemented as Kotlin classes that are discovered via reflection, given snake_case function names, and exposed to the LLM as callable tools. The LLM receives a structured prompt that follows a “Thought‑Action‑Input” pattern.

Answer the following questions as best you can.
You have access to the following tools:
introduce_system: introduce_system is a function to introduce a system.
Use the following format:
Question: ...
Thought: ...
Action: ... (one of [introduce_system])
Action Input: ... (parse from the user input, don't add other additional information)
Begin!
Question: Introduce the following system: https://github.com/archguard/ddd-monolithic-code-sample
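A prompt like the one above can be assembled mechanically from the registered tool names and descriptions. This sketch shows one way to do it; the `buildPrompt` function and its map parameter are illustrative, not part of Co‑mate:

```kotlin
// Build a "Thought-Action-Input" prompt from a map of tool name -> description.
fun buildPrompt(tools: Map<String, String>, question: String): String {
    val toolLines = tools.entries.joinToString("\n") { (name, desc) -> "$name: $desc" }
    val toolNames = tools.keys.joinToString(", ")
    return listOf(
        "Answer the following questions as best you can.",
        "You have access to the following tools:",
        toolLines,
        "Use the following format:",
        "Question: ...",
        "Thought: ...",
        "Action: ... (one of [$toolNames])",
        "Action Input: ... (parse from the user input, don't add other additional information)",
        "Begin!",
        "Question: $question"
    ).joinToString("\n")
}
```

Keeping the template in one place makes it easy to regenerate the prompt whenever a new operation class is registered.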

Reflection‑based function creation example:

// Instantiate the reflected operation class and expose it under a snake_case name
val defaultConstructor = clazz.declaredConstructors[0]
val dyFunction = defaultConstructor.newInstance(context) as DyFunction
clazz.name.toSnakeCase() to dyFunction
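The `toSnakeCase` extension used above might look like the following sketch; Co‑mate's actual implementation may differ:

```kotlin
// Convert a CamelCase class name to snake_case, e.g. "IntroduceSystem" -> "introduce_system".
fun String.toSnakeCase(): String =
    replace(Regex("([a-z0-9])([A-Z])"), "$1_$2").lowercase()
```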

LLM Enhancement Layer – Precise Context Construction

This layer enriches raw LLM output by assembling relevant context. It may query a vector database for knowledge‑heavy queries or use the local small model for deterministic contexts. Frequently used commands are cached to reduce remote calls, and GPT can be invoked to split long documents into DSL fragments for downstream processing.
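The caching idea can be sketched as a thin wrapper in front of the remote call. The `PromptCache` class below is a hypothetical illustration of the pattern, not Co‑mate's code:

```kotlin
// Memoize completions for frequently used prompts to avoid repeated remote calls.
class PromptCache(private val fetch: (String) -> String) {
    private val cache = mutableMapOf<String, String>()
    var remoteCalls = 0
        private set

    fun complete(prompt: String): String =
        cache.getOrPut(prompt) {
            remoteCalls++      // only incremented on a cache miss
            fetch(prompt)
        }
}
```

In practice the cache key would normalize the prompt (or use its embedding) so that trivially different phrasings still hit the cache.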

LLM Core Layer – Streaming Proxy Interface

The bottom layer hosts the actual Transformer‑based language model and provides a streaming response interface so that the UI can display incremental output while the model generates text.
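A streaming interface can be modeled as a sequence of incremental tokens. This sketch uses Kotlin's `Sequence` for self-containment; a production implementation would more likely use `kotlinx.coroutines` `Flow` over server-sent events, and the type names here are illustrative:

```kotlin
// Minimal streaming contract: the UI consumes tokens as they are produced.
interface LlmStream {
    fun stream(prompt: String): Sequence<String>
}

// Toy implementation that "streams" the prompt back word by word,
// standing in for a real model emitting tokens.
class EchoStream : LlmStream {
    override fun stream(prompt: String): Sequence<String> = sequence {
        prompt.split(" ").forEach { yield("$it ") }
    }
}
```

Because the sequence is lazy, the UI can render each token the moment it is yielded instead of waiting for the full response.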

Runtime Initialization (DSL Execution)

Co‑mate defines a lightweight Kotlin‑based DSL to describe orchestration workflows. The runtime evaluates the DSL, binds it to a foundation specification (e.g., MVC layering), and executes the corresponding functions.

// Initialize the runtime and evaluate the foundation DSL
val repl = KotlinInterpreter()
val mvcDslSpec = repl.evalCast<FoundationSpec>(InterpreterRequest(code = mvcFoundation))
// Resolve the action from the user's raw action string (variable name illustrative)
val action = ComateToolingAction.from(userAction.lowercase())
// Apply the default DSL spec when governance is requested
if (action == ComateToolingAction.FOUNDATION_SPEC_GOVERNANCE) {
    comateContext.spec = mvcDslSpec
}
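To make the DSL idea concrete, here is a minimal, self-contained sketch of what a Kotlin layering DSL can look like, built with a type-safe builder. The names (`LayeredSpec`, `foundation`, `layer`) are illustrative and not Co‑mate's actual `FoundationSpec` API:

```kotlin
// A toy layering spec collected by a Kotlin type-safe builder.
class LayeredSpec {
    val layers = mutableListOf<String>()
    fun layer(name: String) { layers += name }
}

fun foundation(block: LayeredSpec.() -> Unit): LayeredSpec =
    LayeredSpec().apply(block)

// Declaring an MVC-style layering in the DSL:
val mvcSpec = foundation {
    layer("controller")
    layer("service")
    layer("repository")
}
```

The runtime evaluates such a declaration and then checks or enforces the declared layering against the codebase.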

Reference Implementation

The complete open‑source implementation is available at https://github.com/archguard/co-mate. It demonstrates the architecture using Kotlin, ONNX Runtime, HuggingFace Tokenizers, a custom LangChain‑style prompting scheme, and a streaming proxy for the LLM core.

Tags: architecture, LLM, prompt engineering, LangChain, Kotlin, AI Ops
Written by phodal

A prolific open-source contributor who constantly starts new projects. Passionate about sharing software development insights to help developers improve their KPIs. Currently active in IDEs, graphics engines, and compiler technologies.