Engineering LLM Applications: Architecture, Prompt Modeling, and Multi‑Language Strategies
This article shares practical insights from months of building LLM proof‑of‑concepts, covering language‑agnostic architectures, FFI integration, prompt engineering, RAG patterns, DSL design, and four core architectural principles for scalable AI applications.
Several months ago, Thoughtworks held an internal AIGC workshop where participants broadly agreed that, unless open-source models bring enterprise LLM costs down, large language models will quickly become unsustainable; open-source LLMs combined with LoRA fine-tuning are therefore expected to become mainstream. Since then, models such as LLaMA 2 and Code Llama have demonstrated this potential.
From Language and Ecosystem to LLM Service as API vs. FFI
Many enterprises prototype knowledge‑enhanced tools using Python + LangChain, which raises engineering questions:
Should the service expose an API written in Python?
What code solutions exist within the current language and infrastructure stack?
Because Python's dynamic nature hampers IDE analysis and developer productivity, even with typing libraries such as Pydantic, the author advocates aligning LLM services with the existing JVM-based infrastructure and surveys the AI tooling available in Java, Kotlin, and Rust.
AI Infrastructure per Language
Java: Deep Java Library (DJL) offers extensive deep‑learning APIs for rapid LLM application development.
Kotlin: KInference optimizes ONNX inference on server or client side.
Rust: the CoUnit project uses the ndarray crate for multi-dimensional array operations.
In most scenarios, a full AI stack is unnecessary; simple utilities such as token‑length calculation can already curb LLM costs.
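As a concrete example of such a utility, the following sketch counts prompt tokens on the JVM before a model call. It assumes the open-source JTokkit library, which ports the Tiktoken encodings to Java/Kotlin; the 4,096-token budget is purely illustrative.

    import com.knuddels.jtokkit.Encodings
    import com.knuddels.jtokkit.api.EncodingType

    // Count tokens with the cl100k_base encoding so a prompt can be trimmed
    // or rejected before an expensive model call.
    fun countTokens(prompt: String): Int {
        val registry = Encodings.newDefaultEncodingRegistry()
        val encoding = registry.getEncoding(EncodingType.CL100K_BASE)
        return encoding.countTokens(prompt)
    }

    fun main() {
        val prompt = "Explain this repository's build pipeline in three bullet points."
        val tokens = countTokens(prompt)
        // 4_096 is an illustrative budget, not a number taken from the article.
        println("Prompt uses $tokens tokens; within budget: ${tokens <= 4_096}")
    }

Keeping a check like this in the request path surfaces cost overruns before they reach the bill.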
FFI as an Interface Mechanism
Tokenizers such as Tiktoken have a Rust core and are reached from other languages through FFI (for example, Python calling into Rust for speed). Two components commonly accessed via FFI are:
Tokenizers: OpenAI's Tiktoken and HuggingFace Tokenizers (both with Rust cores), plus a Kotlin tokenizer implementation in JetBrains AI Assistant.
ONNX Runtime: Cross‑platform inference accelerator written in C++, accessed via FFI from various languages.
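To make the FFI pattern concrete from the JVM side, here is a minimal sketch that assumes DJL's HuggingFaceTokenizer wrapper, which bridges to the Rust tokenizers library through native bindings; the model name is only an example.

    import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer

    fun main() {
        // HuggingFaceTokenizer bridges to the Rust `tokenizers` crate through
        // native bindings, so the JVM gets Rust-speed tokenization via FFI.
        val tokenizer = HuggingFaceTokenizer.newInstance("bert-base-uncased")
        val encoding = tokenizer.encode("LLM services can reuse existing JVM infrastructure.")
        println("Token ids: ${encoding.ids.toList()}")
        println("Tokens:    ${encoding.tokens.toList()}")
        tokenizer.close()
    }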
LLM Application Architecture
LLM applications differ little from conventional software, but they must account for LLM‑specific impacts. Three usage patterns are identified:
Basic LLM apps that rely solely on prompt engineering.
Co‑pilot style apps that combine prompts with agents and external tools, orchestrating workflows based on user intent.
Autonomous agents where the LLM itself generates DSL/workflows to control tools.
From these patterns, four architectural principles emerge:
User‑intent‑driven design: Build domain‑specific AI roles and guide users via DSLs.
Context engineering: Structure applications to capture business context for precise prompts and low‑latency responses.
Atomic capability mapping: Identify the LLM's strengths and map them to capabilities the application lacks.
Language interfaces: Design next‑generation APIs that let LLMs understand, schedule, and orchestrate services.
The biggest practical challenge is guiding users and enriching context.
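As a minimal illustration of the context-engineering principle (a sketch, not code from the article, with hypothetical names throughout), an application can gather workspace facts into a structured object before rendering the prompt:

    // Hypothetical structure: the application gathers business and workspace
    // facts up front so the rendered prompt is precise instead of generic.
    data class PromptContext(
        val userIntent: String,
        val domainGlossary: List<String>,
        val relevantCode: List<String>,
    )

    fun buildPrompt(context: PromptContext): String = buildString {
        appendLine("You are an assistant for this codebase.")
        appendLine("User intent: ${context.userIntent}")
        appendLine("Domain terms: ${context.domainGlossary.joinToString()}")
        appendLine("Relevant code:")
        context.relevantCode.forEach { appendLine(it) }
        append("Answer using only the context above.")
    }

    fun main() {
        val context = PromptContext(
            userIntent = "Add pagination to the order-history API",
            domainGlossary = listOf("order", "cursor", "page size"),
            relevantCode = listOf("fun listOrders(customerId: Long): List<Order> { ... }"),
        )
        println(buildPrompt(context))
    }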
Prompt Modeling and Optimization
Prompt models must align with the underlying problem domain. The article walks through a LangChain example in which a series of Human/AI exchanges is stored as examples and then used with PromptTemplate variants such as FewShotPromptWithTemplates and FunctionExplainerPromptTemplate. Kotlin-based QA templates are also shown, illustrating how serialization and template engines (e.g., Apache Velocity) replace template variables with real data.
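A sketch of the Kotlin/Velocity idea follows; it assumes Apache Velocity's evaluate API and an invented QA template, so treat it as one possible wiring rather than the article's exact code.

    import org.apache.velocity.VelocityContext
    import org.apache.velocity.app.Velocity
    import java.io.StringWriter

    fun main() {
        Velocity.init()

        // A QA-style prompt template; $context and $question are Velocity variables.
        val template = """
            Answer the question using the given context.
            Context: ${'$'}context
            Question: ${'$'}question
        """.trimIndent()

        val context = VelocityContext().apply {
            put("context", "The service exposes a /health endpoint returning 200 OK.")
            put("question", "How do we check whether the service is alive?")
        }

        // Velocity substitutes the variables into the template at render time.
        val writer = StringWriter()
        Velocity.evaluate(context, writer, "qa-template", template)
        println(writer.toString())
    }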
Context Construction: RAG and Domain‑Specific Modes
Retrieval‑augmented generation (RAG) remains dominant for knowledge‑base Q&A, yet bespoke domain‑specific patterns often yield higher quality at the cost of generality. Code‑generation tools (GitHub Copilot, JetBrains AI Assistant, AutoDev) rely on precise IDE context rather than RAG.
Effective RAG requires vector databases and knowledge indexes, but the author notes that prompt quality still suffers without careful chunk ordering, since models attend least to content buried in the middle of a long context (the "Lost in the Middle" problem).
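One common mitigation, sketched below in plain Kotlin as an assumption rather than code from the article, is to reorder retrieved chunks so the highest-ranked ones sit at the edges of the prompt, where long-context models attend most.

    // Given chunks sorted by relevance (best first), interleave them so the most
    // relevant chunks end up at the edges of the prompt and the weakest in the middle.
    fun reorderForLongContext(rankedChunks: List<String>): List<String> {
        val front = ArrayDeque<String>()
        val back = ArrayDeque<String>()
        rankedChunks.forEachIndexed { index, chunk ->
            if (index % 2 == 0) front.addLast(chunk) else back.addFirst(chunk)
        }
        return front + back
    }

    fun main() {
        val ranked = listOf("chunk-1 (best)", "chunk-2", "chunk-3", "chunk-4", "chunk-5 (worst)")
        // For five chunks this prints the order 1, 3, 5, 4, 2: the two strongest
        // chunks frame the context and the weakest lands in the middle.
        println(reorderForLongContext(ranked))
    }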
Uncertain Language APIs
LLM interaction can be framed as natural‑language APIs, falling into two categories:
LLM + Workflow: LLM decides which tool or API to invoke based on intent.
LLM‑generated DSL: LLM outputs a DSL (e.g., JSON) that downstream programs execute.
Three implementation models are discussed:
Tooling mode – dynamic tool lists generated from context.
Function calling – LLM detects when to call a function and passes arguments.
Intent‑recognition micro‑models – fine‑tuned small models for specific scenarios.
DSLs serve as an intermediate representation for LLMs to produce code or UI structures.
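As a minimal sketch of that idea, assume the LLM is instructed to reply only with a small JSON DSL; the JSON shape and the tool registry below are hypothetical, and parsing uses kotlinx.serialization.

    import kotlinx.serialization.Serializable
    import kotlinx.serialization.json.Json

    // Hypothetical DSL: the LLM is prompted to answer only with this JSON shape.
    @Serializable
    data class ToolCall(val tool: String, val arguments: Map<String, String> = emptyMap())

    // Hypothetical registry of tools the application is willing to execute.
    val tools: Map<String, (Map<String, String>) -> String> = mapOf(
        "search_docs" to { args -> "Searched docs for '${args["query"]}'" },
        "create_ticket" to { args -> "Created ticket titled '${args["title"]}'" },
    )

    fun execute(llmOutput: String): String {
        val json = Json { ignoreUnknownKeys = true }
        val call = json.decodeFromString(ToolCall.serializer(), llmOutput)
        val tool = tools[call.tool] ?: return "Unknown tool: ${call.tool}"
        return tool(call.arguments)
    }

    fun main() {
        // Example of what the model might return after seeing the tool list in its prompt.
        val llmOutput = """{"tool": "search_docs", "arguments": {"query": "rate limiting"}}"""
        println(execute(llmOutput))
    }

Keeping the DSL this small makes the LLM's output easy to validate before anything is executed.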
Conclusion and Next Steps
The article consolidates months of LLM application engineering experience, highlighting reusable patterns and emphasizing the need to formalize these patterns for faster future development.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
phodal
A prolific open-source contributor who constantly starts new projects. Passionate about sharing software development insights to help developers improve their KPIs. Currently active in IDEs, graphics engines, and compiler technologies.