How to Build Trustworthy Coding Agents with Shire’s Custom RAG Workflow
This article explains how to use the Shire language to create reliable coding agents by defining custom RAG workflows, leveraging IDE APIs, code verification functions, and vector‑based search, with detailed examples, configuration snippets, and a roadmap for future enhancements.
Overview
Shire is a lightweight DSL that lets large language models (LLMs) interact with an IDE to build coding agents. By combining Shire with a custom Retrieval‑Augmented Generation (RAG) workflow you can (1) fetch source code or other IDE data with a user‑defined prompt, (2) vectorize and search that data, and (3) pass the retrieved results to the next stage via the execute function.
---
name: "Search"
variables:
"placeholder": /.*\.java/ { splitting | embedding }
lang: "java"
input: "Blog creation process"
afterStreaming: {
default { searching($output) | execute("summary.shire", $input, $output) }
}
---
Shire Language Basics
Pattern Action and Variable Extraction
Variables are defined with the variables keyword. Each variable maps a file‑pattern to a pipeline of operations such as grep, splitting, embedding, or searching. The result becomes a named variable that can be referenced later in the script.
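To make the pattern-to-pipeline idea concrete, here is a minimal Python sketch of the concept, not Shire's actual implementation. The in-memory FILES map and the grep, head, and resolve_variable helpers are all hypothetical names introduced for illustration; Shire's real functions may behave differently.

```python
import re
from typing import Callable, Dict, List

# Hypothetical in-memory "project": path -> file content (illustration only).
FILES: Dict[str, str] = {
    "src/Blog.java": "class Blog {}\n// error.log: failed to save\n",
    "src/Main.java": "class Main {}\n",
    "README.md": "error.log is documented here\n",
}

Step = Callable[[List[str]], List[str]]


def grep(needle: str) -> Step:
    """Return a step that keeps only the lines containing `needle`."""
    return lambda lines: [ln for ln in lines if needle in ln]


def head(n: int = 10) -> Step:
    """Return a step that keeps the first `n` lines."""
    return lambda lines: lines[:n]


def resolve_variable(pattern: str, steps: List[Step]) -> List[str]:
    """Collect lines from files whose path matches `pattern`, then pipe them through `steps`."""
    regex = re.compile(pattern)
    lines: List[str] = []
    for path in sorted(FILES):
        if regex.fullmatch(path):
            lines.extend(FILES[path].splitlines())
    for step in steps:
        lines = step(lines)
    return lines


# Rough analogue of: "logContent": /.*\.java/ { grep("error.log") | head }
log_content = resolve_variable(r".*\.java", [grep("error.log"), head(1)])
```

The README line is never considered because its path does not match the pattern, which is the point of binding the pipeline to a file pattern.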
---
variables:
"logContent": /.*\.java/ { grep("error.log") | head }
---
Check user code for issues: $logContent
Trusted Code Functions (post‑generation verification)
After the LLM finishes streaming, the onStreamingDone lifecycle hook can run a chain of trusted functions:
parseCode – parses raw text into code blocks.
verifyCode – runs syntax or PSI checks.
runCode – executes the verified code.
onStreamingDone: { parseCode | saveFile | openFile | verifyCode | runCode }
Example: a Python print("Hello, World!") is generated, saved, verified, and executed automatically.
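The parse, verify, run chain can be approximated outside the IDE. The sketch below is an assumption-laden Python analogue: parse_code, verify_code, and run_code are hypothetical stand-ins for the Shire functions, verification is reduced to a syntax check with Python's ast module (no PSI), and the saveFile and openFile steps are omitted.

```python
import ast
import contextlib
import io
import re
from typing import List

FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def parse_code(raw: str) -> List[str]:
    """Extract the bodies of fenced code blocks from raw LLM text."""
    pattern = re.compile(FENCE + r"\w*\n(.*?)" + FENCE, re.DOTALL)
    return [m.group(1) for m in pattern.finditer(raw)]


def verify_code(source: str) -> bool:
    """Syntax check only; an IDE-level check would go further."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def run_code(source: str) -> str:
    """Execute already-verified code and capture what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


llm_output = "Here you go:\n" + FENCE + 'python\nprint("Hello, World!")\n' + FENCE + "\n"
blocks = parse_code(llm_output)
# Only blocks that pass verification are executed.
results = [run_code(block) for block in blocks if verify_code(block)]
```

Running verification before execution is what makes the chain "trusted": a block that fails the check never reaches run_code.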
Indexing and Querying
Shire can embed source files and perform similarity search in a single variable definition.
---
name: "Search"
variables:
"testTemplate": /.*\.kt/ { splitting | embedding | searching("blog") }
---
$testTemplate
Typical Shire RAG Flow for Code Explanation
Query Transformation – expand the user question into a concrete search query.
Information Retrieval – retrieve relevant code fragments or documents using vector similarity.
Re‑ranking – optionally reorder results (future support for LLM‑based rerankers).
Summarization – send the final set of results to the LLM for a concise answer.
In practice only two LLM calls are required: one to turn the user question into a search query that drives retrieval, and one for summarization.
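The four stages can be sketched as a single function. This is an illustrative Python outline rather than Shire code: explain_code, fake_llm, fake_retrieve, and the SNIPPETS list are hypothetical, and the LLM is replaced by a stub so the call count is visible.

```python
from typing import Callable, List


def keep_order(query: str, docs: List[str]) -> List[str]:
    """Default re-ranker: leave the retrieval order untouched."""
    return docs


def explain_code(
    question: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]] = keep_order,
) -> str:
    # 1. Query transformation (LLM call #1).
    query = llm(f"Rewrite as a code search query: {question}")
    # 2. Information retrieval (no LLM involved).
    candidates = retrieve(query)
    # 3. Re-ranking (identity unless a re-ranker is supplied).
    ranked = rerank(query, candidates)
    # 4. Summarization (LLM call #2).
    context = "\n".join(ranked)
    return llm(f"Code info:\n{context}\nUser question: {question}")


calls: List[str] = []  # records every prompt sent to the stub LLM


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return "createBlog" if prompt.startswith("Rewrite") else "SUMMARY"


SNIPPETS = ["Blog createBlog(String title) { ... }", "User findUser(long id) { ... }"]


def fake_retrieve(query: str) -> List[str]:
    return [s for s in SNIPPETS if query in s]


answer = explain_code("Blog creation process", fake_llm, fake_retrieve)
```

After the call, the calls list holds exactly two prompts, and the second one contains the snippet found by the first.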
Step 1 – Custom Code Retrieval
---
name: "Search"
variables:
"placeholder": /.*\.java/ { splitting | embedding }
lang: "java"
input: "Blog creation process"
afterStreaming: {
default { searching($output) | execute("summary.shire", $input, $output) }
}
---
The placeholder variable splits and embeds every *.java file in the project; "Blog creation process" is the user question supplied as input, not a file filter. After streaming, the searching function queries the embedded index with the model's output, and the execute call forwards the results to the summary script.
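For readers unfamiliar with the split, embed, search sequence, here is a toy Python version. It is a sketch under heavy assumptions: the embed function is a bag-of-words counter standing in for a real embedding model, similarity is plain cosine, and all function names and sample files are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def split(source: str, max_lines: int = 3) -> List[str]:
    """Naive splitter: fixed-size windows of lines."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(files: Dict[str, str]) -> List[Tuple[str, Counter]]:
    """Split every file into chunks and store each chunk with its vector."""
    return [(chunk, embed(chunk)) for src in files.values() for chunk in split(src)]


def search(index: List[Tuple[str, Counter]], query: str, top_k: int = 1) -> List[str]:
    """Return the `top_k` chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


files = {
    "BlogService.java": "class BlogService {\n  Blog createBlog(String title) {\n    return repo.save(new Blog(title));\n  }\n}",
    "UserService.java": "class UserService {\n  User findUser(long id) {\n    return users.get(id);\n  }\n}",
}
index = build_index(files)
hits = search(index, "create blog")
```

Indexing happens once per file set, while search runs per query, which mirrors the separation between the variable definition and the afterStreaming step.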
Step 2 – Summarization
[]: write some prompt
Code info: $output
User question: $input
The summary script receives the retrieved code snippet ($output) and the original user question ($input), constructs a prompt, and returns a concise answer.
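The substitution performed by the summary script resembles ordinary template filling. A minimal Python analogue, assuming a hypothetical instruction line in place of the "write some prompt" placeholder:

```python
from string import Template

# The first line is a made-up instruction; the source leaves the real prompt unspecified.
SUMMARY_PROMPT = Template(
    "You are explaining code to a developer.\n"
    "Code info: $output\n"
    "User question: $input"
)


def build_summary_prompt(output: str, user_input: str) -> str:
    """Fill the template with the retrieved code and the original question."""
    return SUMMARY_PROMPT.substitute(output=output, input=user_input)


prompt = build_summary_prompt("class Blog {}", "Blog creation process")
```

Keeping the question in the prompt alongside the retrieved code lets the model answer what was asked rather than merely describe the snippet.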
Implementation Details
Technology Stack
Inference engine: ONNX Runtime
Embedding model: Sentence‑Transformers all‑MiniLM‑L6‑v2
Similarity metric: Jaccard (default); also supports TF‑IDF via similarTestCase
Storage back‑ends: in‑memory (default), local file system, future SQLite support
Tokenizer: HuggingFace Tokenizer
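As a reminder of what the default metric measures, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The Python sketch below applies it to word sets; how Shire applies it internally is not stated here, so the tokens helper and the word-level granularity are assumptions.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set of a text (assumed tokenization)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


score = jaccard("create blog post", "create blog")
```

Here the two texts share two of three distinct words, so the score is two thirds.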
Supported Document Types
Office files: docx, pptx, xlsx
PDF
Non‑binary (plain‑text) files
IDE‑supported source code
IDE‑unsupported source code
The current code‑splitting logic is basic and will be refined in upcoming releases.
Future Directions
Integration with external vector databases (e.g., Milvus, Pinecone).
LLM‑based re‑ranking pipelines.
Additional retrieval algorithms such as BM25+, BM42.
Improved code‑splitting and multi‑language support.
Repository: https://github.com/phodal/shire
phodal
A prolific open-source contributor who constantly starts new projects. Passionate about sharing software development insights to help developers improve their KPIs. Currently active in IDEs, graphics engines, and compiler technologies.
