How Backend Engineers Are Breaking Through in AI with RAG Architectures
This article details a backend developer's two‑year AI journey, the challenges of rapid model advances, and how applying microservice principles to Retrieval‑Augmented Generation (RAG) creates a scalable, multi‑agent platform for insurance knowledge, memory, and intelligent agents.
As a backend engineer, I have been on the AI path for two years, moving from Chat QA to AI agents and finally to multi‑agent systems, aiming for an AI‑native approach.
Since Q2 of this year, we have been integrating AI into insurance business scenarios and have reached full AI deployment. Our AI agents have progressed beyond L1 (Chatbot) to L2 (Reasoner), a comprehensive breakthrough.
I feel anxious because large-model development keeps accelerating, especially since products like Cursor and JoyCode appeared. The industry focus has shifted from microservice and micro-frontend architectures to an AI-first wave, putting pressure on every development team.
My remedy is to apply microservice architecture to AI, composing agents, planning, RAG, evaluation, MCP, LLMs, prompts, memory, and multimodal components into one platform.
Our insurance Eva RAG architecture evolved through three stages: basic RAG, DeepSearch, and a hybrid retrieval architecture (Graph RAG + DeepSearch + continuous reflection and validation).
RAG Architecture
History
RAG (Retrieval‑Augmented Generation) augments large language models with external knowledge to reduce hallucinations and improve accuracy. It originated in 2020 from a Facebook AI Research paper titled “Retrieval‑Augmented Generation for Knowledge‑Intensive NLP Tasks”.
Basic RAG Architecture – Simple Knowledge Manager
The basic RAG pipeline consists of two core components: a knowledge-generation component (an ETL pipeline that builds the knowledge base) and a retrieval component.
The knowledge-generation component extracts, transforms, and loads documents (PDF, DOC, Excel, images, etc.), with special handling for Chinese text and Excel cell structure.
Chunking and embedding are the two key steps in the transformation phase.
Chunking divides documents into manageable pieces; common strategies include fixed-size, semantic, recursive, structure-based, and model-based chunking. A minimal example follows.
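Here is a fixed-size chunker with overlap for illustration; the size and overlap defaults are placeholders, not our production settings:

```python
def chunk_fixed_size(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, so that context
    spanning a boundary is not lost between neighboring chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks
```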
Embedding converts text into vectors for similarity search, storing them in a vector database.
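For example, with the open-source sentence-transformers library (the model name here is an illustrative choice, not necessarily what our system uses):

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# all-MiniLM-L6-v2 is a common 384-dimensional model; any embedding model works.
model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(texts: list[str]) -> list[list[float]]:
    """Convert text chunks into normalized vectors for cosine-similarity search."""
    return model.encode(texts, normalize_embeddings=True).tolist()
```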
Data loading uses Elasticsearch 8+ for hybrid (keyword plus vector) storage, though other vector stores or relational databases would also work.
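A sketch of what such a hybrid index might look like in Elasticsearch 8; the index name, field names, and 384-dimension figure are illustrative assumptions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # local dev instance; point at your cluster

# One index holds both signals: a `text` field for BM25 keyword search
# and a `dense_vector` field for kNN similarity search.
es.indices.create(
    index="insurance-chunks",
    mappings={
        "properties": {
            "content": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 384,  # must match the embedding model's output size
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

def index_chunk(chunk: str, vector: list[float]) -> None:
    """Store a chunk's raw text and its embedding side by side,
    so sparse and dense retrieval hit the same index."""
    es.index(index="insurance-chunks", document={"content": chunk, "embedding": vector})
```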
The retrieval component includes preprocessing, retrieval, and post‑processing.
Preprocessing focuses on query expansion, translation, and business‑specific handling.
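A minimal query-expansion sketch; `llm` here is a hypothetical callable (str in, str out) standing in for whichever model endpoint you use:

```python
from typing import Callable

def expand_query(query: str, llm: Callable[[str], str]) -> list[str]:
    """Ask an LLM for paraphrases of the user query, so retrieval
    can match documents that use different wording."""
    prompt = (
        "Rewrite the following insurance question three different ways, "
        "one per line, preserving its meaning:\n" + query
    )
    variants = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return [query] + variants
```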
Retrieval combines sparse and dense algorithms: the query is matched lexically for sparse retrieval, and converted to a vector for dense retrieval using cosine similarity.
Sparse algorithms (TF-IDF, BM25) score chunks by keyword overlap, often against LLM-extracted keywords from the query.
Dense algorithms embed the query with the same model used at indexing time and rank chunks by cosine similarity between vectors.
After retrieval, top‑K results are selected and optionally reranked.
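To make the sparse/dense combination concrete, here is a sketch using the rank-bm25 package and NumPy; the weighted-sum fusion and the alpha default are illustrative choices (reciprocal rank fusion or a learned reranker are common alternatives):

```python
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25

def hybrid_retrieve(query: str, query_vec, chunks: list[str], chunk_vecs,
                    k: int = 5, alpha: float = 0.5):
    """Score chunks with BM25 (sparse) and cosine similarity (dense),
    fuse the two signals with a weighted sum, and return the top-K."""
    bm25 = BM25Okapi([c.split() for c in chunks])
    sparse = np.asarray(bm25.get_scores(query.split()))
    sparse = sparse / (sparse.max() + 1e-9)  # normalize to roughly [0, 1]

    vecs = np.asarray(chunk_vecs)
    q = np.asarray(query_vec)
    dense = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-9)

    scores = alpha * sparse + (1 - alpha) * dense
    top = np.argsort(scores)[::-1][:k]
    return [(chunks[i], float(scores[i])) for i in top]
```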
Post-processing reranks the retrieved chunks and concatenates them with the original query into a prompt, from which the LLM generates the final answer.
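The concatenation step then reduces to something like this; the instruction wording is illustrative:

```python
def build_prompt(query: str, chunks: list[str]) -> str:
    """Concatenate the reranked chunks with the original query so the
    LLM answers strictly from retrieved context."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
```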
While building a basic RAG framework is straightforward, achieving high performance in production requires addressing business‑specific challenges.
Our RAG Architecture
Our product combines an insurance knowledge base, memory store, file store, agents, search, and evaluation, driven by algorithms, engineering, and data.
Algorithmic Agentic RAG
We integrated open-source WebWeaver, Microsoft GraphRAG, and ideas from recent papers (ZEP, REFRAG) to build a hybrid retrieval system (Agentic RAG + DeepSearch) with multi-type memory (scenario, procedural, semantic, temporal).
Engineering RAG Platform
The platform links the full workflow and provides standard interfaces, so agent developers can focus on model training rather than integration plumbing.
Architecture layers: agent layer, business logic layer, retrieval layer, data layer. Stack: Spring AI, Elasticsearch 8+, Neo4j, Redis, JD Cloud; supports Python code and RAG agents.
Data Architecture
The data architecture forms a triangle of three stores: the insurance knowledge base, the memory store, and the task center.
The memory store includes semantic, procedural, and scenario graphs, with dual timestamps on each record to track freshness.
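A sketch of what a dual-timestamp memory record could look like; the field names are our own illustration, not a fixed schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MemoryRecord:
    """A memory entry with dual timestamps: when the fact became valid
    in the business domain versus when it was ingested. Comparing the
    two lets the retriever prefer fresh facts and expire stale ones."""
    content: str
    valid_at: datetime    # when the fact holds true in the real world
    created_at: datetime  # when the record entered the memory store

    def is_fresher_than(self, other: "MemoryRecord") -> bool:
        return self.valid_at > other.valid_at
```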
Our chunking strategies follow Cognee-style parameter tuning and cover the five methods described earlier.
We built a multi‑agent platform (Eva) to drive business, handling large insurance documents that are not publicly available.
Future of RAG
Rather than over‑speculating, we will continue to iterate and share insights layer by layer.
Agentic RAG now includes DeepSearch, Graph RAG, and basic RAG; we plan to keep exposing each component for community discussion.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.
