How Backend Engineers Are Breaking Through AI with RAG Architectures

This article details a backend developer's two‑year AI journey, the challenges of rapid model advances, and how applying microservice principles to Retrieval‑Augmented Generation (RAG) creates a scalable, multi‑agent platform for insurance knowledge, memory, and intelligent agents.

JD Tech Talk

As a backend engineer, I have been on the AI path for two years, moving from Chat QA to AI agents and finally to multi‑agent systems, aiming for an AI‑native approach.

Since Q2 of this year, we have been integrating AI into insurance business scenarios and have achieved full AI deployment. Our AI agents have progressed beyond L1 (Chatbot) to L2 (Reasoner), a comprehensive breakthrough.

I feel anxious because large-model development keeps accelerating, especially since products like Cursor and JoyCode appeared. The industry focus has shifted from microservice and micro-frontend architectures to an AI-first wave, putting pressure on every development team.

My remedy is to apply microservice architecture principles to AI, composing agents, planning, RAG, evaluation, MCP, LLM, prompts, memory, and multimodal components into one platform.

Our insurance Eva RAG architecture evolved through three stages: basic RAG, DeepSearch, and a hybrid retrieval architecture (Graph RAG + DeepSearch + continuous reflection and validation).

RAG Architecture

History

RAG (Retrieval‑Augmented Generation) augments large language models with external knowledge to reduce hallucinations and improve accuracy. It originated in 2020 from a Facebook AI Research paper titled “Retrieval‑Augmented Generation for Knowledge‑Intensive NLP Tasks”.

Basic RAG Architecture – Simple Knowledge Manager

Basic RAG consists of two core components: an indexing component (an ETL pipeline that builds the knowledge base) and a retrieval component.

The ETL pipeline extracts, transforms, and loads documents (PDF, DOC, Excel, images, etc.), with special handling for Chinese text and Excel cell structures.

Chunk and embedding are the two key steps in the transformation phase.

Chunk divides documents into manageable pieces; common strategies include fixed‑size, semantic, recursive, structure‑based, and model‑based chunking.
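Fixed-size chunking is the simplest of these strategies. Below is a minimal sketch; the size and overlap values are illustrative defaults, not our production settings:

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, so content that
    spans a chunk boundary still appears intact in at least one chunk."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    chunks = []
    step = size - overlap  # how far the window advances each iteration
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece:
            chunks.append(piece)
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The overlap trades a little index size for recall: a sentence cut at a boundary is still retrievable from the neighboring chunk.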

Embedding converts text into vectors for similarity search, storing them in a vector database.

Data loading uses Elasticsearch 8+ for hybrid storage, though other vector stores or relational databases are possible.

The retrieval component includes preprocessing, retrieval, and post‑processing.

Preprocessing focuses on query expansion, translation, and business‑specific handling.
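As a toy illustration of query expansion (a production system would use an LLM or a domain thesaurus rather than this hypothetical hard-coded table):

```python
# Toy synonym table; entries are made-up examples, not our real business rules.
SYNONYMS = {
    "premium": ["insurance fee", "payment"],
    "claim": ["compensation", "payout"],
}

def expand_query(query: str) -> list[str]:
    """Return the original query plus expanded variants to improve recall."""
    variants = [query]
    lowered = query.lower()
    for term, alts in SYNONYMS.items():
        if term in lowered:
            variants.extend(lowered.replace(term, alt) for alt in alts)
    return variants
```

Each variant is then retrieved independently and the results are merged downstream.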

Retrieval combines sparse and dense algorithms: the query is turned into both sparse keyword weights and a dense embedding vector, and similarity against the indexed chunks is computed (cosine similarity on the dense side).

Sparse algorithm

matches LLM-extracted keywords against the index using term-weighting scores such as TF-IDF or BM25.

Dense algorithm

embeds the query and compares it with stored chunk embeddings via cosine similarity.

After retrieval, top‑K results are selected and optionally reranked.
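The sparse and dense result lists must be merged into a single top-K set. The article does not specify the fusion method, so the sketch below uses Reciprocal Rank Fusion (RRF), a common choice for hybrid retrieval:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60, top_k: int = 5) -> list[str]:
    """Reciprocal Rank Fusion: merge several ranked lists (e.g. one from
    BM25, one from dense retrieval) into a single top-K list. A document
    scores 1/(k + rank + 1) in each list it appears in; scores add up."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

RRF needs only ranks, not raw scores, so it sidesteps the problem that BM25 scores and cosine similarities live on incompatible scales.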

Post‑processing involves ranking (rerank) and concatenating retrieved chunks with the original query to form a prompt for the LLM to generate the final answer.

While building a basic RAG framework is straightforward, achieving high performance in production requires addressing business‑specific challenges.

Our RAG Architecture

Our product combines an insurance knowledge base, memory store, file store, agents, search, and evaluation, driven by algorithms, engineering, and data.

Algorithmic Agentic RAG

We integrated open-source WebWeaver, Microsoft GraphRAG, and recent papers (ZEP, REFRAG) to create a hybrid retrieval system (Agentic RAG + DeepSearch) with multiple memory types (episodic, procedural, semantic, temporal).

Engineering RAG Platform

The platform links the full workflow and exposes standard interfaces, so agent teams can focus on model work rather than infrastructure.

Architecture layers: agent layer, business logic layer, retrieval layer, data layer. Stack: Spring AI, Elasticsearch 8+, Neo4j, Redis, JD Cloud; supports Python code and RAG agents.

Data Architecture

The data architecture forms a triangle: insurance knowledge base, memory store, and task center.

The memory store includes semantic, procedural, and episodic graphs, with dual timestamps to track freshness.

Chunking strategies follow Cognee's parameter-tuning approach, offering five methods.

We built a multi‑agent platform (Eva) to drive business, handling large insurance documents that are not publicly available.

Future of RAG

Rather than over‑speculating, we will continue to iterate and share insights layer by layer.

Agentic RAG now includes DeepSearch, Graph RAG, and basic RAG; we plan to keep exposing each component for community discussion.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.