How I Built a Production‑Ready RAG Service in 3 Weeks Using AI Coding Tools
In just three weeks, I single‑handedly created a production‑grade Retrieval‑Augmented Generation (RAG) API with FastAPI, leveraging Cursor and Claude Code to automate coding, testing, and deployment, and I share practical insights on AI‑assisted development, high cohesion‑low coupling design, TDD, git worktree parallelism, and agent orchestration.
AI Coding
Using Cursor IDE together with Claude Code (cc) provides a hybrid GUI‑shell + CLI‑kernel workflow that lets a single developer build a RAG service API in about three weeks, writing over 4,000 lines of core code and 5,000 lines of unit tests.
Cursor offers a polished programming UI/UX and can integrate multiple large‑model APIs (Claude, Gemini, etc.). Recent GPT‑5 improvements—lower hallucination, longer context, tool‑aware reasoning—make it a competitive alternative.
Claude Code excels at following existing code style and modular structure, reducing rewrites, while cc can run autonomously in the terminal once goals are set.
API‑First Approach
Instead of adopting a full‑stack open‑source RAG project, start by calling model APIs directly. Combine traditional keyword search (e.g., Elasticsearch) with vector search (e.g., Milvus) for a hybrid retrieval pipeline.
Recommended Python stack for a modern backend:
FastAPI for the HTTP API
uv for dependency management
Pydantic for data models
pytest for testing
When scaling, consider deploying and fine‑tuning your own models.
High Cohesion, Low Coupling
*必须 MUST*遵循高内聚低耦合的软件编程规范Modules should focus on a single responsibility (high cohesion) and expose clear interfaces to minimize inter‑module dependencies (low coupling). This improves maintainability, testability, and reusability.
Test‑Driven Development (TDD)
During the three‑week sprint I wrote over 5,000 lines of unit tests—about 25% more than the production code—using AI to suggest edge cases and improve coverage.
TDD follows the Red‑Green‑Refactor cycle:
Red : Write a failing test.
Green : Implement minimal code to pass the test.
Refactor : Improve the code while keeping tests green.
This disciplined loop ensures reliable, incremental development.
Gemini for Planning, Claude for Execution
Gemini’s strong reasoning and 1 M‑token context make it ideal for analyzing logs, proposing solutions, and generating design plans, while Claude Code handles the actual code generation.
Workflow: let Gemini read relevant files and errors, output a plan to a file, then feed that file to Claude Code for implementation.
Plan First
Use Claude Code’s Shift+Tab shortcut to switch to plan mode, discuss requirements thoroughly before letting the agent start coding.
Parallel Development with git worktree
Traditional Git workflows require frequent branch switches, which incurs context‑switch overhead. git worktree creates multiple independent working trees sharing a single .git directory, enabling true parallel development.
Typical steps:
# (run in the main repo directory)
git fetch origin
git worktree add -b feature/query-rewrite ../rag_service-feature-query-rewrite origin/developOpen separate Cursor windows for each worktree, allowing multiple AI agents to work on isolated features simultaneously.
After development, merge the feature branch back into develop and clean up:
# (in the main repo)
git worktree remove ../rag_service-feature-query-rewrite
git branch -d feature/query-rewriteClaude Code Subagents
Custom subagents provide dedicated contexts and toolsets for specific tasks (e.g., code review, unit‑test generation). The main agent (using a powerful model like opus) orchestrates subagents (sonnet for coding, haiku for file reading).
Splitting complex tasks into short, tool‑limited sub‑tasks reduces token consumption and improves performance.
Final Thoughts
Balancing speed, quality, and resources is an impossible triangle; aim for incremental improvements.
Demo projects validate concepts quickly, but production requires scaling, observability, and thorough testing.
Rapid feedback loops outweigh perfect implementations for AI‑driven features.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
