How I Built an AI Contract Review System for 60,000 RMB in One Month

In 45 days a two‑person team delivered an AI‑powered contract review platform that parses PDFs, extracts key clauses, flags risks, and integrates with enterprise tools, using Python, FastAPI, LangChain, large language models, vector databases and OCR technologies.

SpringMeng

Hello, I'm programmer Xiao Meng.

I took on a 60,000 RMB contract to develop an AI contract review system, and our two‑person team delivered it in 45 days.

1. Technical Choices

We stuck with the frameworks we had already used on previous AI projects. The model architecture pairs a large model for generation with smaller fine‑tuned models that act as a fallback.

MVP stack: GPT‑4o / GLM‑4 API, LangChain, Chroma vector store.

Production stack: privately deployed large models such as Qwen‑72B or Ring‑1T, small models fine‑tuned for the legal vertical (e.g., with LoRA), Qdrant or Milvus for vector storage, and a full task orchestration service.
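A minimal sketch of the "large model first, small model as fallback" idea, assuming each model can be wrapped as a plain function from prompt to text. The names review_with_fallback, large_model, and small_model are hypothetical stand‑ins, not part of any library or of the delivered system.

```python
from typing import Callable, Optional, Sequence

# Assumption: every model is wrapped as a function from prompt to text.
Model = Callable[[str], str]


def review_with_fallback(prompt: str, models: Sequence[Model]) -> str:
    """Try each model in order and return the first non-empty answer."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            answer = model(prompt)
            if answer.strip():
                return answer
        except Exception as exc:  # e.g. a timeout or rate limit from an API
            last_error = exc
    raise RuntimeError("all models failed") from last_error


def large_model(prompt: str) -> str:
    # Hypothetical stub that simulates an unavailable hosted model.
    raise TimeoutError("large model timed out")


def small_model(prompt: str) -> str:
    # Hypothetical stub that simulates a small fine-tuned model.
    return "risk: payment term missing"


result = review_with_fallback("Review clause 3", [large_model, small_model])
```

In a real deployment the stubs would be replaced by API or inference‑server clients, and the fallback condition could also include low confidence or malformed output rather than only errors.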

1. Core Backend Framework

Python (preferred): best ecosystem for AI integration.

FastAPI: high‑performance, auto‑generated API docs, ideal for LLM calls.

Django: provides a powerful admin backend and ORM when needed.

Java + Spring Boot / Spring Cloud: for enterprise‑level high concurrency and strong transaction requirements.

Vue: front‑end UI.

2. AI & NLP Stack

Contract review relies on a combination of a general‑purpose large model and traditional NLP processing.

Local large models: Qwen‑72B / Qwen‑14B.

Inference acceleration: vLLM (recommended), TGI, TensorRT‑LLM.

Traditional NLP & feature extraction (clause locating and element extraction): spaCy (industrial‑grade speed), HanLP (optimized for Chinese legal text), and NLTK.

Entity recognition: BIO tagging with BERT‑BiLSTM‑CRF.
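To illustrate what BIO tagging produces, here is a small sketch that collapses per‑token tags into entity spans. It assumes the tagger (for example a BERT‑BiLSTM‑CRF model) has already emitted one tag per token; the function name bio_to_spans and the ORG and AMOUNT labels are made up for the example.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse BIO tags into (entity_type, text) pairs."""
    spans: List[Tuple[str, str]] = []
    entity_type: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        # An I- tag of the same type continues the current entity.
        if tag.startswith("I-") and entity_type == tag[2:]:
            words.append(token)
            continue
        # Anything else closes the entity that was open, if any.
        if entity_type is not None:
            spans.append((entity_type, " ".join(words)))
        if tag.startswith("B-"):
            entity_type, words = tag[2:], [token]
        else:
            entity_type, words = None, []
    if entity_type is not None:
        spans.append((entity_type, " ".join(words)))
    return spans


tokens = ["Party", "A", ":", "Acme", "Trading", "Ltd", "pays", "60000", "RMB"]
tags = ["O", "O", "O", "B-ORG", "I-ORG", "I-ORG", "O", "B-AMOUNT", "I-AMOUNT"]
entities = bio_to_spans(tokens, tags)
```

Tokens are joined with spaces here for an English example; for character‑level Chinese tagging the join would use an empty string instead.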

3. Document Parsing & Pre‑processing (critical)

The first step converts PDF, Word, and image files into structured text.

PDF parsing: PyMuPDF (fast text extraction), pdfplumber (preserves tables), PDF.js (front‑end preview).

OCR for scanned contracts: PaddleOCR (good Chinese performance, supports tables), Tesseract.

Complex document parsing (paragraphs, headings, tables): Unstructured.io, LangChain Document Loaders.

Format conversion tools: python‑docx, Apache POI (Java), markdown.
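The routing between these tools can be sketched as a simple dispatch on file type. This is an illustration only: the backend labels ("pdf_text", "docx", "ocr") and the has_text_layer flag are assumptions, and the actual calls into PyMuPDF, python‑docx, or PaddleOCR are left out.

```python
from pathlib import Path

# Hypothetical backend labels; comments name the tools listed above.
PARSER_BY_SUFFIX = {
    ".pdf": "pdf_text",  # e.g. PyMuPDF or pdfplumber
    ".docx": "docx",     # e.g. python-docx
    ".png": "ocr",       # e.g. PaddleOCR or Tesseract
    ".jpg": "ocr",
    ".jpeg": "ocr",
}


def choose_parser(filename: str, has_text_layer: bool = True) -> str:
    """Pick a parsing backend from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in PARSER_BY_SUFFIX:
        raise ValueError(f"unsupported format: {suffix or filename}")
    parser = PARSER_BY_SUFFIX[suffix]
    # A scanned PDF has no text layer, so it has to go through OCR.
    if parser == "pdf_text" and not has_text_layer:
        return "ocr"
    return parser
```

For example, choose_parser("lease.pdf") selects the text extractor, while choose_parser("scan.pdf", has_text_layer=False) falls through to OCR.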

4. AI Application Development Framework

We use LangChain / LlamaIndex as the core framework for RAG, agents, and prompt orchestration.

Vector databases for contract embeddings: Milvus (production‑grade, distributed), Qdrant (high performance, easy to use), Chroma (lightweight for dev/testing), Pinecone (cloud‑managed).

Embedding models: BAAI/bge‑large‑zh‑v1.5 for Chinese semantic search, OpenAI text‑embedding‑3‑small.
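What a vector database does at query time can be shown with a toy cosine‑similarity search. The three‑dimensional vectors below are invented for illustration; real embeddings from a model such as bge‑large‑zh‑v1.5 have far more dimensions, and a store like Milvus or Qdrant would use an approximate index rather than this brute‑force scan.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k clause ids whose vectors are closest to the query."""
    scored = [(clause_id, cosine(query, vec)) for clause_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up clause embeddings keyed by a hypothetical clause id.
index = {
    "payment": [0.9, 0.1, 0.0],
    "liability": [0.1, 0.9, 0.2],
    "termination": [0.0, 0.2, 0.9],
}
hits = top_k([0.8, 0.2, 0.1], index, k=2)
```

The retrieved clauses would then be passed to the large model as context, which is the retrieval half of RAG.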

5. Database & Cache

MySQL for persistent storage and Redis for caching.
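The article does not say what is cached, so the following is only one plausible use: a cache‑aside lookup keyed by a hash of the contract text, so that an unchanged contract is not sent to the model twice. A plain dict stands in for Redis, and cache_key and cached_review are hypothetical helper names.

```python
import hashlib
from typing import Callable, Dict, List


def cache_key(contract_text: str) -> str:
    """Derive a stable key from the contract content."""
    digest = hashlib.sha256(contract_text.encode("utf-8")).hexdigest()
    return f"review:{digest}"


def cached_review(text: str, cache: Dict[str, str], run_review: Callable[[str], str]) -> str:
    """Return a cached review if present, otherwise compute and store it."""
    key = cache_key(text)
    if key in cache:
        return cache[key]
    result = run_review(text)
    cache[key] = result
    return result


calls: List[str] = []


def fake_review(text: str) -> str:
    # Stand-in for the expensive model call; records each invocation.
    calls.append(text)
    return "no risks found"


cache: Dict[str, str] = {}  # a dict stands in for Redis here
first = cached_review("Contract A", cache, fake_review)
second = cached_review("Contract A", cache, fake_review)
```

With a real Redis client the dict lookups would become get and set calls, typically with an expiry so stale reviews age out.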

2. Functional Requirements

Agents and knowledge bases let users organize and retrieve contract files quickly, which cuts down on manual searching.

It automatically extracts parties’ addresses, names, companies, contact information, and supports user‑defined fields.
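As a rough illustration of built‑in fields combined with user‑defined ones, the sketch below merges a default pattern table with patterns supplied by the user. Regular expressions are a stand‑in here; the real system presumably relies on the NER and LLM stack described earlier. The labels "Party A:", "Phone:", and "Deposit:" are assumed sample formatting.

```python
import re
from typing import Dict, Optional

# Assumed label formats for the sample text; real contracts vary widely.
DEFAULT_PATTERNS: Dict[str, str] = {
    "party_a": r"Party A:\s*(.+)",
    "party_b": r"Party B:\s*(.+)",
    "phone": r"Phone:\s*([\d\-+ ]+)",
}


def extract_fields(text: str, extra_patterns: Optional[Dict[str, str]] = None) -> Dict[str, Optional[str]]:
    """Extract default fields plus any user-defined ones; None if absent."""
    patterns = {**DEFAULT_PATTERNS, **(extra_patterns or {})}
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1).strip() if match else None
    return fields


sample = (
    "Party A: Acme Trading Ltd\n"
    "Party B: Blue River LLC\n"
    "Phone: 010-5555-0100\n"
    "Deposit: 5000 RMB\n"
)
# "deposit" plays the role of a user-defined field.
fields = extract_fields(sample, {"deposit": r"Deposit:\s*(\d+)"})
```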

Risk identification includes missing‑clause alerts, unfair‑clause warnings, and text‑inconsistency checks. Supported input formats are PDF, Word, PNG, and JPG.
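A missing‑clause alert can be approximated with a checklist of required clause types. The sketch below uses keyword matching as a simple baseline; the clause list and keywords are illustrative assumptions, and a production check would more likely use semantic search or a model rather than substrings.

```python
from typing import Dict, List

# Illustrative checklist: clause name mapped to keywords that signal it.
REQUIRED_CLAUSES: Dict[str, List[str]] = {
    "payment": ["payment", "fee"],
    "termination": ["termination", "terminate"],
    "liability": ["liability", "indemnif"],
    "dispute resolution": ["arbitration", "governing law", "jurisdiction"],
}


def missing_clauses(text: str, required: Dict[str, List[str]] = REQUIRED_CLAUSES) -> List[str]:
    """Return the clause names for which no keyword appears in the text."""
    lowered = text.lower()
    return [
        name
        for name, keywords in required.items()
        if not any(keyword in lowered for keyword in keywords)
    ]


contract = (
    "The Buyer shall make payment within 30 days. "
    "Either party may terminate this agreement with 15 days notice."
)
alerts = missing_clauses(contract)
```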

Additional features: annotation, commenting, sharing results, integration with internal approval tools (DingTalk, Feishu, WeCom), contract archiving and retrieval, tag‑based search (by entity, amount, date, risk level), full‑text semantic search, expiration and renewal reminders.
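Expiration reminders reduce to a date‑window check. This sketch assumes each archived contract carries an expiry date and that a reminder fires within a configurable notice period; the 30‑day default and the contract names are made up for the example.

```python
from datetime import date, timedelta
from typing import Dict, List


def due_reminders(contracts: Dict[str, date], today: date, notice_days: int = 30) -> List[str]:
    """Names of contracts that expire within the notice window, sorted."""
    cutoff = today + timedelta(days=notice_days)
    return sorted(
        name for name, expires in contracts.items() if today <= expires <= cutoff
    )


contracts = {
    "lease-2024": date(2025, 1, 20),  # inside the window
    "supply-07": date(2025, 6, 1),    # too far in the future
    "nda-old": date(2024, 12, 1),     # already expired
}
due = due_reminders(contracts, today=date(2025, 1, 1))
```

A scheduled job could run this daily and push the resulting names to DingTalk, Feishu, or WeCom.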

Overall, the platform enables large‑scale document retrieval and knowledge extraction from contracts.
