How AI Can Turn a Code Maze into a Knowledge Hub for New Developers

This article follows a new developer named Li Ming as he confronts undocumented code, hidden business rules, and fragmented knowledge, then demonstrates how leveraging large‑language models to index, associate, and retrieve code, requirements, and operational data can create an intelligent knowledge base that streamlines onboarding, reduces errors, and enhances collaboration across development, testing, and product teams.

JD Cloud Developers

Jul 8, 2025

1. Origin

Li Ming, a newly hired R&D engineer at an internet company, started his career with high expectations, but within a few weeks reality had dampened his enthusiasm.

First week: When he received his first task, his mentor only said, "We have done something similar before, refer to the historical code." The repository lacked comments, variable names were cryptic, and no requirement documents existed. He made changes blindly, and the feature caused an online failure because a hidden business rule known only to senior staff was missed.

Second week: A tester asked, "Will this change affect the order status flow?" Li Ming was unaware of such a chain in the system. The tester replied, "The last change left no documentation, we can only guess."

Third week: The product manager demanded an urgent fix for a "historical issue," but only three‑year‑old meeting notes could be found in Confluence. The operations team spent a lot of time repeatedly answering questions like "What does this error mean?" and "Which service does this depend on?"

Late night reflection:

Why does every change feel like stepping on a landmine?

Why does the codebase grow more siloed ("smoke-stack" style) over time, while knowledge stays in senior engineers' heads and newcomers face a steep learning curve?

If an AI could directly tell which requirement a piece of code implements or automatically generate a business‑logic description, it would be ideal.

He imagined that a large model could link code, requirement documents, and operation manuals, and perhaps the key to breaking the "code maze" lies in the combination of AI and a knowledge base.

2. Solution Approach

The repeated frustrations made Li Ming realize that these problems could not be solved by a single person. He decided to proactively seek a solution.

One night, while staring at complex code, he thought: if all scattered knowledge points could be connected, could the current problems be solved?

First attempt: He recalled his mentor mentioning large‑model technology. He wrote a simple script that indexed the company's requirement documents and code commit records. Although rough, it allowed keyword searches for related documents. He wondered what would happen if these indexed results were fed into a large model for further inference.
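
A minimal sketch of what such a first script might look like, in Python. The article does not show Li Ming's script, so the function names, record IDs, and sample texts below are all hypothetical. It builds an inverted index over requirement documents and commit messages and answers a keyword query by intersecting the matching record sets.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(records):
    """Map each token to the set of record IDs whose text contains it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for token in tokenize(text):
            index[token].add(record_id)
    return index

def search(index, query):
    """Return the IDs of records that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical sample data: requirement documents and commit messages.
records = {
    "doc:REQ-101": "Order status flow: pending, paid, shipped, completed",
    "commit:a1b2c3": "REQ-101 add shipped state to order status flow",
    "doc:REQ-202": "Refund rules for cancelled orders",
}
index = build_index(records)
hits = sorted(search(index, "order status"))
print(hits)
```

Only the matching texts, rather than the whole corpus, would then be handed to a large model for further inference.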

Initial validation: After Li Ming set up a basic intelligent agent, the product manager asked about a historical feature. Li Ming invoked the agent, which retrieved the requirement from two years earlier along with an explanation. It was not perfect, but far better than blind searching.

System upgrade: Encouraged by the initial results, Li Ming outlined three key points:

Basic query: enable newcomers and product staff to quickly find standard answers for common business issues.

Knowledge association: connect code changes with requirement documents and incident records to build a requirement‑centric knowledge base.

Intelligent prompts: automatically suggest historical experience when a new requirement is being developed.

Practical application: While developing a new feature, Li Ming gathered related historical requirements, code, and operation records together. He found that this not only deepened his own understanding but also allowed new interns to get up to speed quickly.

3. Large Model Application Stage 1

This stage is a basic introduction: using simple prompts to ask the large model common work‑related questions.

4. Large Model Application Stage 2

4.1 Architecture Diagram

4.2 Technical Route

Note: This example uses DIFY (a large-model workflow platform). For internal use, mind permission and security concerns, and prefer your organization's own internal large-model platform.

4.3 Result Showcase – DMS Technical Expert Practice

4.3.1 Recommended Corpus

Essential: TRD and ERD documents for classic requirements.

ERD documents help the model quickly understand system structure and explain business knowledge.

TRD documents allow the model to provide technical opinions and answer system/technology questions.

System overview documents can supplement database design, system design, and business function sharing.

Recommended: R&D notes and common issues.

Technical experts can combine common‑issue docs with historical cases to prevent incidents.

Examples: (1) records of historical online issues, to avoid recurrence; (2) R&D/product Q&A docs, to help locate solutions quickly.

Essential: DMS system PRD/requirement set – helps the model understand business and answer specific requirement questions.

Essential: Collection of common system pitfalls – e.g., pre‑warming before release, shared Redis risks, MQ traffic spikes.

4.3.2 Prompt Suggestions

Problem answering: provide accurate information for product managers and resolve questions from testers or developers unfamiliar with the system.

Solution guidance: explain system‑level issues and propose solutions for product teams and store‑ticket feedback.

System introduction: for any design question, combine ERD, TRD, etc., to explain database, system, or business flow.

Precautions: when R&D raises concerns, combine past cases and operational manuals to give professional advice.
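
The four prompt scenarios above can share one template that differs only in its role instruction. The sketch below is an illustration under assumptions: the role wording, the `ROLES` keys, the `build_prompt` helper, and the sample passages are hypothetical and not taken from the DMS setup. It places the role first, then numbered passages retrieved from the corpus, then the question.

```python
# Hypothetical role instructions, one per scenario in section 4.3.2.
ROLES = {
    "problem_answering": "You are a DMS technical expert. Answer accurately and concisely.",
    "solution_guidance": "You are a DMS technical expert. Explain the issue and propose a solution.",
    "system_introduction": "You are a DMS technical expert. Explain the database, system, or business flow.",
    "precautions": "You are a DMS technical expert. Point out risks based on past cases and manuals.",
}

def build_prompt(role, question, passages, max_chars=2000):
    """Assemble a prompt: role instruction, numbered context passages, question."""
    lines = [role, "", "Context:"]
    used = 0
    for number, (source, text) in enumerate(passages, start=1):
        if used + len(text) > max_chars:
            break  # stay within a rough context budget
        lines.append(f"[{number}] ({source}) {text}")
        used += len(text)
    lines += [
        "",
        f"Question: {question}",
        "Answer using only the context above and cite passages by number. "
        "If the context does not contain the answer, say so.",
    ]
    return "\n".join(lines)

# Hypothetical passages as a retrieval step might return them.
passages = [
    ("ERD", "The orders table has a status column."),
    ("TRD", "Status changes are published to MQ."),
]
prompt = build_prompt(ROLES["system_introduction"], "How does order status flow?", passages)
print(prompt)
```

Asking the model to cite passages by number makes it easier to point readers back to the exact document location.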

4.3.3 Example

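Use DIFY to create a knowledge base that links code snippets with requirement descriptions. The GitHub REST API calls below show ways to collect the code behind each requirement.

Method 1 – Follow the issue or PR number referenced in a commit message: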

curl -H "Authorization: token YOUR_TOKEN" \
"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}"

Scenario: If a commit message contains an issue/PR number such as Fix #123, the above API returns JSON with pull_request or timeline_url fields that can be used to trace the related code.
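
Before the issue endpoint can be called, the issue number has to be pulled out of the commit message. A minimal sketch, assuming commit messages use GitHub's closing keywords (close, fix, resolve and their variants) directly followed by `#<number>`; the helper names are hypothetical, and references written in other styles would need a broader pattern.

```python
import re

# GitHub closing keywords followed by "#<number>", e.g. "Fix #123".
CLOSING_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)

def referenced_issues(commit_message):
    """Return the issue numbers referenced by closing keywords, in order, without duplicates."""
    numbers = [int(n) for n in CLOSING_REF.findall(commit_message)]
    return list(dict.fromkeys(numbers))

def issue_api_url(owner, repo, number):
    """Build the issue endpoint URL used by the curl call above."""
    return f"https://api.github.com/repos/{owner}/{repo}/issues/{number}"

# Hypothetical commit message and repository.
message = "Fix #123: handle missing order status"
urls = [issue_api_url("example-org", "example-repo", n) for n in referenced_issues(message)]
print(urls)
```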

The commit itself, including the files it changed, can then be fetched by SHA:

curl -H "Authorization: token YOUR_TOKEN" \
"https://api.github.com/repos/{owner}/{repo}/commits/{commit_sha}"

Method 2 – Use the Search API to find commits whose messages mention a requirement ID (commit search matches the message text by default):

curl -H "Authorization: token YOUR_TOKEN" \
"https://api.github.com/search/commits?q=repo:{owner}/{repo}+REQ-123"

Search code file content (requires authentication; only the default branch is indexed):

curl -H "Authorization: token YOUR_TOKEN" \
"https://api.github.com/search/code?q=repo:{owner}/{repo}+REQ-123+in:file"

Retrieve code via Pull Request:

curl -H "Authorization: token YOUR_TOKEN" \
"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}"

curl -H "Authorization: token YOUR_TOKEN" \
"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/files"

5. Large Model Application Stage 3

5.1 Architecture Diagram

5.2 Implementation Steps

5.2.1 Step 1 – Bind requirement name with code

Use Issue/PR numbers in commit messages (e.g., Fix #123) to retrieve associated code via GitHub APIs.
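
The binding step can be sketched as a grouping of commits by the reference found in their messages. The code below is an illustration under assumptions: it treats both `#123`-style issue numbers and `REQ-123`-style IDs as requirement keys, and the function name and sample commits are hypothetical. It also reports commits that carry no reference, since those cannot be bound to any requirement.

```python
import re
from collections import defaultdict

# Matches "#123" issue/PR numbers and "REQ-123" requirement IDs (assumed formats).
REFERENCE = re.compile(r"#\d+|\bREQ-\d+\b")

def bind_commits(commits):
    """Group commit SHAs by referenced requirement; return (bindings, unlinked SHAs)."""
    bindings = defaultdict(list)
    unlinked = []
    for sha, message in commits:
        refs = sorted(set(REFERENCE.findall(message)))
        if not refs:
            unlinked.append(sha)
        for ref in refs:
            bindings[ref].append(sha)
    return dict(bindings), unlinked

# Hypothetical commit history as (sha, message) pairs.
commits = [
    ("a1", "Fix #123 order status transition"),
    ("b2", "REQ-7 add refund rule, fix #123"),
    ("c3", "tidy imports"),
]
bindings, unlinked = bind_commits(commits)
print(bindings, unlinked)
```

For each bound SHA, the changed code can then be fetched through the commit or pull request endpoints.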

5.2.2 Step 2 – Clean and annotate data, upload to knowledge base

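Create an empty knowledge base (dataset):

curl --location --request POST 'https://api.dify.ai/v1/datasets' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{"name": "name", "permission": "only_me"}'

Add requirement/code pairs as segments of a document in that dataset ({document_id} must refer to a document that already exists there):

curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}/segments' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{"segments": [
  {"content": "Detailed content of requirement description 1", "answer": "Corresponding code implementation 1", "keywords": ["keyword1", "keyword2"]},
  {"content": "Detailed content of requirement description 2", "answer": "Corresponding code implementation 2", "keywords": ["keyword3", "keyword4"]}
]}'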
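
The JSON body of the segments request can be generated rather than written by hand. A minimal sketch, assuming each record is a (requirement text, code, keywords) triple; the `clean` and `build_segments_payload` helpers and the sample record are hypothetical, and only the `segments`, `content`, `answer`, and `keywords` field names come from the request above.

```python
import json

def clean(text):
    """Strip trailing whitespace on each line and drop leading/trailing blank lines."""
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(lines).strip("\n")

def build_segments_payload(records):
    """Turn (requirement, code, keywords) triples into the segments request body."""
    segments = []
    for requirement, code, keywords in records:
        segments.append({
            "content": clean(requirement),
            "answer": clean(code),
            "keywords": sorted(set(keywords)),  # deduplicate for stable output
        })
    return {"segments": segments}

# Hypothetical record pairing a requirement description with its implementation.
records = [
    ("Support bean deduction at checkout  \n\n",
     "\n    return total;   \n",
     ["settlement", "bean", "bean"]),
]
payload = build_segments_payload(records)
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The printed JSON is what would be passed to `--data-raw`. Leading indentation in code is kept on purpose so that snippets remain readable after upload.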

5.2.3 Step 3 – Configure workflow (illustrative diagram)

5.3 Result Display

5.3.1 Historical Change Retrieval

Combine the "transaction history change" knowledge base to retrieve changed code snippets.
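
For changed code snippets to be retrievable, each change first has to be stored in a compact text form. A minimal sketch, assuming the changes were collected from GitHub's pull request files endpoint, whose entries carry `filename`, `status`, `additions`, `deletions`, and an optional `patch`; the helper name, the truncation limit, and the sample entries are hypothetical.

```python
def snippets_from_pr_files(files, max_patch_chars=500):
    """Turn pull request file entries into short text snippets, one per file."""
    snippets = []
    for entry in files:
        header = (f'{entry["filename"]} ({entry["status"]}, '
                  f'+{entry["additions"]}/-{entry["deletions"]})')
        # Binary or very large files may come without a patch.
        patch = entry.get("patch", "")[:max_patch_chars]
        snippets.append(f"{header}\n{patch}".rstrip())
    return snippets

# Hypothetical entries shaped like the pull request files response.
sample_files = [
    {"filename": "src/OrderService.java", "status": "modified",
     "additions": 2, "deletions": 1,
     "patch": "@@ -10,3 +10,4 @@\n-    return PAID;\n+    audit(order);\n+    return SHIPPED;"},
    {"filename": "docs/flow.png", "status": "added", "additions": 0, "deletions": 0},
]
snippets = snippets_from_pr_files(sample_files)
print("\n\n".join(snippets))
```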

5.3.2 Historical Change Analysis

For product managers who cannot read code, the system summarizes the impact of changes based on the knowledge base.

5.3.3 Code Generation from TRD

Example: method com.jd.xstore.settlement.center.biz.service.CommonSettlementFacadeSaasImpl#calculateTotalPrice. The PRD specifies support for bean usage at the POS: query the member system for the bean total, calculate the deductible amount, and return the result.

5.3.4 Similar Past Designs

Identify required changes for a new SendPayParam type.

6. Summary

Stage 1 – Basic usage: developers generate code snippets with AI, testers write test cases, product managers draft requirement documents, improving efficiency.

Stage 2 – Knowledge integration: build a system-level knowledge-base template, develop intelligent retrieval that points to exact document locations, and encourage departments to improve documentation.

Stage 3 – Deep application: code change tracing, requirement analysis for newcomers, and AI-assisted code generation, plus experience inheritance that supplies implementation ideas for similar requirements.

This incremental approach transforms fragmented knowledge into a systematic knowledge‑management mechanism, solving onboarding and knowledge‑transfer challenges while establishing a sustainable knowledge‑preservation process.

7. Future Optimizations

Identified issues for continuous improvement:

Code generation quality depends on how often a module's requirements have changed: stable modules have little change history in the knowledge base, so only basic code can be generated for them.

Accuracy of knowledge association needs enhancement; stricter linking of each code commit to a clear requirement document would improve precision.

RAG‑based generation heavily relies on accurate query recognition and retrieval recall.
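
Retrieval recall, the last point above, can be tracked with a small labeled set of questions. The sketch below assumes each test case pairs the ranked document IDs returned by the knowledge base with the IDs a reviewer marked as relevant; the function names and sample IDs are hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant items that appear in the top-k retrieved results."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)

def mean_recall_at_k(cases, k):
    """Average recall@k over (retrieved, relevant) test cases."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in cases]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical evaluation set: ranked results vs. reviewer-labeled relevant documents.
cases = [
    (["d1", "d4", "d2"], ["d1", "d2"]),
    (["d9", "d3"], ["d3", "d5"]),
]
print(mean_recall_at_k(cases, k=2))
print(mean_recall_at_k(cases, k=3))
```

Re-running the same set after each change to chunking, keywords, or query rewriting shows whether retrieval actually improved.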

Li Ming plans to incorporate these optimization points into the next development phase, believing that persistent effort will lead to success.
