Build a Code‑Analysis Assistant Step‑by‑Step: From LLM Calls to Production‑Ready Agent

This guide walks through building a production‑grade code‑analysis assistant, detailing requirements, architecture, a Node.js tech stack, TF‑IDF RAG implementation, dynamic skill loading, secure tool calls, memory handling, observability, common pitfalls, and paths to scale from a demo to a full‑featured system.

CodeNotes

May 3, 2026

Most Agent tutorials stop at "calling an API"; this guide walks through a complete pipeline: RAG knowledge injection, dynamic Skill loading, multi‑step Tool Calls, conversation memory, safety guards, and observability, with real code and practical notes.

1. Core requirements for a code‑analysis assistant

A useful assistant must:

Read code : traverse project files and locate key logic.

Answer with standards : give concrete suggestions based on team conventions (naming, framework versions, architecture decisions).

Support multi‑turn follow‑up : retain context for questions like "What is the call chain of that function?".

Be trustworthy and controllable : never fabricate files, never access paths outside the project, and avoid infinite loops.

These map to four core modules: Tool (utility), RAG + Skill (knowledge), Memory (conversation), and Guard (safety).

┌──────────────────────────────────────────────────────────────────┐
│               Code‑Analysis Assistant Architecture               │
│                                                                  │
│  User Input                                                      │
│      │                                                           │
│      ├─→ [Guard]   Input validation / privilege check            │
│      ├─→ [RAG]     Keyword routing → knowledge search → inject   │
│      ├─→ [Skill]   Domain detection → expert guide System Prompt │
│      ├─→ [Memory]  History → splice into messages[]              │
│      └─→ [Agent Loop]                                            │
│                LLM reasoning                                     │
│                ↓ finish_reason == tool_calls?                    │
│                Tool execution (read / search / write)            │
│                ↓ append result to messages[]                     │
│                LLM re‑reason (max N rounds)                      │
│                ↓ stop                                            │
│                Final answer → update Memory → return to user     │
└──────────────────────────────────────────────────────────────────┘

2. Technology selection and project skeleton

Runtime : Node.js ESM (native top‑level await, no TypeScript compilation).

LLM : OpenAI SDK, pointed at any OpenAI‑compatible endpoint through OPENAI_BASE_URL (GPT‑4o, DeepSeek, Tongyi, Claude).

RAG : Hand‑written TF‑IDF (zero external dependencies; replace with embeddings for production).

Tool : OpenAI function‑calling schema written MCP‑style (name, description, JSON Schema parameters), so the definitions map closely onto MCP tools.

Dependencies : only openai + dotenv.

Directory layout:

simple-agent/
├── index.js          # Agent entry (main loop, RAG/Skill loading)
├── tools.js          # Tool schema + executor (MCP style)
├── rag.js            # TF‑IDF RAG engine
├── knowledge/        # Knowledge base (Markdown, domain‑split)
│   ├── react.md
│   ├── database.md
│   └── ...
├── skills/           # Skill expert guides (Markdown)
│   ├── react.md
│   └── ...
└── .env              # API key / base URL / model

3. RAG – injecting domain knowledge

3.1 Why RAG?

LLM training data has a cutoff date and lacks internal team conventions. RAG inserts relevant private documents into the prompt before inference, preventing generic answers.

3.2 TF‑IDF retrieval principle

TF‑IDF + cosine similarity provides lightweight retrieval. The core formulas live in rag.js; _tfidf and _cosine below are excerpts of methods on the RAG class:

// rag.js – tokenization (mixed Chinese/English)
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9_]+|[\u4e00-\u9fa5]+/g) || [];
}

// TF‑IDF vector
_tfidf(text, corpus) {
  const tokens = tokenize(text);
  const tf = {};
  tokens.forEach(t => tf[t] = (tf[t] || 0) + 1 / tokens.length);

  // Single‑document case: IDF all zero → fallback to pure TF
  if (corpus.length === 1) return tf;

  const vec = {};
  Object.entries(tf).forEach(([term, freq]) => {
    const df = corpus.filter(doc => doc.toLowerCase().includes(term)).length;
    vec[term] = freq * Math.log((corpus.length + 1) / (df + 1));
  });
  return vec;
}

// Cosine similarity
_cosine(v1, v2) {
  let dot = 0, m1 = 0, m2 = 0;
  const keys = new Set([...Object.keys(v1), ...Object.keys(v2)]);
  keys.forEach(k => {
    dot += (v1[k] || 0) * (v2[k] || 0);
    m1  += (v1[k] || 0) ** 2;
    m2  += (v2[k] || 0) ** 2;
  });
  return dot / (Math.sqrt(m1) * Math.sqrt(m2) || 1);
}
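The excerpts above do not show how documents are stored or ranked. The following is a minimal sketch of how the pieces might fit together, assuming a class with add(name, text) and search(query, topK). The class name MiniRAG and its internals are assumptions for illustration, not the original rag.js.

```javascript
// Hypothetical minimal RAG class; internals are an assumption, not the original rag.js.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9_]+|[\u4e00-\u9fa5]+/g) || [];
}

class MiniRAG {
  constructor() {
    this.docs = []; // [{ name, text }]
  }

  add(name, text) {
    this.docs.push({ name, text });
  }

  // Term frequency weighted by a smoothed inverse document frequency.
  _tfidf(text, corpus) {
    const tokens = tokenize(text);
    const tf = {};
    tokens.forEach(t => { tf[t] = (tf[t] || 0) + 1 / tokens.length; });
    if (corpus.length === 1) return tf; // IDF would be zero everywhere
    const vec = {};
    for (const [term, freq] of Object.entries(tf)) {
      const df = corpus.filter(doc => doc.toLowerCase().includes(term)).length;
      vec[term] = freq * Math.log((corpus.length + 1) / (df + 1));
    }
    return vec;
  }

  _cosine(v1, v2) {
    let dot = 0, m1 = 0, m2 = 0;
    for (const k of new Set([...Object.keys(v1), ...Object.keys(v2)])) {
      dot += (v1[k] || 0) * (v2[k] || 0);
      m1 += (v1[k] || 0) ** 2;
      m2 += (v2[k] || 0) ** 2;
    }
    return dot / (Math.sqrt(m1) * Math.sqrt(m2) || 1);
  }

  // Rank every stored document against the query and keep the best topK.
  search(query, topK = 2) {
    const corpus = this.docs.map(d => d.text);
    const qVec = this._tfidf(query, corpus);
    return this.docs
      .map(d => ({ name: d.name, score: this._cosine(qVec, this._tfidf(d.text, corpus)) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

const rag = new MiniRAG();
rag.add('react', 'useEffect must return a cleanup function in React components');
rag.add('database', 'Add an index before running a slow SQL query');
rag.add('style', 'Filenames use PascalCase for components');
console.log(rag.search('how do I clean up useEffect', 1)[0].name); // react
```

Only the react document shares a token (useeffect) with the query, so it is the only one with a non‑zero score. Recomputing every document vector on each search is fine for a handful of files; a larger knowledge base would cache them when add is called.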

3.3 Keyword routing – load on demand

Injecting the entire knowledge base causes token explosion and attention dilution. Keyword routing loads only hit documents:

// Knowledge registry
const RAG_REGISTRY = [
  {
    name: 'react',
    file: 'react.md',
    keywords: ['react','jsx','组件','hooks','usestate','useeffect','前端','nextjs','vite','重渲染','受控组件']
  },
  {
    name: 'database',
    file: 'database.md',
    keywords: ['数据库','mysql','sql','索引','事务','慢查询','redis']
  },
  // ... more domains
];

function buildRAG(query) {
  const q = query.toLowerCase();
  const instance = new RAG();
  for (const entry of RAG_REGISTRY) {
    const hit = entry.always || entry.keywords?.some(kw => q.includes(kw));
    if (hit) {
      instance.add(entry.name, fs.readFileSync(path.join(knowledgeDir, entry.file), 'utf-8'));
    }
  }
  return instance;
}

Practical tip: keyword lists should cover synonyms and common spelling variants (e.g., useeffect and use-effect). Because the query is lowercased before matching, every keyword must be lowercase as well.
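One way to shrink the number of variants each keyword list has to carry is to normalize both sides before comparing. This is a sketch only; normalize and matchesAny are hypothetical helpers that do not appear in the original code.

```javascript
// Hypothetical helper: lowercases and strips separators so that
// "useEffect", "use-effect" and "use_effect" all compare equal.
function normalize(text) {
  return text.toLowerCase().replace(/[\s\-_.]+/g, '');
}

// True when any keyword occurs in the normalized query.
function matchesAny(query, keywords) {
  const q = normalize(query);
  return keywords.some(kw => q.includes(normalize(kw)));
}

console.log(matchesAny('Why does use-effect run twice?', ['useeffect'])); // true
console.log(matchesAny('Why does useEffect run twice?', ['useeffect']));  // true
console.log(matchesAny('How do I add an index?', ['useeffect']));         // false
```

Stripping whitespace makes matching looser, so a short keyword can match across a word boundary. For routing that is usually tolerable, since a false positive only loads one extra document.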

3.4 No knowledge hit → explicit refusal

// Completely missed any knowledge base
if (ragLoaded.length === 0) {
  return 'Sorry, I haven’t learned the relevant knowledge yet.';
}

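Here ragLoaded holds the knowledge entries that keyword routing loaded for this query. Returning early when it is empty stops the LLM from improvising an answer in a domain it has no documents for.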

4. Skill – injecting behavior guidelines

RAG provides "what the knowledge is"; Skill provides "what to do". Skill files are expert prompts stored as Markdown per domain.

function loadSkills(query) {
  const q = query.toLowerCase();
  let content = '';
  for (const skill of SKILL_REGISTRY) {
    const hit = skill.keywords?.some(kw => q.includes(kw));
    if (hit) {
      content += fs.readFileSync(path.join(skillsDir, skill.file), 'utf-8') + '\n\n';
    }
  }
  return content.trim();
}

Example skills/react.md:

## React Code Guidelines
- Always use function components + Hooks; prohibit class components
- Side effects must be handled in useEffect and return a cleanup function
- List rendering must have stable keys; never use array index as key
- Async requests must handle loading / error states
- Component filenames use PascalCase; hook files use camelCase prefixed with "use"

Best practice: keep Skill files concise (< 500 words); the longer the guide, the less weight the model gives to each individual rule.

5. Tool Call – letting the LLM read code

5.1 Design principles

Precise description : the model picks a tool from its name, description, and parameter schema alone, so the description must say what the tool does and when to use it.

Constrained parameters : add enum and required wherever possible.

Single responsibility : each tool does one thing.

// tools.js – OpenAI function‑calling schema
export const tools = [
  {
    type: 'function',
    function: {
      name: 'read_file',
      description: 'Read the full content of a specified file. Suitable for source code, config files, documentation.',
      parameters: {
        type: 'object',
        properties: {
          file_path: { type: 'string', description: 'Path relative to project root, e.g., "src/index.js" or "knowledge/react.md"' }
        },
        required: ['file_path']
      }
    }
  },
  {
    type: 'function',
    function: {
      name: 'list_directory',
      description: 'List all files and sub‑directories. Use this first to understand project structure before deciding which file to read.',
      parameters: {
        type: 'object',
        properties: {
          dir_path: { type: 'string', description: 'Path relative to project root, default "."', default: '.' }
        }
      }
    }
  },
  {
    type: 'function',
    function: {
      name: 'search_files',
      description: 'Search for a keyword in .js/.md/.txt files, returning matching lines. Useful for locating function definitions or variable usage.',
      parameters: {
        type: 'object',
        properties: {
          keyword: { type: 'string', description: 'Keyword to search (case‑insensitive)' },
          dir_path: { type: 'string', description: 'Search directory, default "."', default: '.' }
        },
        required: ['keyword']
      }
    }
  },
  {
    type: 'function',
    function: {
      name: 'write_file',
      description: 'Write content to a file, used for saving analysis reports or generated code.',
      parameters: {
        type: 'object',
        properties: {
          file_path: { type: 'string', description: 'Target file path' },
          content: { type: 'string', description: 'Full content to write' },
          overwrite: { type: 'boolean', description: 'Whether to overwrite existing file', default: true }
        },
        required: ['file_path', 'content']
      }
    }
  }
];
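The schema only tells the model what to produce; it does not guarantee what comes back. Below is a minimal sketch of checking required and enum before a tool runs. The validateArgs helper and the mode parameter are illustrative assumptions and are not part of the original tools.js.

```javascript
// Hypothetical parameter schema with an enum-constrained field.
const searchParams = {
  type: 'object',
  properties: {
    keyword: { type: 'string' },
    mode: { type: 'string', enum: ['literal', 'regex'] },
  },
  required: ['keyword'],
};

// Returns a list of problems; an empty list means the arguments are acceptable.
// Checks required keys, unknown keys, enum membership, and primitive types
// (string / number / boolean only; nested objects and arrays are not handled).
function validateArgs(schema, args) {
  const errors = [];
  for (const key of schema.required ?? []) {
    if (args[key] === undefined) errors.push(`missing required parameter: ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const spec = schema.properties[key];
    if (!spec) { errors.push(`unknown parameter: ${key}`); continue; }
    if (typeof value !== spec.type) errors.push(`${key} must be of type ${spec.type}`);
    if (spec.enum && !spec.enum.includes(value)) {
      errors.push(`${key} must be one of: ${spec.enum.join(', ')}`);
    }
  }
  return errors;
}

// A valid call produces no errors.
console.log(validateArgs(searchParams, { keyword: 'RAG_REGISTRY', mode: 'regex' }));
// A call with a missing keyword and an invalid mode produces two errors.
console.log(validateArgs(searchParams, { mode: 'fuzzy' }));
```

Returning these messages as the tool result, instead of throwing, gives the model a chance to correct the call on its next round.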

5.2 Safety – path‑traversal defense

function safePath(inputPath) {
  const resolved = path.resolve(ROOT, inputPath);
  // Ensure the resolved absolute path stays inside the project root
  if (!resolved.startsWith(ROOT + path.sep) && resolved !== ROOT) {
    throw new Error(`Security intercept: access outside project root prohibited: ${inputPath}`);
  }
  return resolved;
}

Extension: in production also add tool‑call audit logs (record tool name, parameters, result summary) for anomaly investigation.
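To see why the check compares against ROOT + path.sep and not ROOT alone, the sketch below runs safePath against a few inputs. The ROOT value and the isAllowed wrapper are assumptions made for the demonstration.

```javascript
import path from 'node:path';

// ROOT is an assumed example location; the original code defines its own ROOT.
const ROOT = path.resolve('/srv/project');

function safePath(inputPath) {
  const resolved = path.resolve(ROOT, inputPath);
  if (!resolved.startsWith(ROOT + path.sep) && resolved !== ROOT) {
    throw new Error(`Security intercept: access outside project root prohibited: ${inputPath}`);
  }
  return resolved;
}

// Hypothetical wrapper that turns the exception into a boolean.
function isAllowed(p) {
  try { safePath(p); return true; } catch { return false; }
}

console.log(isAllowed('src/index.js'));         // true
console.log(isAllowed('.'));                    // true  (the root itself)
console.log(isAllowed('../project-secrets/k')); // false (sibling directory sharing the prefix)
console.log(isAllowed('../../etc/passwd'));     // false (climbs out of the root)
console.log(isAllowed('/etc/passwd'));          // false (an absolute path replaces ROOT)
```

Without the trailing separator, /srv/project-secrets would pass a plain startsWith(ROOT) test. Note that path.resolve does not follow symlinks, so a link inside the project that points elsewhere would still pass; resolving with fs.realpathSync before the check is one possible hardening step.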

5.3 Tool executor

export async function executeTool(name, args) {
  try {
    switch (name) {
      case 'read_file': {
        const filePath = safePath(args.file_path);
        return fs.readFileSync(filePath, 'utf-8');
      }
      case 'write_file': {
        const filePath = safePath(args.file_path);
        // Schema default is overwrite=true, so only an explicit false blocks the write
        if (args.overwrite === false && fs.existsSync(filePath)) {
          return `File exists and overwrite=false: ${args.file_path}`;
        }
        fs.mkdirSync(path.dirname(filePath), { recursive: true });
        fs.writeFileSync(filePath, String(args.content), 'utf-8');
        return `Write successful: ${args.file_path}`;
      }
      case 'list_directory': {
        const dirPath = safePath(args.dir_path || '.');
        return fs.readdirSync(dirPath, { withFileTypes: true })
          .filter(e => e.name !== 'node_modules' && !e.name.startsWith('.'))
          .map(e => `${e.isDirectory() ? '[dir]' : '[file]'} ${e.name}`)
          .join('\n');
      }
      case 'search_files': {
        const results = [];
        collectMatches(safePath(args.dir_path || '.'), args.keyword, results);
        return results.length > 0 ? results.join('\n') : `No matches for "${args.keyword}"`;
      }
      default:
        return `Unknown tool: ${name}`;
    }
  } catch (err) {
    return `Tool execution error: ${err.message}`; // Return error to LLM for retry
  }
}
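The executor calls collectMatches, which is not shown. A minimal sketch of what it might look like follows, assuming a recursive walk over .js/.md/.txt files with a case‑insensitive substring match, as the search_files description states. The result format, the root parameter, and the maxResults cap are assumptions.

```javascript
import fs from 'node:fs';
import path from 'node:path';

const SEARCHABLE = new Set(['.js', '.md', '.txt']);

// Hypothetical implementation of the collectMatches helper used by search_files.
// Walks dir recursively and pushes "relative/path:line: text" for each hit.
function collectMatches(dir, keyword, results, root = dir, maxResults = 50) {
  const needle = keyword.toLowerCase();
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (results.length >= maxResults) return; // cap output so one search cannot flood the prompt
    if (entry.name === 'node_modules' || entry.name.startsWith('.')) continue;
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      collectMatches(full, keyword, results, root, maxResults);
    } else if (entry.isFile() && SEARCHABLE.has(path.extname(entry.name))) {
      const lines = fs.readFileSync(full, 'utf-8').split('\n');
      lines.forEach((line, i) => {
        if (results.length < maxResults && line.toLowerCase().includes(needle)) {
          results.push(`${path.relative(root, full)}:${i + 1}: ${line.trim()}`);
        }
      });
    }
  }
}

// Usage, mirroring the executor:
//   const results = [];
//   collectMatches(safePath('.'), 'RAG_REGISTRY', results);
```

Capping the number of results matters more than it may seem: every matching line ends up in messages[] and is paid for again on each later round.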

6. Agent main loop – choreography of LLM and tools

6.1 Full flow (pseudocode)

function runAgent(query, history):
  # --- preparation ---
  ragDocs   = RAG.search(query)          # knowledge retrieval
  skillText = Skill.load(query)          # expert guide
  prompt    = buildSystemPrompt(ragDocs, skillText)
  messages  = [system(prompt), ...history, user(query)]

  # --- Agent loop ---
  round = 0
  while round < MAX_ROUNDS:
    round += 1
    response = LLM.complete(messages, tools=TOOLS, stream=true)

    if response.finish_reason == "stop":
      break  # LLM thinks it is done

    if response.finish_reason == "tool_calls":
      messages.append(response.message)  # assistant message with tool intents
      for toolCall in response.tool_calls:
        result = Tool.execute(toolCall.name, toolCall.args)
        messages.append(tool_result(toolCall.id, result))
      continue

  # --- finalization ---
  answer = extract_last_assistant_content(messages)
  history.append(user(query), assistant(answer))
  return answer

6.2 Real implementation (Node.js)

async function runAgent(query, history) {
  const messages = buildMessages(query, history);

  let round = 0;
  while (true) {
    if (++round > MAX_TOOL_ROUNDS) {
      console.warn(`[Warning] Tool calls exceeded limit ${MAX_TOOL_ROUNDS}, forcing termination`);
      break;
    }

    // ── streaming inference ──────────────────────
    let content = '';
    const toolCallsMap = {};
    let finishReason = null;

    const stream = await llm.chat.completions.create({
      model: process.env.MODEL || 'gpt-4o-mini',
      messages,
      tools,
      tool_choice: 'auto',
      stream: true,
    });

    for await (const chunk of stream) {
      const delta = chunk.choices[0]?.delta;
      finishReason = chunk.choices[0]?.finish_reason ?? finishReason;

      // streaming text output
      if (delta?.content) {
        process.stdout.write(delta.content);
        content += delta.content;
      }

      // incremental tool‑call argument collection (streamed args arrive in pieces)
      if (delta?.tool_calls) {
        for (const tc of delta.tool_calls) {
          const slot = toolCallsMap[tc.index] ?? { id: tc.id, name: '', arguments: '' };
          if (tc.id) slot.id = tc.id;
          if (tc.function?.name) slot.name += tc.function.name;
          if (tc.function?.arguments) slot.arguments += tc.function.arguments;
          toolCallsMap[tc.index] = slot;
        }
      }
    }

    const toolCalls = Object.values(toolCallsMap);

    // Append this round's assistant message
    messages.push({
      role: 'assistant',
      content: content || null,
      ...(toolCalls.length && { tool_calls: toolCalls.map(tc => ({ id: tc.id, type: 'function', function: { name: tc.name, arguments: tc.arguments } })) })
    });

    // Exit condition: LLM no longer requests tools
    if (finishReason !== 'tool_calls' || toolCalls.length === 0) break;

    // ── execute tools, feed results back ────────
    for (const tc of toolCalls) {
      let args;
      try { args = JSON.parse(tc.arguments); }
      catch { messages.push({ role: 'tool', tool_call_id: tc.id, content: 'Parameter JSON parse error' }); continue; }

      console.log(`\n[Tool] Call: ${tc.name}(${JSON.stringify(args)})`);
      const result = await executeTool(tc.name, args);
      const preview = result.length > 200 ? result.slice(0, 200) + '…' : result;
      console.log(`[Tool] Return: ${preview}`);

      messages.push({ role: 'tool', tool_call_id: tc.id, content: result });
    }
  }

  // Write back history
  const answer = messages.findLast(m => m.role === 'assistant')?.content ?? '';
  history.push({ role: 'user', content: query });
  if (answer) history.push({ role: 'assistant', content: answer });
  return answer;
}
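runAgent relies on buildMessages, which is not shown. The sketch below is one possible shape for it, written as pure functions so the flow is easy to follow. Unlike the two‑argument call above, it takes the retrieved documents and Skill text as explicit parameters, and the prompt wording and section labels are assumptions.

```javascript
// Hypothetical prompt assembly; the wording and section labels are assumptions.
function buildSystemPrompt(ragDocs, skillText) {
  const parts = [
    'You are a code-analysis assistant. Use the tools to inspect files before answering.',
  ];
  if (skillText) parts.push(`## Guidelines\n${skillText}`);
  if (ragDocs.length > 0) {
    const refs = ragDocs.map(d => `### ${d.name}\n${d.text}`).join('\n\n');
    parts.push(`## Reference knowledge\n${refs}`);
  }
  return parts.join('\n\n');
}

// The system prompt is rebuilt for every query; history holds only prior
// user/assistant text, so it can be spliced in unchanged.
function buildMessages(query, history, ragDocs, skillText) {
  return [
    { role: 'system', content: buildSystemPrompt(ragDocs, skillText) },
    ...history,
    { role: 'user', content: query },
  ];
}

const messages = buildMessages(
  'Why does this component re-render?',
  [{ role: 'user', content: 'hi' }, { role: 'assistant', content: 'hello' }],
  [{ name: 'react', text: 'Prefer function components.' }],
  'Always use Hooks.'
);
console.log(messages.map(m => m.role).join(' > ')); // system > user > assistant > user
```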

6.3 Typical multi‑step execution

Query: "Analyze the main modules in index.js". The LLM makes three tool calls, then answers in a fourth round:

Round 1: list_directory(".") → returns project structure
Round 2: read_file("index.js") → returns file content
Round 3: search_files("RAG_REGISTRY") → returns the line where the registry is defined
Round 4: LLM decides information is sufficient and outputs the analysis report (stop)

7. Conversation memory – foundation for multi‑turn queries

7.1 Simple array implementation

const history = []; // session‑level memory, declared outside the main loop

// Store only user and assistant text per round
history.push({ role: 'user',      content: query });
history.push({ role: 'assistant', content: answer });

// Next round injects full history
const messages = [
  { role: 'system', content: systemPrompt },
  ...history,
  { role: 'user',  content: newQuery },
];

7.2 Edge cases

Problem 1: Context length overflow – as rounds increase, history may exceed the model’s context window. A sliding‑window truncation removes the earliest user/assistant pairs until token count fits.

function trimHistory(history, maxTokens = 6000) {
  let total = estimateTokens(history);
  while (total > maxTokens && history.length > 2) {
    history.splice(0, 2); // drop earliest pair
    total = estimateTokens(history);
  }
  return history;
}
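trimHistory depends on an estimateTokens function that is not shown. A rough sketch is given below; it is a heuristic and not a real tokenizer. The ratios (about 4 characters per token for Latin text, about 1 token per CJK character) and the per‑message overhead of 4 are assumptions chosen to err on the high side.

```javascript
// Rough heuristic, not a real tokenizer: about 4 characters per token for
// Latin text, about 1 token per CJK character, plus a small per-message overhead.
function estimateTokens(history) {
  let total = 0;
  for (const msg of history) {
    const text = msg.content ?? '';
    const cjk = (text.match(/[\u4e00-\u9fa5]/g) || []).length;
    total += cjk + Math.ceil((text.length - cjk) / 4) + 4;
  }
  return total;
}

const sampleHistory = [
  { role: 'user', content: 'What does index.js do?' },
  { role: 'assistant', content: 'It wires RAG, skills, tools.' },
];
console.log(estimateTokens(sampleHistory)); // 21
```

An estimate like this is adequate for deciding when to drop old turns, as long as maxTokens leaves headroom below the model's real limit. Where exact counts matter, a tokenizer library matched to the model is the safer choice.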

Problem 2: Per‑round System Prompt changes – each query may retrieve different RAG docs, so the System Prompt varies intentionally.

Problem 3: Concurrent users – a single‑user CLI can use a global history array. For multi‑user services, isolate by session ID:

const sessions = new Map(); // sessionId → history[]

function getOrCreateSession(sessionId) {
  if (!sessions.has(sessionId)) {
    sessions.set(sessionId, []);
  }
  return sessions.get(sessionId);
}

8. Observability – tracing Agent behavior

8.1 Structured logging (pseudo‑code)

// Log each tool call with full context
function logToolCall({ round, name, args, result, durationMs }) {
  const entry = {
    timestamp: new Date().toISOString(),
    round,
    tool: name,
    input: args,
    outputPreview: result.slice(0, 200),
    durationMs,
  };
  fs.appendFileSync('agent.log', JSON.stringify(entry) + '\n');
}

// Log each LLM round token consumption
function logLLMRound({ round, model, promptTokens, completionTokens, finishReason }) {
  // implementation omitted
}
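The omitted logLLMRound needs token counts, which a streamed Chat Completions response only reports when the request sets stream_options: { include_usage: true }; the usage object then arrives on a final chunk whose choices array is empty. The sketch below builds the log entry from that object. buildLLMRoundEntry is a hypothetical name, the numbers are made‑up example values, and some OpenAI‑compatible providers may not support this option.

```javascript
// Builds the log entry for one LLM round. `usage` is the object taken from the
// final stream chunk (present only when stream_options.include_usage is set).
function buildLLMRoundEntry({ round, model, usage, finishReason }) {
  return {
    timestamp: new Date().toISOString(),
    round,
    model,
    promptTokens: usage?.prompt_tokens ?? null,
    completionTokens: usage?.completion_tokens ?? null,
    finishReason,
  };
}

// Inside the streaming loop, usage could be captured roughly like this:
//   let usage = null;
//   for await (const chunk of stream) {
//     if (chunk.usage) usage = chunk.usage; // final chunk, choices is empty
//     ...
//   }

const entry = buildLLMRoundEntry({
  round: 1,
  model: 'gpt-4o-mini',
  usage: { prompt_tokens: 812, completion_tokens: 64, total_tokens: 876 }, // example values
  finishReason: 'tool_calls',
});
console.log(JSON.stringify(entry));
```

The main loop already reads chunk.choices[0]?.delta with optional chaining, so the extra usage chunk passes through it without errors.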

8.2 Key metrics

Tool call count / round – detects infinite loops.

Tokens per round – controls cost and reveals prompt bloat.

RAG hit rate – evaluates knowledge‑base coverage.

Tool error rate – uncovers unclear tool descriptions.

Average response latency – end‑to‑end performance benchmark.
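Several of these metrics can be derived directly from the tool‑call log. The sketch below assumes entries shaped like those written by logToolCall, belonging to a single session, and assumes failures are recognizable by the "Tool execution error" prefix that executeTool returns. summarizeToolLog is a hypothetical helper and the sample entries are invented.

```javascript
// Summarizes tool-call log entries (one parsed JSON object per line of agent.log).
function summarizeToolLog(entries) {
  const rounds = new Set(entries.map(e => e.round));
  const errors = entries.filter(e => e.outputPreview.startsWith('Tool execution error')).length;
  const totalMs = entries.reduce((sum, e) => sum + e.durationMs, 0);
  return {
    calls: entries.length,
    callsPerRound: rounds.size ? entries.length / rounds.size : 0,
    errorRate: entries.length ? errors / entries.length : 0,
    avgDurationMs: entries.length ? totalMs / entries.length : 0,
  };
}

// Invented sample data for illustration.
const sample = [
  { round: 1, tool: 'list_directory', outputPreview: '[file] index.js', durationMs: 4 },
  { round: 2, tool: 'read_file', outputPreview: 'Tool execution error: ENOENT', durationMs: 2 },
  { round: 2, tool: 'read_file', outputPreview: 'import fs from ...', durationMs: 6 },
  { round: 3, tool: 'search_files', outputPreview: 'rag.js:12: ...', durationMs: 8 },
];
console.log(summarizeToolLog(sample));
```

For the sample this yields 4 calls, an error rate of 0.25, and an average duration of 5 ms.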

8.3 Debugging tip

if (process.env.DEBUG === '1') {
  console.log('[DEBUG] messages:', JSON.stringify(messages, null, 2));
}

9. Full run example

# Install dependencies
npm install

# Configure environment variables
cat > .env << EOF
OPENAI_API_KEY=your_key_here
OPENAI_BASE_URL=https://api.openai.com/v1
MODEL=gpt-4o-mini
EOF

npm start

Sample interaction:

Enter question: Analyze the RAG implementation in this project – any improvements?

═══════════════════════════════════════════
User: Analyze the RAG implementation in this project – any improvements?
═══════════════════════════════════════════
[RAG] No knowledge hit (non‑technical question handled directly by LLM)
[LLM] Round 1 reasoning...

[Tool] Call: list_directory({"dir_path":"."})
[Tool] Return: [file] index.js  [file] rag.js  [dir] knowledge ...

[LLM] Round 2 reasoning...
[Tool] Call: read_file({"file_path":"rag.js"})
[Tool] Return: (full rag.js content)

[LLM] Round 3 reasoning...
Assistant: Current RAG uses TF‑IDF + cosine similarity; improvement suggestions:

1. **Retrieval accuracy**: TF‑IDF is a bag‑of‑words model and cannot capture semantic similarity.
   → Upgrade to OpenAI Embedding + vector DB (Chroma / Qdrant).
2. **Knowledge base updates**: Currently static; changes require restart.
   → Add file‑watcher for incremental index updates.
3. **Token control for many results**: topK is fixed at 2, lacking dynamic trimming.
   → Filter by similarity threshold and truncate by token budget.

10. Design summary – trade‑offs behind each decision

RAG retrieval : TF‑IDF – zero dependencies, instantly runnable; upgrade to embeddings when precision matters.

Knowledge loading : keyword routing – avoids diluting the prompt with irrelevant content.

Skill : separate Markdown files – decouples facts (RAG) from conventions (Skill).

No knowledge hit : explicit refusal – safer than hallucinated answers.

Tool safety : safePath validation – prevents path‑traversal; production should also sandbox.

Conversation memory : simple array storing only text rounds – keeps history short, excludes intermediate tool messages.

Tool‑call limit : MAX_TOOL_ROUNDS = 8 – prevents infinite tool loops.

Streaming output : stream: true – better user experience, at the cost of reassembling tool‑call arguments from incremental chunks.

11. Extension paths – from demo to production

Level 1 – improve retrieval quality (highest ROI)

TF‑IDF
  ↓
OpenAI text‑embedding‑3‑small + local hnswlib‑node index
  ↓
Embedding + Chroma / Qdrant (metadata filtering, persistence)

Embedding replacement (pseudo‑code):

// Replace TF‑IDF / cosine with embeddings
async embedText(text) {
  const res = await openai.embeddings.create({ model: 'text-embedding-3-small', input: text });
  return res.data[0].embedding; // 1536‑dim vector
}

async search(query, topK = 3) {
  const qVec = await this.embedText(query);
  return this.docs
    .map(doc => ({ ...doc, score: cosineSim(qVec, doc.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
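The search method above assumes a cosineSim helper for dense vectors and a precomputed embedding on every document; neither is shown. A minimal sketch of the helper:

```javascript
// Cosine similarity for two dense vectors of equal length.
function cosineSim(a, b) {
  if (a.length !== b.length) throw new Error('vectors must have the same length');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat a zero vector as "no similarity"
}

console.log(cosineSim([1, 0], [1, 0])); // 1
console.log(cosineSim([1, 0], [0, 1])); // 0
console.log(cosineSim([1, 2], [-1, -2]) < 0); // true (opposite direction)
```

Document embeddings are best computed once, when a document is added, so that each query costs a single embeddings call.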

Level 2 – integrate standard MCP ecosystem

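// Replace hand‑written tools.js with @modelcontextprotocol/sdk
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// The server command below is a placeholder; point it at the MCP server you actually run
const transport = new StdioClientTransport({ command: 'node', args: ['mcp-server.js'] });
const mcpClient = new Client({ name: 'simple-agent', version: '1.0.0' });
await mcpClient.connect(transport);
const { tools: mcpTools } = await mcpClient.listTools(); // auto‑discover all tools

// In the Agent loop, call mcpClient.callTool({ name, arguments }) instead of executeTool()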

Level 3 – persistent memory

// Simple JSON persistence
function saveHistory(sessionId, history) {
  fs.mkdirSync('sessions', { recursive: true }); // make sure the directory exists
  fs.writeFileSync(`sessions/${sessionId}.json`, JSON.stringify(history));
}

// Advanced SQLite persistence (better‑sqlite3)
import Database from 'better-sqlite3';

const db = new Database('agent.db');
db.exec(`CREATE TABLE IF NOT EXISTS messages (session_id TEXT, role TEXT, content TEXT, ts INTEGER)`);

Level 4 – streaming Web API (SSE)

// Express + SSE pseudo‑code
app.get('/chat', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');

  const stream = await llm.chat.completions.create({ ..., stream: true });
  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content;
    if (text) res.write(`data: ${JSON.stringify({ text })}\n\n`);
  }
  res.write('data: [DONE]\n\n');
  res.end();
});

Level 5 – multi‑Agent collaboration

When a single Agent cannot handle a complex task, introduce an Orchestrator + Sub‑Agent pattern:

Orchestrator Agent
  ├── Analyze requirement, split into sub‑tasks
  ├── Dispatch CodeAnalyzer Agent (focuses on code analysis)
  ├── Dispatch DocWriter Agent (focuses on documentation)
  └── Merge sub‑task results and return final answer
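The pattern can be sketched with plain async functions standing in for the agents. Everything here is hypothetical: orchestrate, plan, and merge are illustrative names, the sub‑agents are stubs so the flow runs without an LLM, and running sub‑tasks in parallel assumes they do not depend on each other.

```javascript
// Hypothetical orchestrator: sub-agents are plain async functions (task) => string.
// In a real system each would wrap its own agent loop with its own tools and prompt.
async function orchestrate(request, { plan, agents, merge }) {
  const subTasks = await plan(request); // [{ agent: 'codeAnalyzer', task: '...' }, ...]
  const results = await Promise.all(
    subTasks.map(async ({ agent, task }) => {
      const run = agents[agent];
      if (!run) return { agent, task, output: `No such agent: ${agent}` };
      return { agent, task, output: await run(task) };
    })
  );
  return merge(request, results);
}

// Stub implementations so the flow can be run without an LLM.
const deps = {
  plan: async req => [
    { agent: 'codeAnalyzer', task: `Analyze code for: ${req}` },
    { agent: 'docWriter', task: `Draft docs for: ${req}` },
  ],
  agents: {
    codeAnalyzer: async task => `[analysis] ${task}`,
    docWriter: async task => `[docs] ${task}`,
  },
  merge: async (req, results) => results.map(r => r.output).join('\n'),
};

orchestrate('rag.js', deps).then(answer => console.log(answer));
```

Passing plan, agents, and merge in as dependencies keeps the orchestration logic testable on its own, separate from any model calls.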

12. Common pitfalls and solutions

Tool‑call infinite loop : set MAX_TOOL_ROUNDS and terminate when exceeded.

Tool parameter JSON parse failure : wrap JSON.parse in try/catch and return the error message to the LLM for retry.

LLM ignores tool : improve the tool description or set tool_choice: "required" to force tool usage.

Prompt too long : apply sliding‑window truncation of history and token‑budgeted RAG result trimming.

Path‑traversal attack : enforce safePath validation and keep audit logs.

Streaming tool‑call order mismatch : use tc.index as slot index, not tc.id.

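Chinese tokenization inaccurate : extend the tokenizer regex with [\u4e00-\u9fa5]+ so that runs of consecutive Chinese characters are kept as tokens. Each run becomes a single token, so matching remains coarse.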
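For the "Prompt too long" item, token‑budgeted trimming of RAG results could look like the sketch below. selectWithinBudget is a hypothetical helper; the default threshold, the budget, and the 4‑characters‑per‑token estimate are assumptions to tune per model.

```javascript
// Hypothetical helper: keeps results above a similarity threshold, best first,
// until an approximate token budget is used up.
function selectWithinBudget(results, { minScore = 0.1, maxTokens = 1500 } = {}) {
  const approxTokens = text => Math.ceil(text.length / 4); // rough heuristic
  const picked = [];
  let used = 0;
  const ranked = results
    .filter(r => r.score >= minScore)
    .sort((a, b) => b.score - a.score);
  for (const r of ranked) {
    const cost = approxTokens(r.text);
    if (used + cost > maxTokens) continue; // skip what does not fit, try smaller ones
    picked.push(r);
    used += cost;
  }
  return picked;
}

const results = [
  { name: 'react', score: 0.62, text: 'x'.repeat(4000) },    // about 1000 tokens
  { name: 'database', score: 0.35, text: 'x'.repeat(4000) }, // about 1000 tokens
  { name: 'style', score: 0.30, text: 'x'.repeat(1200) },    // about 300 tokens
  { name: 'misc', score: 0.02, text: 'x'.repeat(100) },      // below the threshold
];
console.log(selectWithinBudget(results).map(r => r.name)); // [ 'react', 'style' ]
```

The loop skips a document that does not fit and keeps looking for smaller ones. Stopping at the first overflow would be the stricter alternative when ranking order matters more than filling the budget.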

Conclusion

A truly usable code‑analysis assistant requires the combined stack:

RAG – inject real documents to avoid hallucination.

Skill – embed team conventions so the assistant is domain‑specific.

Tool – give the LLM autonomous code‑exploration capability.

Guard – safety checks at every critical point ensure controllability.

Memory – enable coherent multi‑turn conversations.

Observability – make Agent behavior traceable and debuggable.

The core architecture remains constant from demo to production; each layer can be deepened (embeddings, MCP integration, persistent memory, streaming APIs, multi‑Agent orchestration) as needed.

