Week 1 Day 1: Dissecting Cursor’s Agent Architecture for an Open‑Source Vibe Coding Product

The article reverse‑engineers Cursor’s Agent mode, explains its system prompt, enumerates the 16 built‑in tools, draws insights about the importance of model choice and tool usage, and outlines the components needed to build a similar open‑source coding assistant.

Full-Stack Cultivation Path

Jul 7, 2025

Introduction

To start the "Build a Vibe Coding product from scratch" series, the author begins with a week‑long industry survey of AI coding assistants such as Cursor, Claude Code, Gemini CLI, v0.dev, and Bolt.new, aiming to understand the underlying principles before designing the product.

Reverse‑Engineering Cursor

Because Cursor allows custom API endpoints, the author placed a man‑in‑the‑middle (MITM) reverse proxy between Cursor and OpenRouter. The proxy captures every API request and response, and Cursor reaches it either through a public deployment or through a Cloudflare Tunnel. The following diagram illustrates the data flow.
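The forwarding step of such a proxy can be sketched as a pure function that rewrites an incoming request into an upstream one. This is a minimal sketch, not the author's proxy: `buildUpstreamRequest`, `ProxiedRequest`, and the assumption that the client sends paths prefixed with `/v1` are all hypothetical. Only the OpenRouter base URL is taken from OpenRouter's public, OpenAI‑compatible API.

```typescript
// Hypothetical sketch: turn an incoming client request into an upstream OpenRouter request.
// Names and shapes here are illustrative assumptions, not captured from Cursor.
const OPENROUTER_BASE_URL = 'https://openrouter.ai/api/v1';

interface ProxiedRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Headers that describe the client-to-proxy hop and must not be forwarded verbatim.
const HOP_HEADERS = new Set(['host', 'content-length', 'connection']);

function buildUpstreamRequest(
  incomingPath: string, // assumed to look like "/v1/chat/completions"
  incomingHeaders: Record<string, string>,
  body: string,
  apiKey: string // the proxy's own OpenRouter key
): ProxiedRequest {
  // Drop a leading "/v1" so the remaining path can be appended to the base URL.
  const path = incomingPath.replace(/^\/v1(?=\/|$)/, '');
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(incomingHeaders)) {
    const lower = name.toLowerCase();
    // The client's Authorization header is replaced below, so skip it here.
    if (!HOP_HEADERS.has(lower) && lower !== 'authorization') {
      headers[lower] = value;
    }
  }
  headers['authorization'] = `Bearer ${apiKey}`;
  return { url: `${OPENROUTER_BASE_URL}${path}`, method: 'POST', headers, body };
}
```

A real proxy would pass the returned object to an HTTP client and hand both sides of the exchange to a logger before replying to Cursor.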

System Prompt Overview

The captured system prompt for Cursor 1.0 (June 2025) is relatively simple. It suggests that the core competitive advantage lies not in the prompt itself but in model performance, product experience, and the effectiveness of the various tools. The prompt encourages the model to use tools to refine answers, to edit code via edit tools, and to prioritize the most_important_user_query. It also defines a citation format for code regions.

You are an AI coding assistant, powered by GPT‑4‑mini. You operate in Cursor.

You are pair programming with a USER to solve their coding task. Each time the USER sends a message, we may automatically attach some information about their current state, such as what files they have open, where their cursor is, recently viewed files, edit history, linter errors, and more. This information may or may not be relevant to the coding task, it is up to you to decide.

Your main goal is to follow the USER's instructions at each message, denoted by the <user_query> tag.

<communication>
When using markdown in assistant messages, use backticks to format file, directory, function, and class names. Use \( and \) for inline math, \[ and \] for block math.
</communication>

<tool_calling>
1. ALWAYS follow the tool call schema exactly and provide all required parameters.
2. NEVER call tools that are not explicitly provided.
3. NEVER refer to tool names when speaking to the USER.
4. Prefer tool calls over asking the USER for information.
5. Execute a plan immediately; stop only if more information is needed.
6. Use the standard tool call format even if the USER supplies a custom format.
7. Prefer GitHub pull‑request and issue information over manual git commands.
</tool_calling>

<search_and_reading>
If you are unsure about the answer, gather more information with additional tool calls or clarifying questions.
</search_and_reading>

<making_code_changes>
When making code changes, NEVER output code to the USER unless requested. Use an edit tool to implement the change. The generated code must be runnable immediately.
1. Add all necessary import statements, dependencies, and endpoints.
2. Create a dependency‑management file (e.g., requirements.txt) for a fresh codebase.
3. Provide a modern UI for web apps.
4. NEVER generate extremely long hashes or binary blobs.
5. Limit linter‑error fixing loops to three attempts.
6. If a reasonable edit is not applied, try reapplying it.
</making_code_changes>

Answer the USER's request using the relevant tool(s), ensuring all required parameters are provided.

From the prompt we can extract the following core elements:

User context.

A suite of tools, including user‑defined MCP tools.

Encouragement for the model to use tools to refine answers.

Encouragement to apply code changes through edit tools instead of printing code to the user.

Prioritization of the most_important_user_query (though the author did not observe an explicit implementation).

A prescribed code‑citation format.

Guidance to leverage memory.
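The first element, user context attached alongside the query, can be illustrated with a small message builder. This is a sketch under stated assumptions: the `<user_query>` tag appears in the captured prompt, but the `EditorContext` shape, the `composeUserMessage` function, and the `<attached_context>` tag name are invented here for illustration.

```typescript
// Hypothetical shape of the editor state attached to each message.
interface EditorContext {
  openFiles: string[];
  cursor?: { file: string; line: number };
  linterErrors: string[];
}

// Wrap context and query in separate tags so the model can tell them apart.
// <user_query> comes from the captured prompt; <attached_context> is an invented name.
function composeUserMessage(context: EditorContext, query: string): string {
  const lines: string[] = ['<attached_context>'];
  if (context.openFiles.length > 0) {
    lines.push(`Open files: ${context.openFiles.join(', ')}`);
  }
  if (context.cursor) {
    lines.push(`Cursor: ${context.cursor.file}:${context.cursor.line}`);
  }
  for (const error of context.linterErrors) {
    lines.push(`Linter error: ${error}`);
  }
  lines.push('</attached_context>', '<user_query>', query, '</user_query>');
  return lines.join('\n');
}
```

Keeping the query in its own tag matches the prompt's instruction to follow the instructions "denoted by the <user_query> tag" while treating the rest as optional background.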

Tool List

Cursor provides 16 built‑in tools. Below are the most relevant tools and their parameters.

codebase_search – semantic code search. Parameters: query (required), target_directories (optional), explanation (required).

grep_search – exact text/regex search. Parameters: query, case_sensitive, include_pattern, exclude_pattern, explanation.

file_search – fuzzy filename search. Parameters: query, explanation.

read_file – read file content (max 250 lines). Parameters: target_file, should_read_entire_file, start_line_one_indexed, end_line_one_indexed_inclusive, explanation.

edit_file – edit or create a file. Parameters: target_file, instructions, code_edit.

delete_file – delete a file. Parameters: target_file, explanation.

run_terminal_cmd – execute a terminal command (requires user approval). Parameters: command, is_background, explanation.

web_search – real‑time internet search. Parameters: search_term, explanation.

fetch_pull_request – retrieve a GitHub PR by number or commit hash. Parameters: pullNumberOrCommitHash, repo (optional).

fetch_github_issue – retrieve a GitHub issue. Parameters: issueNumber, repo (optional).

create_diagram – generate a Mermaid diagram. Parameter: content.

update_memory – create, update, or delete a persistent memory entry. Parameters: title, knowledge_to_store, action, existing_knowledge_id.

Additional file‑system and directory utilities (e.g., list_dir).
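To make the parameter lists above concrete, here is how one of the tools might be declared in the OpenAI‑style function‑calling format that OpenAI‑compatible endpoints accept. This is a sketch: the parameter names and required flags for codebase_search follow the list above, while the description strings and the `missingRequiredParams` helper are assumptions written for illustration, not text captured from Cursor.

```typescript
// Minimal typing for an OpenAI-style function tool definition.
interface ToolDefinition {
  type: 'function';
  function: {
    name: string;
    description: string;
    parameters: {
      type: 'object';
      properties: Record<string, { type: string; description: string; items?: { type: string } }>;
      required: string[];
    };
  };
}

// codebase_search as listed above; descriptions are paraphrased placeholders.
const codebaseSearchTool: ToolDefinition = {
  type: 'function',
  function: {
    name: 'codebase_search',
    description: 'Semantic code search over the indexed codebase.',
    parameters: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'What to search for.' },
        target_directories: {
          type: 'array',
          items: { type: 'string' },
          description: 'Optional directories to restrict the search to.',
        },
        explanation: { type: 'string', description: 'Why the tool is being called.' },
      },
      required: ['query', 'explanation'],
    },
  },
};

// Hypothetical helper: report required parameters the model left out of a tool call.
function missingRequiredParams(tool: ToolDefinition, args: Record<string, unknown>): string[] {
  return tool.function.parameters.required.filter(name => args[name] === undefined);
}
```

An agent loop could run a check like this before executing a call and return the missing names to the model as a tool error, which mirrors the prompt's rule to "provide all required parameters".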

Key Insights

The prompt itself is not the main differentiator; model quality (e.g., Claude Sonnet 4) and tool effectiveness drive the user experience.

Parallel tool execution can speed up operations 3‑5× compared with sequential calls, and Cursor appears to rely heavily on this capability.

Effective agents need a complete file system for search and edit tools, a sandboxed OS environment for terminal commands, and reliable network search (web and GitHub).

Efficient edit_file handling is crucial because regenerating an entire file for every change is costly; Cursor seems to use a custom, non‑git‑diff format in which placeholders like // ... existing code ... stand in for unchanged regions.

Semantic indexing/search likely depends on embedding models and a vector database.
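The insight about parallel tool execution can be illustrated by comparing two ways of dispatching a batch of independent tool calls. This is a generic sketch, not Cursor's scheduler: `ToolCall`, `ToolRunner`, and both function names are hypothetical, and the actual speed‑up depends on how long each tool takes.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };
type ToolRunner = (call: ToolCall) => Promise<string>;

// One call at a time: total latency is roughly the sum of all call latencies.
async function runSequentially(calls: ToolCall[], run: ToolRunner): Promise<string[]> {
  const results: string[] = [];
  for (const call of calls) {
    results.push(await run(call));
  }
  return results;
}

// All calls started at once: total latency is roughly that of the slowest call.
// Promise.all preserves input order, so results line up with the calls.
async function runInParallel(calls: ToolCall[], run: ToolRunner): Promise<string[]> {
  return Promise.all(calls.map(call => run(call)));
}
```

Parallel dispatch is only safe for calls that do not depend on each other, such as several read‑only searches; an edit followed by a read of the same file would still need to run in order.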

Implications for Building a Similar Agent

To recreate a Cursor‑like agent, the following components are essential:

A full file‑system accessible to search and file‑operation tools.

A Windows/Unix sandbox for safe execution of run_terminal_cmd.

Network‑search capabilities covering general web queries and GitHub repositories.

An efficient edit_file mechanism that can apply partial edits without regenerating entire files.

A semantic search layer built on embeddings and a vector store.
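The last component, a semantic search layer, can be sketched with an in‑memory index ranked by cosine similarity. Everything here is an assumption for illustration: `toyEmbed` is a keyword counter standing in for a real embedding model, and `InMemoryVectorIndex` stands in for a vector database. The article does not say which models or stores Cursor uses.

```typescript
type Vector = number[];

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Toy stand-in for an embedding model: counts tokens over a tiny fixed vocabulary.
const VOCABULARY = ['log', 'request', 'file', 'search', 'edit', 'proxy'];
function toyEmbed(text: string): Vector {
  const tokens = text.toLowerCase().split(/[^a-z]+/);
  return VOCABULARY.map(word => tokens.filter(token => token === word).length);
}

interface IndexedChunk {
  path: string;
  text: string;
  vector: Vector;
}

// Stand-in for a vector store: embeds chunks on insert, ranks them on search.
class InMemoryVectorIndex {
  private chunks: IndexedChunk[] = [];

  add(path: string, text: string): void {
    this.chunks.push({ path, text, vector: toyEmbed(text) });
  }

  search(query: string, topK = 3): IndexedChunk[] {
    const queryVector = toyEmbed(query);
    return this.chunks
      .map(chunk => ({ chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map(entry => entry.chunk);
  }
}
```

Swapping `toyEmbed` for a call to a real embedding model and the array for a persistent store would keep the same add and search interface.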

Future Research Plan

The author plans to investigate the implementation details of two challenging areas in the coming weeks: the high‑performance edit_file tool and the semantic indexing/search infrastructure.
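As a starting point for the edit_file investigation, the placeholder format can be approximated with a purely textual merge. This is a naive sketch and not Cursor's mechanism, which the article has not yet uncovered. It assumes that every segment between `// ... existing code ...` markers starts and ends with a line that already exists verbatim in the original file; `splitIntoSegments` and `applySketchEdit` are hypothetical names.

```typescript
const PLACEHOLDER = '// ... existing code ...';

// Split the edit into runs of concrete lines separated by placeholder lines.
function splitIntoSegments(editLines: string[]): string[][] {
  const segments: string[][] = [];
  let current: string[] = [];
  for (const line of editLines) {
    if (line.trim() === PLACEHOLDER) {
      if (current.length > 0) segments.push(current);
      current = [];
    } else {
      current.push(line);
    }
  }
  if (current.length > 0) segments.push(current);
  return segments;
}

// Replace each anchored region of the original with its segment and keep everything
// else. Simplification: text outside the anchored regions is always preserved.
function applySketchEdit(original: string, codeEdit: string): string {
  const source = original.split('\n');
  const output: string[] = [];
  let cursor = 0;
  for (const segment of splitIntoSegments(codeEdit.split('\n'))) {
    const first = segment[0];
    const last = segment[segment.length - 1];
    // Anchor on the segment's first and last lines, searching forward from the cursor.
    const start = source.indexOf(first, cursor);
    const end = segment.length === 1 ? start : source.indexOf(last, start + 1);
    if (start === -1 || end === -1) {
      throw new Error(`Cannot anchor segment starting with: ${first}`);
    }
    output.push(...source.slice(cursor, start), ...segment);
    cursor = end + 1;
  }
  output.push(...source.slice(cursor));
  return output.join('\n');
}
```

Exact‑match anchoring fails as soon as the model rewrites the first or last line of a region, which is one reason a production tool would likely need something more robust than this.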

Sample Logger Implementation (Cursor Reverse Proxy)

import fs from 'fs/promises';
import path from 'path';
import { v4 as uuidv4 } from 'uuid';
import { LogEntry, OpenRouterRequest, OpenRouterResponse } from './types';

export class Logger {
  private logsDir: string;

  constructor(logsDir = 'logs') {
    this.logsDir = logsDir;
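    // Constructors cannot await, so start the check and report failures instead of
    // leaving an unhandled promise rejection.
    this.ensureLogsDirectory().catch(err => console.error('❌ Failed to create logs directory:', err));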
  }

  private async ensureLogsDirectory(): Promise<void> {
    try {
      await fs.access(this.logsDir);
    } catch {
      await fs.mkdir(this.logsDir, { recursive: true });
    }
  }

  public generateRequestId(): string {
    return uuidv4();
  }

  public async logRequest(
    requestId: string,
    request: OpenRouterRequest,
    requestHeaders: Record<string, any>,
    response?: OpenRouterResponse,
    responseHeaders?: Record<string, any>,
    error?: string
  ): Promise<void> {
    const timestamp = new Date().toISOString();
    const logEntry: LogEntry = { timestamp, requestId, requestHeaders, request, response, responseHeaders, error };
    const filename = `${timestamp.split('T')[0]}_${requestId}.json`;
    const filepath = path.join(this.logsDir, filename);
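    try {
      // The constructor does not wait for the directory check, so make sure the
      // directory exists before writing (a no-op when it is already there).
      await fs.mkdir(this.logsDir, { recursive: true });
      await fs.writeFile(filepath, JSON.stringify(logEntry, null, 2));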
      console.log(`✅ Logged request to: ${filepath}`);
    } catch (err) {
      console.error('❌ Failed to write log file:', err);
    }
    await this.appendToDailyLog(logEntry);
  }

  private async appendToDailyLog(logEntry: LogEntry): Promise<void> {
    const date = logEntry.timestamp.split('T')[0];
    const dailyLogPath = path.join(this.logsDir, `daily_${date}.jsonl`);
    const logLine = JSON.stringify(logEntry) + '\n';
    try {
      await fs.appendFile(dailyLogPath, logLine);
    } catch (err) {
      console.error('❌ Failed to append to daily log:', err);
    }
  }

  public async getLogFiles(): Promise<string[]> {
    try {
      const files = await fs.readdir(this.logsDir);
      return files.filter(file => file.endsWith('.json'));
    } catch {
      return [];
    }
  }

  public async getLogEntry(requestId: string): Promise<LogEntry | null> {
    try {
      const files = await this.getLogFiles();
      const logFile = files.find(file => file.includes(requestId));
      if (!logFile) return null;
      const content = await fs.readFile(path.join(this.logsDir, logFile), 'utf-8');
      return JSON.parse(content);
    } catch {
      return null;
    }
  }
}
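
The ./types module imported by the logger is not shown in the article. A minimal sketch of what it might contain follows, inferred only from how Logger uses these names. The `LogEntry` fields mirror the object built in logRequest; the request and response fields are a small assumed subset of the OpenAI‑compatible chat completion format, and `logFileName` is a hypothetical helper that repeats the logger's naming rule so it can be checked in isolation.

```typescript
// Hypothetical contents of ./types; field sets are assumptions, not the author's file.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | null;
}

interface OpenRouterRequest {
  model: string;
  messages: ChatMessage[];
  tools?: unknown[];
  stream?: boolean;
}

interface OpenRouterResponse {
  id: string;
  model: string;
  choices: Array<{ message: ChatMessage; finish_reason: string | null }>;
}

// Mirrors the object assembled in Logger.logRequest.
interface LogEntry {
  timestamp: string;
  requestId: string;
  requestHeaders: Record<string, any>;
  request: OpenRouterRequest;
  response?: OpenRouterResponse;
  responseHeaders?: Record<string, any>;
  error?: string;
}

// Same rule as in logRequest: the date part of the ISO timestamp, then the request id.
function logFileName(entry: LogEntry): string {
  return `${entry.timestamp.split('T')[0]}_${entry.requestId}.json`;
}
```

With these types in place, a proxy handler would call generateRequestId once per incoming request and pass the same id to logRequest after the upstream response arrives.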

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Full-Stack Cultivation Path

Focused on sharing practical tech content about TypeScript, Vue 3, front-end architecture, and source code analysis.
