How RepoMaster Enables AI Agents to Master GitHub Repositories for Complex Tasks

RepoMaster is an AI‑driven framework that automatically discovers, analyzes, and executes code from massive GitHub repositories, turning them into reusable tools and achieving state‑of‑the‑art performance on challenging benchmarks while drastically reducing token consumption and engineering effort.

Data Party THU

RepoMaster Core Framework

RepoMaster enables AI agents to treat a selected GitHub repository as a reusable tool. The workflow consists of three tightly coupled stages.

Stage 1 – Hierarchical Repository Analysis

Hierarchical Code Tree (HCT): extracts the package‑module‑class‑function hierarchy.

Function Call Graph (FCG): records caller‑callee relationships.

Module Dependency Graph (MDG): captures import‑level dependencies.
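As a rough illustration of the three structures above, the sketch below derives a tiny code tree, call graph, and import list from one Python source string using the standard `ast` module. It is not RepoMaster's implementation: the helper names (`build_code_tree`, `build_call_graph`, `build_module_deps`) and the sample source are hypothetical, and calls are matched by simple name only, which a real analyzer would need to resolve more carefully.

```python
import ast

# Hypothetical sample module used only for this illustration.
SOURCE = '''
import os
from json import dumps

class Loader:
    def load(self, path):
        return os.path.basename(path)

def run(path):
    loader = Loader()
    name = loader.load(path)
    return dumps({"name": name})
'''

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def build_code_tree(tree):
    """HCT-like view: top-level classes with their methods, plus free functions."""
    hct = {"classes": {}, "functions": []}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            hct["classes"][node.name] = [
                child.name for child in node.body if isinstance(child, FUNC_TYPES)
            ]
        elif isinstance(node, FUNC_TYPES):
            hct["functions"].append(node.name)
    return hct


def build_call_graph(tree):
    """FCG-like view: function name -> sorted simple names of everything it calls."""
    graph = {}
    for func in ast.walk(tree):
        if not isinstance(func, FUNC_TYPES):
            continue
        callees = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                target = node.func
                if isinstance(target, ast.Name):         # e.g. dumps(...)
                    callees.add(target.id)
                elif isinstance(target, ast.Attribute):  # e.g. loader.load(...)
                    callees.add(target.attr)
        graph[func.name] = sorted(callees)
    return graph


def build_module_deps(tree):
    """MDG-like view: sorted names of modules imported by this file."""
    deps = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            deps.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.add(node.module)
    return sorted(deps)


tree = ast.parse(SOURCE)
hct = build_code_tree(tree)
fcg = build_call_graph(tree)
mdg = build_module_deps(tree)
print(hct, fcg, mdg, sep="\n")
```

Running the same three passes over every file in a repository and merging the results would give a repository-wide version of each structure.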

Modules are scored on dependency density, code complexity, and update frequency so that the agent can prioritize a repository's core modules.
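One simple way to combine the three signals named above is a weighted sum of min-max normalized values. The sketch below shows this under stated assumptions: the weights, the `ModuleStats` fields, and the sample numbers are all hypothetical, and the article does not say how RepoMaster actually combines its signals.

```python
from dataclasses import dataclass


@dataclass
class ModuleStats:
    name: str
    dependency_density: float  # e.g. edges touching this module in the dependency graph
    complexity: float          # e.g. a cyclomatic-complexity style measure
    update_frequency: float    # e.g. commits touching the module in a fixed window


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"dependency_density": 0.5, "complexity": 0.3, "update_frequency": 0.2}


def normalize(values):
    """Min-max normalize to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_modules(modules, weights=WEIGHTS):
    """Return (name, score) pairs sorted from most to least important."""
    columns = {
        key: normalize([getattr(m, key) for m in modules]) for key in weights
    }
    scores = []
    for i, module in enumerate(modules):
        score = sum(weights[key] * columns[key][i] for key in weights)
        scores.append((module.name, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


modules = [
    ModuleStats("utils.py", dependency_density=6, complexity=10, update_frequency=2),
    ModuleStats("core.py", dependency_density=12, complexity=30, update_frequency=8),
    ModuleStats("demo.py", dependency_density=1, complexity=5, update_frequency=1),
]
ranked = rank_modules(modules)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Normalizing each signal first keeps one large-valued signal (such as raw complexity) from dominating the ranking.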

Stage 2 – Autonomous Exploration & Execution

Context‑aware code view: on‑demand inspection of any file, class or function.

Dependency tracing using FCG and MDG to follow call chains.

Keyword‑based code search within the repository.

Interactive feedback loop: the agent iteratively writes code, runs it, inspects logs and adjusts actions based on success or failure.
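The exploration tools listed above can be pictured with a few small functions. The sketch below is a toy stand-in, not RepoMaster's actual tool set: `trace_calls`, `search_code`, `run_snippet`, and `feedback_loop` are hypothetical names, the call graph and files are made-up data, and a fixed list of candidate snippets stands in for code that a language model would write after reading the previous log.

```python
import contextlib
import io
import traceback
from collections import deque

# Made-up call graph and file contents used only for this illustration.
CALL_GRAPH = {
    "main": ["load_model", "predict"],
    "load_model": ["read_weights"],
    "predict": ["preprocess", "forward"],
}
FILES = {
    "model.py": "def forward(x):\n    return x\n",
    "infer.py": "from model import forward\n\ndef predict(x):\n    return forward(x)\n",
}


def trace_calls(graph, start, max_depth=3):
    """Breadth-first walk of a call graph; returns (function, depth) pairs."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        order.append((name, depth))
        if depth == max_depth:
            continue
        for callee in graph.get(name, []):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return order


def search_code(files, keyword):
    """Case-insensitive keyword search; returns (path, line number, line) hits."""
    needle = keyword.lower()
    hits = []
    for path, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path, number, line.strip()))
    return hits


def run_snippet(code):
    """Execute a snippet in-process; return (succeeded, stdout or traceback)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
        return True, buffer.getvalue()
    except Exception:
        return False, traceback.format_exc()


def feedback_loop(candidates):
    """Try candidates in order and stop at the first success, keeping every log."""
    history = []
    for code in candidates:
        ok, log = run_snippet(code)
        history.append((ok, log))
        if ok:
            break
    return history


trace = trace_calls(CALL_GRAPH, "main")
hits = search_code(FILES, "forward")
history = feedback_loop(["print(undefined_name)", "print(1 + 1)"])
```

In this toy run the first candidate fails with a `NameError` and the second succeeds. A real agent would more likely run code in a sandboxed subprocess rather than with `exec`, and would use each failure log to decide what to write next.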

Stage 3 – Multi‑Level Information Filtering

Code reduction: extracts only the AST sub‑trees relevant to the current task.

Document reduction: splits large documentation into chunks and retrieves the most pertinent fragments.

Log reduction: keeps only the head and tail of execution logs that contain error messages.
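The three filters above can each be sketched in a few lines. The version below rests on several assumptions and is not RepoMaster's code: the function names are hypothetical, relevance for documents is plain word overlap with a query, and the log filter follows one reading of the description (keep the head, the tail, and any line carrying an error marker).

```python
import ast


def reduce_code(source, wanted):
    """Keep only the top-level functions and classes whose names are in `wanted`."""
    tree = ast.parse(source)
    kept = [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and node.name in wanted
    ]
    return "\n\n".join(kept)


def reduce_docs(text, query, chunk_size=40, top_k=1):
    """Split text into fixed-size word chunks; return the top_k by word overlap."""
    words = text.split()
    chunks = [
        " ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)
    ]
    terms = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(terms & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def reduce_log(log, head=2, tail=2, markers=("Error", "Traceback")):
    """Keep the first `head` lines, the last `tail` lines, and error-marker lines."""
    lines = log.splitlines()
    if len(lines) <= head + tail:
        return lines
    keep = set(range(head)) | set(range(len(lines) - tail, len(lines)))
    keep |= {i for i, line in enumerate(lines) if any(m in line for m in markers)}
    return [lines[i] for i in sorted(keep)]


code = "import os\n\ndef a():\n    return 1\n\ndef b():\n    return 2\n"
docs = "install the package with pip first then run the pose estimation demo script"
log = "\n".join([
    "start", "loading data", "epoch 1", "epoch 2",
    "ValueError: bad shape", "cleanup", "exit code 1",
])

small_code = reduce_code(code, {"b"})
best_chunks = reduce_docs(docs, "pose estimation", chunk_size=4)
short_log = reduce_log(log)
```

All three filters serve the same purpose: shrinking what is placed in the model's context window while keeping the parts most likely to matter for the current step.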

Experimental Evaluation

Two benchmarks that require reuse of existing code were used.

MLE‑R: derived from OpenAI’s MLE‑Bench; focuses on machine‑learning tasks solvable inside real GitHub projects.

GitTaskBench: a newly built suite covering tasks such as old‑photo restoration and speech denoising; it introduces the Task Pass Rate metric to measure end‑to‑end delivery quality.

RepoMaster achieved the highest task success rate, 62.96% (up from 40.74% for the strongest baseline), and reduced token consumption to ≈57% of SWE‑Agent's.
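To put the reported figures in perspective, the short calculation below turns them into an absolute gain, a relative gain, and a token saving. It uses only the numbers quoted above and treats the approximate 57% token ratio as exact for the sake of the arithmetic.

```python
repomaster_rate = 62.96  # reported task success rate, in percent
baseline_rate = 40.74    # strongest baseline, in percent
token_ratio = 0.57       # RepoMaster tokens relative to SWE-Agent (approximate)

absolute_gain = repomaster_rate - baseline_rate       # percentage points
relative_gain = absolute_gain / baseline_rate * 100   # percent over the baseline
token_saving = (1 - token_ratio) * 100                # percent fewer tokens

print(f"absolute gain: {absolute_gain:.2f} percentage points")
print(f"relative gain: {relative_gain:.1f}%")
print(f"token saving:  {token_saving:.0f}%")
```

That works out to a gain of about 22 percentage points, or roughly 54.5% relative to the baseline, with about 43% fewer tokens than SWE‑Agent.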

Case Study – 3D Pose Estimation

In a 3‑D pose‑estimation task, baseline agents either failed due to blind trial‑and‑error or deviated from the core algorithm because they lacked a global view of the repository. RepoMaster’s hierarchical maps quickly identified the critical components, enabling efficient task completion.

Resources

Paper: RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

ArXiv PDF: https://arxiv.org/pdf/2505.21577

GitHub repository: https://github.com/QuantaAlpha/RepoMaster
Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
