How RepoMaster Enables AI Agents to Master GitHub Repositories for Complex Tasks
RepoMaster is an AI‑driven framework that automatically discovers, analyzes, and executes code from massive GitHub repositories, turning them into reusable tools and achieving state‑of‑the‑art performance on challenging benchmarks while drastically reducing token consumption and engineering effort.
RepoMaster Core Framework
RepoMaster enables AI agents to treat a selected GitHub repository as a reusable tool. The workflow consists of three tightly coupled stages.
Stage 1 – Hierarchical Repository Analysis
Hierarchical Code Tree (HCT): extracts the package‑module‑class‑function hierarchy.
Function Call Graph (FCG): records caller‑callee relationships.
Module Dependency Graph (MDG): captures import‑level dependencies.
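To make the three structures concrete, here is a minimal sketch that derives simplified versions of them with Python's standard `ast` module. It is an illustration under stated assumptions, not RepoMaster's actual implementation: the in-memory `SOURCES` repository, the `analyze` function, and the dictionary layouts are all hypothetical, and call targets are recorded by bare name without resolving which module defines them.

```python
import ast
from collections import defaultdict

# Hypothetical two-module repository held in memory for illustration.
SOURCES = {
    "pkg/io.py": "def load(path):\n    return open(path).read()\n",
    "pkg/model.py": (
        "from pkg.io import load\n"
        "class Model:\n"
        "    def fit(self, path):\n"
        "        return prepare(load(path))\n"
        "def prepare(text):\n"
        "    return text.strip()\n"
    ),
}

def analyze(sources):
    """Return simplified (HCT, FCG, MDG) structures for a {path: code} mapping."""
    hct = {}                 # module -> its top-level classes (with methods) and functions
    fcg = defaultdict(set)   # "module:function" -> names of the functions it calls
    mdg = defaultdict(set)   # module -> names of the modules it imports
    for path, code in sources.items():
        tree = ast.parse(code)
        entry = {"classes": {}, "functions": []}
        for node in tree.body:
            if isinstance(node, ast.ClassDef):
                entry["classes"][node.name] = [
                    n.name for n in node.body if isinstance(n, ast.FunctionDef)
                ]
            elif isinstance(node, ast.FunctionDef):
                entry["functions"].append(node.name)
        hct[path] = entry
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                mdg[path].update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                mdg[path].add(node.module)
            elif isinstance(node, ast.FunctionDef):
                for call in ast.walk(node):
                    if isinstance(call, ast.Call):
                        func = call.func
                        # Plain names use .id; attribute calls such as obj.method use .attr.
                        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
                        if name:
                            fcg[f"{path}:{node.name}"].add(name)
    return hct, dict(fcg), dict(mdg)

hct, fcg, mdg = analyze(SOURCES)
```

A production analyzer would also need to resolve names across modules and handle dynamic dispatch, which this sketch deliberately ignores.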
Modules within the repository are scored on dependency density, code complexity, and update frequency so that the agent can prioritize core components.
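One simple way to combine such signals is a weighted sum of normalized features. The sketch below is an assumption for illustration only: the function names, the min-max normalization, the weights `(0.5, 0.3, 0.2)`, and the sample numbers are all hypothetical, and the article does not state how RepoMaster actually weighs these criteria.

```python
def normalize(values):
    """Min-max scale a {name: number} mapping to [0, 1]; all-equal inputs map to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}

def rank_modules(dependency_degree, complexity, commits, weights=(0.5, 0.3, 0.2)):
    """Rank modules by a weighted sum of three normalized signals (weights are assumed)."""
    d = normalize(dependency_degree)
    c = normalize(complexity)
    u = normalize(commits)
    w_d, w_c, w_u = weights
    scores = {m: w_d * d[m] + w_c * c[m] + w_u * u[m] for m in d}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical measurements for three modules.
ranking = rank_modules(
    dependency_degree={"model.py": 6, "utils.py": 2, "cli.py": 3},
    complexity={"model.py": 40, "utils.py": 10, "cli.py": 15},
    commits={"model.py": 30, "utils.py": 5, "cli.py": 12},
)
print(ranking)
```

With these sample inputs, `model.py` leads on every signal and therefore ranks first.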
Stage 2 – Autonomous Exploration & Execution
Context‑aware code view: on‑demand inspection of any file, class or function.
Dependency tracing using FCG and MDG to follow call chains.
Keyword‑based code search within the repository.
Interactive feedback loop: the agent iteratively writes code, runs it, inspects logs and adjusts actions based on success or failure.
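Two of the behaviors above, dependency tracing and the write-run-inspect loop, can be sketched in a few lines. This is a hedged illustration rather than RepoMaster's code: `trace_calls`, `run_snippet`, `feedback_loop`, and `scripted_agent` are hypothetical names, and `scripted_agent` stands in for the LLM call that would normally propose the next piece of code.

```python
import subprocess
import sys
from collections import deque

def trace_calls(call_graph, start):
    """Breadth-first walk over caller -> callees edges; returns functions in visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        fn = queue.popleft()
        order.append(fn)
        for callee in sorted(call_graph.get(fn, ())):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return order

def run_snippet(code, timeout=30):
    """Execute Python source in a fresh interpreter and capture its output."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode == 0, proc.stdout, proc.stderr

def feedback_loop(propose, max_rounds=3):
    """Ask `propose` for code, run it, and feed the outcome back until a run succeeds."""
    history = []
    for _ in range(max_rounds):
        code = propose(history)
        ok, out, err = run_snippet(code)
        history.append({"code": code, "ok": ok, "stdout": out, "stderr": err})
        if ok:
            break
    return history

def scripted_agent(history):
    """Stand-in for an LLM: the first attempt is buggy, the retry is fixed."""
    if not history:
        return "print(undefined_name)"
    return "print('done')"

history = feedback_loop(scripted_agent)
```

Here the first round fails with a `NameError` in `stderr`, which a real agent would read before proposing the corrected second attempt.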
Stage 3 – Multi‑Level Information Filtering
Code reduction: extracts only the AST sub‑trees relevant to the current task.
Document reduction: splits large documentation into chunks and retrieves the most pertinent fragments.
Log reduction: keeps only the head and tail of execution logs that contain error messages.
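The three filters can be approximated as follows. This is a minimal sketch under simplifying assumptions, not the framework's implementation: the function names and parameters are hypothetical, relevance is reduced to a set of wanted names for code and to word overlap for documents, and the log filter simply keeps the first and last lines.

```python
import ast

def reduce_code(source, wanted):
    """Keep only the top-level functions and classes whose names are in `wanted`."""
    tree = ast.parse(source)
    kept = [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and node.name in wanted
    ]
    return "\n\n".join(kept)

def retrieve_chunks(document, query, chunk_size=40, top_k=2):
    """Split text into fixed-size word chunks and return those sharing most words with the query."""
    words = document.split()
    chunks = [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
    terms = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: len(terms & set(c.lower().split())), reverse=True)
    return ranked[:top_k]

def reduce_log(log, head=5, tail=5):
    """Keep the first `head` and last `tail` lines, marking how many were dropped."""
    lines = log.splitlines()
    if len(lines) <= head + tail:
        return log
    omitted = len(lines) - head - tail
    marker = f"... [{omitted} lines omitted] ..."
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])
```

Keeping the tail matters because Python tracebacks and most build errors appear at the end of a log, while the head usually records the command and environment that produced them.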
Experimental Evaluation
Two benchmarks that require reuse of existing code were used.
MLE‑R: derived from OpenAI’s MLE‑Bench, focuses on machine‑learning tasks solvable inside real GitHub projects.
GitTaskBench: newly built suite covering tasks such as old‑photo restoration and speech denoising; introduces the Task Pass Rate metric to measure end‑to‑end delivery quality.
RepoMaster achieved the highest task success rate, 62.96 % (compared with 40.74 % for the strongest baseline), and reduced token consumption to ≈57 % of that of SWE‑Agent.
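To put those figures in perspective, the short calculation below derives the absolute and relative gains from the numbers quoted above. The variable names are illustrative, and the token figure is approximate because the source gives it only as ≈57 %.

```python
repomaster_rate = 62.96   # task success rate, percent
baseline_rate = 40.74     # strongest baseline, percent
token_ratio = 0.57        # RepoMaster tokens relative to SWE-Agent (approximate)

absolute_gain = repomaster_rate - baseline_rate   # in percentage points
relative_gain = absolute_gain / baseline_rate     # as a fraction of the baseline
token_saving = 1 - token_ratio                    # fraction of tokens saved

print(f"{absolute_gain:.2f} points, {relative_gain:.1%} relative, {token_saving:.0%} fewer tokens")
# prints "22.22 points, 54.5% relative, 43% fewer tokens"
```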
Case Study – 3D Pose Estimation
In a 3‑D pose‑estimation task, baseline agents either failed due to blind trial‑and‑error or deviated from the core algorithm because they lacked a global view of the repository. RepoMaster’s hierarchical maps quickly identified the critical components, enabling efficient task completion.
Resources
Paper: RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving
ArXiv PDF: https://arxiv.org/pdf/2505.21577
GitHub repository: https://github.com/QuantaAlpha/RepoMaster
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
