Programming Agents Achieve 99% Success on Real‑World Robot Experiments
NVIDIA's ENPIRE project equips eight Codex agents with GPU and token budgets to autonomously run a closed‑loop research pipeline on real robots, revealing a physical scaling law, introducing MRU/MTU metrics, and reaching 99% success on complex dexterous tasks.
Automated research is moving out of the code sandbox and into the physical world. ENPIRE, a project from NVIDIA's GEAR lab led by Jim Fan, is presented as the first system to run fully automated research on real robot hardware.
Closed‑loop autonomous pipeline
Eight Codex agents are deployed on a fleet of robots, each allocated GPU compute and a token budget, and given a simple objective: solve tasks quickly, keep robots busy safely, and avoid wasting compute. The agents run an autonomous loop that resets environments, searches the literature, implements ideas, builds infrastructure, trains and deploys policies, self-validates, analyses logs, rewrites code, and iterates until the robots reliably execute high-precision tasks such as zip-tie fastening, pin-box sorting, and GPU installation.
Physical scaling law
The team observes that increasing the number of parallel robots dramatically accelerates task solving; for example, expanding from a single robot to eight reduces the time needed for the pin‑box task from over 1.5 hours to about 40 minutes.
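The arithmetic behind that example can be checked with a short sketch. The function name is hypothetical, and the inputs (90 and 40 minutes) are the article's rounded figures ("over 1.5 hours" and "about 40 minutes"), so the results are rough estimates rather than reported numbers.

```python
def parallel_speedup(t_single_min: float, t_parallel_min: float, n_robots: int):
    """Return (speedup, parallel efficiency) for a wall-clock comparison.

    speedup    = single-robot time / multi-robot time
    efficiency = speedup / number of robots (1.0 would be perfect linear scaling)
    """
    speedup = t_single_min / t_parallel_min
    efficiency = speedup / n_robots
    return speedup, efficiency


# Rounded figures from the article: 1 robot ~90 min, 8 robots ~40 min.
speedup, efficiency = parallel_speedup(90.0, 40.0, 8)
print(f"speedup ~{speedup:.2f}x, efficiency ~{efficiency:.0%}")
# prints: speedup ~2.25x, efficiency ~28%
```

On these figures, eight robots give roughly a 2.25x wall-clock speedup, which is well short of linear scaling; the article does not break down where the remaining time goes.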
ENPIRE architecture
ENPIRE consists of four core modules that form a repeatable physical feedback loop:
Environment (EN): automatic reset and verification of the physical setup.
Policy Improvement (PI): launches policy optimisation.
Rollout (R): supports single- or multi-robot parallel evaluation of policies.
Evolution (E): agents analyse logs, consult papers, and improve training infrastructure and algorithm code to address failure modes.
This closed loop turns real-world robot learning into a controllable optimisation process, minimising human effort while enabling fair ablations across training recipes and agent variants.
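The four modules can be read as one iteration of a loop. The following is a minimal sketch of that control flow only, not ENPIRE's actual code: every name here (`run_loop`, `LoopState`, the four callables, the toy stubs) is a hypothetical stand-in, and the 0.99 target simply mirrors the success rate quoted in the article.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    recipe_version: int = 0                              # which training recipe is in use
    history: List[float] = field(default_factory=list)   # success rate per rollout


def run_loop(
    reset_env: Callable[[], bool],            # EN: reset and verify the physical setup
    improve_policy: Callable[[int], object],  # PI: train a policy for a recipe version
    rollout: Callable[[object], float],       # R: evaluate, return success rate in [0, 1]
    evolve: Callable[[int, float], int],      # E: revise the recipe from the results
    target: float = 0.99,
    max_iters: int = 20,
) -> LoopState:
    state = LoopState()
    for _ in range(max_iters):
        if not reset_env():
            continue  # reset failed verification; retry on the next iteration
        policy = improve_policy(state.recipe_version)
        success = rollout(policy)
        state.history.append(success)
        if success >= target:
            break  # reliable enough; stop iterating
        state.recipe_version = evolve(state.recipe_version, success)
    return state


# Toy stubs: each recipe revision adds 10 points of success rate, starting at 50%.
state = run_loop(
    reset_env=lambda: True,
    improve_policy=lambda version: version,
    rollout=lambda policy: min(1.0, (50 + 10 * policy) / 100),
    evolve=lambda version, success: version + 1,
)
print(len(state.history), state.history[-1])
```

Passing the modules in as callables keeps the loop itself independent of any particular robot, trainer, or agent, which is one way to get the "fair ablation" property: swap one callable and hold the rest fixed.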
Performance results
Using ENPIRE, frontier programming agents achieve a 99% success rate on challenging real-world dexterous tasks such as zip-tie fastening, pin-box sorting, and GPU installation.
Key finding: environment reset is easier than task completion
The team notes that for many robot tasks, resetting the environment is simpler than completing the task itself. ENPIRE therefore first has agents construct automatic reset environments—often simple pick‑and‑place problems solved by Cap‑X—then write heuristic reward functions, sandbox the environment, and launch automated research around the resulting score.
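As an illustration of what a heuristic reward for a pick-and-place reset might look like, here is a minimal sketch. It is an assumption for illustration, not a function from ENPIRE: the name `reset_score`, the 2D position inputs, and the tolerance and distance thresholds are all hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def reset_score(
    object_xy: Point,
    home_xy: Point,
    tolerance_m: float = 0.01,  # hypothetical: within 1 cm counts as fully reset
    max_dist_m: float = 0.5,    # hypothetical: at or beyond 50 cm scores zero
) -> float:
    """Score in [0, 1] for how close an object is to its home position."""
    dist = math.dist(object_xy, home_xy)
    if dist <= tolerance_m:
        return 1.0
    # Linear falloff with distance, clamped so the score never goes negative.
    return max(0.0, 1.0 - dist / max_dist_m)


print(reset_score((0.25, 0.0), (0.0, 0.0)))  # 0.5
```

A dense score like this gives an agent a gradient to iterate against, while the tolerance band provides a binary "reset verified" check before the next trial starts.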
Relation to automated research definition
Following Karpathy’s definition, agents explore the internet for new paradigms and rewrite everything that could improve performance, including algorithms, training objectives, and data loaders. In the pin‑box task, an agent authored a contact‑force safety controller that outperformed manual reinforcement‑learning parameter tuning.
New metrics: MRU and MTU
Two metrics are introduced to quantify efficiency:
Mean Robot Utilization (MRU): proportion of real time robots spend running experiments.
Mean Token Utilization (MTU): efficiency of converting tokens into research progress.
In the experiments, MRU stays below 50%, meaning robots sit idle more than half the time while awaiting agent decisions; better agent harnesses and faster models would therefore translate directly into tangible gains.
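One way to compute MRU from logs, following the definition above, is sketched here. The log format (per-robot lists of non-overlapping busy intervals in seconds) and all names are assumptions; the article gives no formula for MTU, so it is not sketched.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s) of one experiment run


def mean_robot_utilization(busy: Dict[str, List[Interval]], wall_clock_s: float) -> float:
    """Average, over robots, of the fraction of wall-clock time spent running experiments.

    Assumes the intervals for each robot do not overlap and lie within [0, wall_clock_s].
    """
    if wall_clock_s <= 0 or not busy:
        raise ValueError("need a positive wall-clock span and at least one robot")
    per_robot = [
        sum(end - start for start, end in intervals) / wall_clock_s
        for intervals in busy.values()
    ]
    return sum(per_robot) / len(per_robot)


# Hypothetical one-hour log for two robots.
log = {
    "robot_a": [(0.0, 1200.0), (2400.0, 3000.0)],  # busy 1800 s -> 0.50
    "robot_b": [(600.0, 1500.0)],                  # busy  900 s -> 0.25
}
mru = mean_robot_utilization(log, wall_clock_s=3600.0)
print(f"MRU = {mru:.3f}")  # MRU = 0.375
```

In this toy log the fleet is idle 62.5% of the time, which is the kind of gap the article attributes to robots waiting on agent decisions.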
PushT benchmark solved without learning
Traditionally, the PushT benchmark requires extensive human demonstrations and hours of behavior‑cloning training. ENPIRE agents (Codex, Claude Code, Kimi Code) solved it in under two hours using a rule‑based heuristic approach, without neural networks, training, or human data.
Open‑source future
ENPIRE will be fully open‑source, allowing developers to build similar autonomous robot research systems at home using the LeRobotHF SO‑101 kit together with an NVIDIA Jetson Thor, which can also execute the PushT task.
Broader insight
The work suggests that robot research is evolving into environment‑design for coding agents, while algorithmic work moves to constructing self‑closing feedback loops; skills learned by agents bootstrap new capabilities, creating a compounding effect.
