Can Self‑Iterating AI Agents Run on a Single GPU? Karpathy’s Autoresearch Demo

Karpathy’s open‑source “autoresearch” project demonstrates how a compact LLM training environment on a single GPU can let an AI agent autonomously modify code, run five‑minute training experiments, evaluate improvements, and iteratively produce better models, illustrating a new research paradigm where AI conducts experiments while humans design the system.


Frontier AI research has shifted from human‑driven "paper‑and‑pencil" work to autonomous AI agents running on massive compute clusters.

"In the past, frontier AI research was done by "human computers" between meals, sleep, and entertainment, synchronising via group meetings. That era is gone. Today research is entirely the domain of self‑modifying AI agents running on sky‑high clusters. — Andrej Karpathy, March 2026"

Karpathy notes that AI self‑iteration is maturing, citing the FARS system, which generates a paper roughly every two hours and produced 244 research hypotheses and 100 short papers over the Chinese New Year holiday.

His latest open‑source weekend project, autoresearch, provides a tiny but functional LLM training environment on a single GPU. Humans edit only a program.md prompt file; the AI agent rewrites the Python training code, runs a five‑minute training cycle, checks whether performance improved, keeps the change if it did and discards it if not, then repeats the process.

GitHub: https://github.com/karpathy/autoresearch
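
To make the control flow concrete, here is a minimal Python sketch of such a rewrite‑train‑evaluate loop. Everything in it is an assumption for illustration: the file names, the val_loss=... output convention, and the propose_edit stub standing in for the LLM agent; autoresearch's actual implementation differs.

```python
# Hypothetical sketch of an autoresearch-style loop (illustrative names,
# not the real autoresearch API): an agent proposes an edit to the training
# script, a short run scores it, and the edit is kept only if it improves.
import shutil
import subprocess

TRAIN_SCRIPT = "train.py"   # the code the agent is allowed to rewrite
PROMPT_FILE = "program.md"  # the only file the human edits

def run_short_training() -> float:
    """Run a ~5-minute training cycle and return its validation loss.
    Assumes the script prints 'val_loss=<float>' as its last line."""
    out = subprocess.run(
        ["python", TRAIN_SCRIPT], capture_output=True, text=True, check=True
    )
    return float(out.stdout.strip().splitlines()[-1].split("=")[1])

def propose_edit(prompt: str, current_code: str) -> str:
    """Stand-in for the LLM agent that rewrites the training script;
    in the real system this would be an LLM call."""
    return current_code

best_loss = run_short_training()
for step in range(100):
    prompt = open(PROMPT_FILE).read()
    code = open(TRAIN_SCRIPT).read()
    shutil.copy(TRAIN_SCRIPT, TRAIN_SCRIPT + ".bak")  # rollback copy
    with open(TRAIN_SCRIPT, "w") as f:
        f.write(propose_edit(prompt, code))
    try:
        loss = run_short_training()
    except subprocess.CalledProcessError:
        loss = float("inf")                           # broken edit: reject
    if loss < best_loss:
        best_loss = loss                              # keep the improvement
    else:
        shutil.move(TRAIN_SCRIPT + ".bak", TRAIN_SCRIPT)  # discard the edit
```

The hill‑climbing structure is the point: every candidate change must pay for itself in a cheap, fast experiment before it is accepted into the codebase.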

The training code is a stripped‑down version of Karpathy’s earlier nanochat project, a minimal LLM training pipeline that fits in a few thousand lines and covers tokenizer training, pre‑training, instruction fine‑tuning, inference serving, and a chat UI.
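
For orientation, here is a heavily condensed sketch of how such a pipeline chains its stages together. The function names and paths are placeholders for exposition, not nanochat's actual scripts or module layout.

```python
# Illustrative outline of a nanochat-style pipeline; every name here is
# an assumption for exposition, not nanochat's real API.

def train_tokenizer(corpus_dir: str):
    """Fit a BPE-style tokenizer on the raw text corpus."""
    ...

def pretrain(tokenizer, corpus_dir: str):
    """Next-token-prediction pre-training on the tokenized corpus."""
    ...

def finetune(model, instruction_data: str):
    """Instruction / chat fine-tuning on top of the base model."""
    ...

def serve(model):
    """Expose the model through an inference endpoint and chat UI."""
    ...

tokenizer = train_tokenizer("data/raw_text")
base_model = pretrain(tokenizer, "data/raw_text")
chat_model = finetune(base_model, "data/instructions")
serve(chat_model)
```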

Because nanochat is lightweight, it can train a GPT‑2‑scale model on a single 8×H100 node in about two hours, roughly three hours faster than the baseline from a month earlier.

The core research paradigm illustrated by autoresearch is "AI does the experiments, humans design the research organization." This suggests future AI competition may focus less on model size or data and more on the quality of the "research‑organization code" that orchestrates experiments.

Each point in the training‑run timeline below represents a complete five‑minute LLM training run, and the accumulated logs show how the agent progressively refines the model.

[Figure: Diagram of AI research automation]
[Figure: FARS system output]
[Figure: Training run timeline]
[Figure: Nanochat architecture]
[Figure: Nanochat training pipeline]
Tags: LLM training, nanochat, single GPU, AI research automation, autoresearch, Karpathy, self‑iterating AI