Karpathy Launches 'Auto-Research' Experiment: AI Writes Its Own Code and Trains Itself

Andrej Karpathy has released autoresearch, an open‑source project of roughly 630 lines of Python that lets an AI agent modify training code, run hundreds of short experiments across multiple GPUs, evaluate the results, and iteratively improve its models, raising the question of whether this constitutes genuine scientific research.

AI Engineering

Overview

Andrej Karpathy open‑sourced autoresearch, a 630‑line Python project that automates the AI research loop so experiments can run unattended.

Automated research loop

Traditional loop: write code → run experiment → inspect → modify → repeat. Autoresearch replaces it with: write a program.md prompt describing the research direction → AI agent edits train.py → experiments run automatically (fixed 5‑minute budget) → results are evaluated → AI iterates.

The fixed 5‑minute budget forces the agent to balance model size, learning rate, and architecture; at five minutes per run, that works out to roughly twelve experiments per hour.
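One iteration of that loop can be sketched in a few lines. This is a minimal illustration, not the project's actual code: it assumes a hypothetical `train.py` that prints its final validation loss as the last line of stdout, and treats a timeout as a failed run.

```python
import subprocess

BUDGET_SECONDS = 300  # the fixed 5-minute budget per experiment


def parse_val_loss(stdout: str):
    """Read a final validation loss from a run's stdout, or None if absent."""
    try:
        return float(stdout.strip().splitlines()[-1])
    except (ValueError, IndexError):
        return None


def run_experiment():
    """Run train.py once under the time budget; a timeout counts as a failure."""
    try:
        result = subprocess.run(
            ["python", "train.py"],
            capture_output=True, text=True, timeout=BUDGET_SECONDS,
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_val_loss(result.stdout)
```

The outer loop would then keep the edit to `train.py` only when the returned loss beats the best seen so far, and revert it otherwise.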

Project structure

- prepare.py: infrastructure; immutable.
- train.py: the GPT model, optimizer, and training loop; the only file the AI modifies.
- program.md: human‑written instructions for the AI.

The AI works on a Git branch, committing each improvement.
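That branch‑per‑run workflow amounts to a few plain git commands driven from Python. The branch name and helper below are illustrative assumptions, not the project's actual conventions:

```python
import subprocess


def commit_improvement(message: str, branch: str = "auto-research"):
    """Record an accepted edit to train.py as a commit on a dedicated branch."""
    # -B creates the branch if needed, or resets/reuses it if it exists
    subprocess.run(["git", "checkout", "-B", branch], check=True)
    subprocess.run(["git", "add", "train.py"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)
```

Keeping every accepted change as a separate commit gives the human a reviewable trail of what the agent tried and kept.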

Large‑scale demonstration

Running on eight NVIDIA H100 GPUs overnight, the system executed 276 experiments and produced 29 improvements. One discovered improvement was changing the random seed from the conventional 42 to 137.

Debate on research validity

Critics label the system a “controlled optimization loop” or metric‑driven hill‑climbing, arguing that the objective remains validation loss and can become trapped in local optima, thus not constituting autonomous scientific discovery.
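The hill‑climbing critique can be made concrete with a toy greedy loop: accept a change only when it lowers the loss, and a start near a shallow basin never escapes it. The loss function here is an invented illustration, unrelated to anything in the project:

```python
import random


def hill_climb(loss, start, step=0.1, iters=200, seed=0):
    """Greedy metric-driven search: accept a move only if it lowers the loss."""
    rng = random.Random(seed)
    x, best = start, loss(start)
    for _ in range(iters):
        cand = x + rng.uniform(-step, step)
        if loss(cand) < best:
            x, best = cand, loss(cand)
    return x, best


def loss(x):
    # Two basins: a shallow one at x = 1 (floor 0.5), the global one at x = -1 (floor 0).
    return min((x - 1) ** 2 + 0.5, (x + 1) ** 2)


x, best = hill_climb(loss, start=1.0)
# Started at the shallow basin's floor, no small move ever improves,
# so the search stays at x == 1.0 with best == 0.5 and never finds x == -1.
```

The critics' point is that validation loss plays the role of `loss` here: the loop reliably descends, but only within the basin it starts in.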

Supporters argue that the fixed time window compels the AI to make genuine architectural trade‑offs instead of relying on brute‑force search.

Shift in research bottleneck

The bottleneck moves from “can we run experiments?” to “can we ask the right questions?” Researchers spend more effort crafting prompts that guide the AI, effectively becoming prompt engineers.

Ecosystem context

nanoGPT provides model training, nanoChat enables chatbot construction, and autoresearch closes the loop by automating the full research cycle, removing the need for a dedicated lab.

Platform support and licensing

Current support is limited to NVIDIA GPUs; a macOS fork exists. The project is released under the MIT license.

Project URL: https://github.com/karpathy/autoresearch

Tags: machine learning, AI agents, GitHub, AutoML, Research Automation, NVIDIA GPUs
Written by

AI Engineering

Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
