Turning AI Agents into a Planning‑First Development Engine
This article analyzes Matt Van Horn’s workflow, which replaces a traditional IDE with a planning‑first approach built on markdown task files, voice input, parallel research agents, and a custom "last30days" research engine. It shows how AI can orchestrate research, planning, execution, and verification to boost productivity on code and non‑code tasks alike.
Background
Matt Van Horn argues that competitive advantage in AI‑assisted programming comes from embedding context, experience, and default processes into the development workflow rather than relying on ever‑better code‑generation models. He recommends a planning‑first approach using a plain‑text plan.md file and voice input instead of a traditional IDE.
Planning‑First Workflow
The workflow consists of three stages: research, planning, and execution. When a new idea appears, the user runs the command /ce:plan. This command feeds raw, noisy inputs (screenshots, chat logs, bug reports, spoken fragments, etc.) into a planning pipeline immediately, without waiting for a fully structured task.
The role of plan.md
plan.md is a structured artifact that records:
The problem to solve.
The rationale for the chosen solution.
The files to modify and the existing patterns to follow.
Acceptance criteria and verification steps.
Only after this plan exists does any code get generated. The AI’s scarce resource becomes precise task boundaries and context rather than raw coding ability.
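A minimal plan.md skeleton might look like the following. The headings are illustrative, derived from the four questions above; the article does not prescribe an exact template:

```markdown
# Plan: <one-line task title>

## Problem
What is broken or missing, in one or two sentences.

## Rationale
Why this solution was chosen over the alternatives considered.

## Files & Patterns
- src/... – change X, following the existing Y pattern in this codebase.

## Acceptance Criteria
- [ ] Observable behavior after the change.
- [ ] Verification step (test command or manual check).
```

Because the plan is plain markdown, it can be versioned alongside the code and handed unchanged to any agent session.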
Research agent /ce:plan
The /ce:plan command launches multiple research agents in parallel. They scan the codebase, internal documentation, historical bugs, and external sources, then return a structured plan that answers the four questions above.
Fetching fresh community knowledge – /last30days
Before planning, the /last30days command pulls the latest community discussions from Reddit, X, YouTube, Hacker News and other platforms. The open‑source repository https://github.com/mvanhorn/last30days-skill (v2.9.5) implements this engine with:
Parallel fetching via ThreadPoolExecutor.
Separate modules under scripts/ for each source (e.g., openai_reddit.py, bird_x.py, hackernews.py).
Multi‑signal scoring that combines text similarity, interaction speed, source authority, and cross‑platform convergence.
Dedupe logic to remove near‑duplicate results.
455+ automated tests for reliability.
The search runs in two phases: Phase 1 gathers a broad result set; Phase 2 enriches entities with targeted queries. Flags --quick and --deep control the depth of each phase.
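The two-phase parallel fetch might be structured roughly like this. The per-source functions and the enrichment heuristic are illustrative stand-ins, not the repository's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-ins for the per-source modules
# (openai_reddit.py, bird_x.py, hackernews.py, ...).
def fetch_reddit(topic):
    return [{"source": "reddit", "title": f"{topic} thread"}]

def fetch_hn(topic):
    return [{"source": "hn", "title": f"{topic} story"}]

def fetch_x(topic):
    return [{"source": "x", "title": f"{topic} post"}]

SOURCES = [fetch_reddit, fetch_hn, fetch_x]

def broad_search(topic, max_workers=8):
    """Phase 1: query every source concurrently and flatten the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, topic) for fn in SOURCES]
        return [item for f in futures for item in f.result()]

def enrich(results, deep=False):
    """Phase 2: targeted follow-up per entity; depth maps to --quick/--deep."""
    follow_ups = len(results) if deep else min(len(results), 3)
    return results[:follow_ups]

results = enrich(broad_search("claude code workflows"), deep=False)
```

The key property is that slow sources (API calls, scraping) overlap in time rather than running serially, so the broad phase finishes at the speed of the slowest single source.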
Voice as high‑bandwidth input
Voice tools such as Monologue or WhisperFlow let users dictate ideas without perfect phrasing. The system uses context to fill gaps, turning low‑quality speech into structured tasks and reducing the cost of task initiation.
Parallel windows for state separation
Matt runs 4–6 concurrent Ghostty sessions, each dedicated to a distinct state: research, planning, execution, or bug verification. This prevents idle waiting and keeps token usage focused on the current state rather than a single monolithic conversation.
Safety‑oriented configurations
bypassPermissions disables confirmation dialogs, enabling high‑throughput but requiring strong repository discipline, testing, and rollback mechanisms.
Audible notifications signal window completion.
Zed’s 500 ms auto‑save ensures file changes are instantly visible to the AI session, creating a Google‑Docs‑like collaborative feel.
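In Zed this behavior is controlled by the autosave setting; a settings.json fragment along these lines would produce the 500 ms delay cited above (check Zed's settings documentation for the exact key names in your version):

```json
{
  "autosave": {
    "after_delay": {
      "milliseconds": 500
    }
  }
}
```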
Extending planning beyond code
The same pipeline can ingest meeting recordings, strategic discussions, travel plans, or product proposals. Examples include converting a 90‑minute lunch conversation into a product proposal and generating a travel itinerary that is later deployed as a web page and pushed to Telegram.
Model division: Claude for planning, Codex for execution
Because parallel Opus sessions quickly consume Claude Max quota, the --codex flag offloads heavy implementation work to Codex. Claude excels at long‑chain planning and context orchestration; Codex handles bulk code generation.
Key takeaways
Always start with a structured plan.md before any implementation.
Feed the latest external information together with internal context to the planner.
Treat research, planning, execution, and verification as separate states rather than a single monolithic session.
Implementation details for /last30days
The repository’s entry point is scripts/last30days.py, which uses ThreadPoolExecutor to query dozens of sources concurrently. Supporting modules under scripts/lib/ include:
openai_reddit.py – Reddit search via the OpenAI Responses API.
bird_x.py and xai_x.py – X search via free GraphQL and paid xAI endpoints.
reddit_enrich.py – enriches results with Reddit JSON interaction data.
score.py – multi‑signal weighted scoring.
dedupe.py – near‑duplicate removal.
hackernews.py – Hacker News API integration.
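A multi-signal scorer in the spirit of score.py might combine weighted signals like this. The weights and signal names here are assumptions for illustration, not the repository's actual values:

```python
# Hypothetical signal weights; the real score.py defines its own.
WEIGHTS = {
    "text_similarity": 0.4,   # how well the item matches the query
    "velocity": 0.25,         # interactions per hour since posting
    "authority": 0.2,         # source- or author-level trust
    "convergence": 0.15,      # same story surfacing on multiple platforms
}

def score(signals):
    """Weighted sum of signals, each clamped into [0, 1]."""
    return sum(WEIGHTS[name] * min(max(value, 0.0), 1.0)
               for name, value in signals.items() if name in WEIGHTS)

ranked = sorted(
    [{"id": "a", "text_similarity": 0.9, "velocity": 0.2,
      "authority": 0.5, "convergence": 0.0},
     {"id": "b", "text_similarity": 0.4, "velocity": 0.9,
      "authority": 0.9, "convergence": 1.0}],
    key=lambda item: score({k: v for k, v in item.items() if k != "id"}),
    reverse=True,
)
```

Because the weights sum to 1.0, a perfect item scores exactly 1.0, which makes thresholds easy to reason about; cross-platform convergence lets a moderately relevant but widely discussed item outrank a narrowly relevant one.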
Results are saved under ~/Documents/Last30Days/ as .md files named by topic, building a local research library that can be referenced in later /ce:plan runs.
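The near-duplicate removal step (dedupe.py) can be approximated with token-set Jaccard similarity. This sketch is an assumption about the general approach, not the repository's code:

```python
import re

def _tokens(text):
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def jaccard(a, b):
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def dedupe(items, threshold=0.8):
    """Keep an item only if its title is not near-identical to one already kept."""
    kept = []
    for item in items:
        if all(jaccard(item["title"], k["title"]) < threshold for k in kept):
            kept.append(item)
    return kept

posts = [
    {"title": "Claude Code planning workflow tips"},
    {"title": "Claude Code planning workflow tips!"},  # near-duplicate, dropped
    {"title": "Codex handles bulk implementation"},
]
unique = dedupe(posts)
```

Cross-platform searches surface the same story under slightly different titles, so a similarity threshold rather than exact string matching is what makes the result set readable.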
Parallel state windows example
Typical window allocation:
Window 1 – runs /last30days for market research.
Window 2 – runs /ce:plan to produce a structured plan.
Window 3 – runs /ce:work to execute the plan with Codex.
Window 4 – monitors test failures and creates new bug‑fix plans.
Remote triggering
Plans can be initiated from a mobile device via Telegram, e.g., sending /ce:plan fix the timeout issue to a Mac Mini running the agents. The plan is generated while the user is away and can be retrieved later via AgentMail.
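A listener for this remote-trigger flow could be built on the Telegram Bot API's getUpdates endpoint (which is real); the command parsing and queueing wiring below are illustrative assumptions, not the author's actual setup:

```python
import json
import urllib.request

def parse_command(message_text):
    """Split '/ce:plan fix the timeout issue' into (command, task description)."""
    if not message_text.startswith("/"):
        return None
    command, _, task = message_text.partition(" ")
    return command, task.strip()

def poll_once(token, offset=0):
    """One long-poll against the Telegram Bot API getUpdates endpoint."""
    url = f"https://api.telegram.org/bot{token}/getUpdates?offset={offset}&timeout=30"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["result"]

def route_updates(updates):
    """Hypothetical wiring: turn each /ce:plan message into a queued task."""
    queued = []
    for update in updates:
        parsed = parse_command(update.get("message", {}).get("text", ""))
        if parsed and parsed[0] == "/ce:plan":
            queued.append(parsed[1])
    return queued
```

On the Mac Mini, a loop would call poll_once with the bot token and hand each queued task description to a new planning session.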
Open‑source extensions
The last30days repository also provides “open” variants for always‑on bots:
variants/open/ – watchlist, briefing, and history scripts.
scripts/watchlist.py run‑all – schedules periodic research on selected topics and stores results in a local SQLite database for later natural‑language queries.
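A minimal version of the watchlist storage could look like this. The schema and function names are assumptions; the repository's actual SQLite layout may differ:

```python
import sqlite3
import time

def init_db(path=":memory:"):
    """Open (or create) the findings database."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS findings (
        topic TEXT, title TEXT, url TEXT, fetched_at REAL)""")
    return conn

def store_findings(conn, topic, items):
    """Append one scheduled research run's results for a watched topic."""
    conn.executemany(
        "INSERT INTO findings VALUES (?, ?, ?, ?)",
        [(topic, i["title"], i.get("url", ""), time.time()) for i in items],
    )
    conn.commit()

def recent_for_topic(conn, topic, limit=10):
    """Newest findings first; the raw material for later natural-language queries."""
    rows = conn.execute(
        "SELECT title, url FROM findings WHERE topic = ? "
        "ORDER BY fetched_at DESC LIMIT ?", (topic, limit))
    return rows.fetchall()

conn = init_db()
store_findings(conn, "claude-code",
               [{"title": "New plugin system", "url": "https://example.com"}])
```

Keeping the store in plain SQLite means a later agent session can answer questions about a topic by querying locally instead of re-fetching from every platform.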
Cost management
Running 4–6 parallel Opus sessions can exhaust Claude Max quota quickly. Adding the --codex flag shifts heavy implementation to Codex, reducing token consumption while preserving throughput.
Practical checklist
Create a plan.md that answers: problem, rationale, file changes, acceptance criteria.
Run /last30days to gather the latest community insights and feed them into the planner.
Separate the workflow into distinct states (research, planning, execution, verification) and run each in its own window.
Conclusion
The planning‑first approach turns plan.md into a universal task‑orchestration layer that can handle code, product proposals, strategic documents, and even travel itineraries. By making context explicit and dividing labor between models (Claude for planning, Codex for heavy lifting), teams can achieve higher throughput without relying on ever‑more powerful code‑generation models.