Step‑by‑Step Guide to Building Your First AI Agent from Scratch (Full Code Included)

This comprehensive guide walks you through the fundamentals of AI agents, explains the core agent loop, compares workflow patterns with autonomous agents, and provides a practical five‑step process—including tool design, memory handling, testing, and multi‑agent collaboration—complete with real code examples for Anthropic and OpenAI SDKs.

AI Tech Publishing

1. How Agents Work

The core agent loop is always the same: user input → LLM reasoning → LLM decides to reply or call a tool → tool execution → result fed back → repeat. The LLM acts as the "brain", tools are the "hands", and memory is the "notebook". Frameworks such as LangGraph, CrewAI, the Anthropic SDK, or the OpenAI Agents SDK merely wrap this loop with abstractions.
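That loop can be sketched in a few lines of plain Python. Here `call_llm`, `TOOLS`, and `run_agent` are hypothetical stand-ins (not any SDK's API): the stub `call_llm` just pretends to be a model that either answers or requests the calculator tool.

```python
# Minimal agent-loop sketch. call_llm and TOOLS are hypothetical
# stand-ins for a real LLM API call and real tool implementations.

def call_llm(messages):
    # Placeholder "brain": a real implementation would call an LLM API
    # and return either a reply or a tool-call decision.
    last = messages[-1]["content"]
    if "2 + 2" in last:
        return {"type": "tool_call", "name": "calculator", "args": {"expr": "2 + 2"}}
    return {"type": "reply", "content": f"Done: {last}"}

# Placeholder "hands": one tool the model may call.
TOOLS = {"calculator": lambda args: str(eval(args["expr"], {"__builtins__": {}}, {}))}

def run_agent(user_input, max_steps=5):
    messages = [{"role": "user", "content": user_input}]  # the "notebook"
    for _ in range(max_steps):                            # repeat
        decision = call_llm(messages)                     # LLM reasoning
        if decision["type"] == "reply":                   # reply -> done
            return decision["content"]
        result = TOOLS[decision["name"]](decision["args"])  # tool execution
        messages.append({"role": "tool", "content": result})  # result fed back
    return "Step limit reached"
```

The `max_steps` cap is the one safety feature worth keeping even in a toy: it stops a confused model from looping forever.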

Enhanced LLM

Tools: Functions the model can call (calculator, database, API, file ops). Anthropic exposes tools via input_schema, OpenAI via function objects.

Retrieval: Pulling external information from search engines, documents, or vector databases.

Memory: Persisting information across interactions via message history or external storage.
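To make the tool-declaration difference concrete, here is the same calculator declared in both vendors' documented shapes. The field names (`input_schema` vs. the nested `parameters` object) come from the two APIs; the description text is my own.

```python
# Anthropic-style tool definition: the JSON schema lives under "input_schema".
anthropic_tool = {
    "name": "calculator",
    "description": "Evaluate a basic arithmetic expression.",
    "input_schema": {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
}

# OpenAI-style function tool: the schema lives under "parameters".
openai_tool = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}
```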

Workflow vs. Real Agent

Workflows are deterministic—same input always follows the same path—making them cheap and suitable for fixed‑step tasks. Agents are dynamic; the LLM decides the next step and can call tools repeatedly, which is powerful for open‑ended tasks but more expensive. The recommended approach is to start with a simple workflow and upgrade to a full agent only when necessary.

2. Five Workflow Patterns

Prompt Chaining: Split a task into sequential steps, each LLM consumes the previous step's output. Use a programmatic "gatekeeper" to validate quality. Ideal for cleanly decomposable tasks such as generating marketing copy then translating it.
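A minimal sketch of prompt chaining with a gatekeeper, assuming a hypothetical `call_llm` stub (here it just upper-cases its prompt so the example runs offline):

```python
# Prompt-chaining sketch: each step consumes the previous step's output,
# with a programmatic "gatekeeper" check between steps.

def call_llm(prompt):
    # Placeholder model call: deterministic transformation for the demo.
    return prompt.upper()

def gatekeeper(text):
    # Programmatic quality check: reject empty or suspiciously short output.
    return len(text.strip()) >= 10

def chain(task, steps):
    output = task
    for step in steps:
        output = call_llm(f"{step}\n\n{output}")
        if not gatekeeper(output):
            raise ValueError(f"Gatekeeper rejected output of step: {step}")
    return output
```

The gatekeeper is ordinary code, not an LLM, which is what keeps this pattern cheap and predictable.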

Routing: Classify input and dispatch to specialized prompts. Perfect for support ticket triage (billing, technical, sales).
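The triage case can be sketched like this; `classify` stands in for an LLM classifier, and the specialists stand in for category-specific prompts (all names here are hypothetical):

```python
# Routing sketch: classify first, then dispatch to a specialised handler.

def classify(ticket):
    # Placeholder classifier: a real version would be an LLM call told to
    # return exactly one of the category labels.
    t = ticket.lower()
    if "charge" in t or "invoice" in t:
        return "billing"
    if "error" in t or "crash" in t:
        return "technical"
    return "sales"

SPECIALISTS = {
    "billing":   lambda t: f"[billing specialist] {t}",
    "technical": lambda t: f"[technical specialist] {t}",
    "sales":     lambda t: f"[sales specialist] {t}",
}

def route(ticket):
    return SPECIALISTS[classify(ticket)](ticket)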

Parallelisation: Run multiple LLM calls concurrently (sectioning) or run the same task multiple times and vote for higher confidence.
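The voting variant can be sketched as follows; `call_llm` is a hypothetical stub made deliberately noisy (via the seed) to mimic a model sampled at nonzero temperature:

```python
# Parallelisation sketch: run the same task N times concurrently and
# take a majority vote for higher confidence.

from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt, seed):
    # Placeholder: a real model would occasionally disagree with itself;
    # we fake that disagreement with the seed.
    return "positive" if seed % 3 != 0 else "negative"

def vote(prompt, n=5):
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(lambda i: call_llm(prompt, i), range(n)))
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n  # answer plus agreement ratio
```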

Orchestrator‑Workers: A central LLM dynamically splits work and assigns to worker LLMs at runtime, useful for complex, unpredictable tasks like multi‑file code generation.
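The key property of this pattern is that the plan is produced at runtime, not hard-coded. A sketch with hypothetical stubs for both roles:

```python
# Orchestrator-workers sketch: the orchestrator splits the task at
# runtime and workers each handle one piece.

def orchestrator_llm(task):
    # Placeholder: a real orchestrator would ask an LLM to plan the
    # subtasks; here we split a multi-file request on commas.
    return [part.strip() for part in task.split(",")]

def worker_llm(subtask):
    # Placeholder worker: a real one would generate the file contents.
    return f"# generated code for: {subtask}"

def orchestrate(task):
    subtasks = orchestrator_llm(task)       # plan decided at runtime
    return {s: worker_llm(s) for s in subtasks}
```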

Evaluator‑Optimiser: One LLM generates output, another evaluates it, and the loop repeats until quality criteria are met (e.g., translation quality).
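A sketch of the generate-evaluate loop, again with hypothetical stubs; the toy "rubric" simply demands two refinement rounds so the loop visibly iterates:

```python
# Evaluator-optimiser sketch: one stub generates, another evaluates,
# looping until the quality bar is met or a round limit is hit.

def generator_llm(task, feedback):
    # Placeholder generator: each round of feedback adds a refinement.
    return task + (" [refined]" * len(feedback))

def evaluator_llm(output):
    # Placeholder rubric: accept only after two refinement rounds.
    ok = output.count("[refined]") >= 2
    return ok, "good" if ok else "needs more polish"

def optimise(task, max_rounds=5):
    feedback = []
    for _ in range(max_rounds):
        output = generator_llm(task, feedback)
        ok, note = evaluator_llm(output)
        if ok:
            return output, len(feedback) + 1  # output and rounds used
        feedback.append(note)
    return output, max_rounds
```

The round limit matters in practice: without it, a strict evaluator and a weak generator can loop (and bill you) indefinitely.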

Each pattern is widely adopted in Anthropic documentation and suits different cost‑performance trade‑offs.

3. Build Your First Agent

The core implementation follows five concrete steps:

Write down the work you want the agent to perform.

Decide which tools are required.

Tell the model how it should behave.

Test with five realistic examples.

Only add complexity when the simple version fails.

Two practical paths are provided: Anthropic and OpenAI.

Anthropic: Minimal Path

Anthropic released Claude Code in February 2025; its SDK was later renamed the Claude Agent SDK. The latest GitHub release (v0.1.50, March 2026) supports file, shell, web‑search, and coding tools.

Example system prompt:

SYSTEM_PROMPT = '''
You are a careful research assistant.

Your job is to help the user research topics accurately.
Use tools when needed.
Do not guess.
If information is uncertain or incomplete, say so clearly.
Always produce:
1. Summary
2. Key findings
3. Risks or uncertainty
4. Final conclusion
'''
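To show where such a system prompt plugs in, here is a sketch of the request payload for an Anthropic Messages API call. The model id and token limit are illustrative assumptions, and `SYSTEM_PROMPT` is abbreviated; with the official SDK, the payload would be passed to `anthropic.Anthropic().messages.create(**payload)`.

```python
# Sketch of an Anthropic Messages API payload (values are illustrative).
SYSTEM_PROMPT = "You are a careful research assistant. Use tools when needed. Do not guess."

payload = {
    "model": "claude-sonnet-4-20250514",  # illustrative model id
    "max_tokens": 1024,
    "system": SYSTEM_PROMPT,  # the system prompt is a top-level field, not a message
    "messages": [
        {"role": "user", "content": "Research the latest AI Agent SDK"},
    ],
}
```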

Sample user queries:

"Research the latest AI Agent SDK"

"Compare Anthropic and OpenAI for a beginner"

"Find three strong sources and summarize"

OpenAI: Minimal Path

OpenAI released the Agents SDK on 2025‑03‑11; version 0.13.1 (March 2026) includes built‑in web search, file search, and code interpreter tools. Simple classification agent example:

from agents import Agent, Runner

agent = Agent(
    name="Support Triage Agent",
    instructions="""
You classify customer requests.
Choose exactly one category:
- billing
- technical
- sales

Reply with:
1. Category
2. One sentence explaining why
""",
)

result = Runner.run_sync(agent, "I was charged twice for my subscription this month.")
print(result.final_output)

A custom calculator tool example:

from agents import Agent, Runner, function_tool

@function_tool
def calculate(expression: str) -> str:
    """Evaluate a math expression, e.g. "10000 * (1.05 ** 8)"."""
    import math
    # Expose only math functions/constants and block builtins so eval
    # cannot reach dangerous names. Fine for a demo, not hardened.
    allowed = {k: v for k, v in math.__dict__.items() if not k.startswith("__")}
    return str(eval(expression, {"__builtins__": {}}, allowed))

agent = Agent(
    name="Math Helper",
    instructions="Help the user solve maths problems. Use the calculator tool when needed.",
    tools=[calculate],
)

result = Runner.run_sync(agent, "What is compound growth on 10000 at 5 percent for 8 years?")
print(result.final_output)

4. How to Design Good Tools

Ask yourself "Does this task need a tool?" before adding anything. Tools should be single‑purpose, clearly named, and have explicit input signatures. Bad example: manage_files(action, file, destination, overwrite, format, permissions). Good examples: read_file(path), write_file(path, content), delete_file(path). Describe when the agent may invoke the tool (e.g., "use this tool for any math calculation; never guess the result").
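The good examples above, written out as plain functions a model could be handed as tools (illustrative helpers, not any SDK's built-ins); the point is that each signature says exactly one thing:

```python
from pathlib import Path

# Single-purpose tools with explicit signatures. Each name states
# exactly what happens, so the model can choose the right one.

def read_file(path: str) -> str:
    """Return the full text content of the file at `path`."""
    return Path(path).read_text()

def write_file(path: str, content: str) -> None:
    """Create or overwrite the file at `path` with `content`."""
    Path(path).write_text(content)

def delete_file(path: str) -> None:
    """Delete the file at `path`."""
    Path(path).unlink()
```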

5. Adding Memory

Memory comes in two forms:

Short‑term (conversation) memory: What has been said so far; provided automatically by most SDKs.

Long‑term (external) memory: Persistent knowledge such as notes, PDFs, or database records.

Decide if you need memory with a simple AI prompt. Options:

Option A – No memory: Works for ~70% of simple use‑cases.

Option B – Conversation memory: Default in SDKs; just avoid resetting the message list.

Option C – File‑based (RAG) memory: Upload documents and use a file‑search tool.

Do not over‑engineer with vector databases or embeddings unless you have proven need.
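Option B really is this small. A sketch of conversation memory as nothing more than a message list you never reset, with a hypothetical `call_llm` stub standing in for the model:

```python
# Conversation-memory sketch: short-term memory is just the growing
# message list sent with every call. call_llm is a hypothetical stub.

def call_llm(messages):
    # Placeholder: a real call would send the whole history to the model.
    return f"history length = {len(messages)}"

class Conversation:
    def __init__(self):
        self.messages = []  # the short-term memory

    def send(self, user_input):
        self.messages.append({"role": "user", "content": user_input})
        reply = call_llm(self.messages)  # model sees everything so far
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

The common bug is re-creating this list on every request, which silently turns a stateful agent into a stateless one.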

6. Running the Agent

Testing is critical. Generate realistic test cases with AI, including messy, vague, edge‑case inputs. Example prompt to generate 15 test cases:

I built an AI agent with this goal: [Goal]
Create 15 realistic user inputs:
- messy
- vague
- real‑world style
Also include:
- edge cases
- confusing inputs
- bad inputs

Iteratively debug by asking: Is the prompt clear? Is the output format ambiguous? Are tools missing? Are rules too lax? Use AI‑assisted debugging:

Here is my agent: [agent definition]
Here is what I asked: [input]
Here is the output: [output]
What went wrong? How do I fix it? Be specific.

Avoid adding multiple agents, complex pipelines, or heavy RAG until the simple version is stable.

7. Multi‑Agent Collaboration

Start with a single agent. Add more only when the task can be cleanly split, a single agent cannot handle the load, or distinct permissions/skills are required. Three legitimate cases:

Different skills (e.g., research vs. writing).

Clear pipeline (input → analysis → writing → output).

Different permissions (one reads data, another performs actions).

The safest pattern is a Supervisor model: user → main agent → optional specialist agents. Avoid swarm‑style or fully autonomous multi‑agent systems early on.

8. Checklist & Conclusions

Common mistakes: building a generic "do everything" agent, vague output formats, over‑complicated tool signatures, and premature multi‑agent setups. Checklist:

Keep the task narrow.

Define an explicit output format.

Provide concrete examples.

Add tools only when needed.

Test with real‑world noisy inputs.

Iterate one change at a time.

Three actionable takeaways:

Start by building a minimal agent from scratch; the loop is only ~50 lines of Python.

Use the simplest workflow (Prompt Chaining or Routing) before moving to a fully autonomous agent.

Invest early in clear tool design and thorough testing; good tools and test cases improve performance far more than switching models.
With these fundamentals—agent loop, workflow patterns, tool design, memory handling, and disciplined testing—you can reliably build and extend AI agents despite the fast‑moving ecosystem.
