Implementing LLM Routing and Parallel Agent Workflows with PydanticAI

This tutorial walks through building semantic routing and parallel execution patterns for LLM agents using the lightweight PydanticAI framework, providing step‑by‑step code, example configurations, and practical observations to help developers create flexible AI‑driven workflows.


Routing Pattern

The routing pattern directs a user query to one of several specialized LLM branches, a common technique in Retrieval‑Augmented Generation (RAG) and multi‑agent systems.

Typical scenarios include:

Automatic routing of customer‑service questions to the appropriate module.

Routing requests in an agentic RAG system to different knowledge bases.

Assigning tasks to distinct agents in a multi‑agent architecture.

A routing system consists of two parts:

Selector: usually an LLM that, based on a system prompt, decides which branch to take.

Options: the set of possible branches. In the basic pattern these are "enhanced LLM" calls; in real applications they may be retrievers, RAG engines, or other agents.
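
All of the snippets below share the following setup. The imports match the PydanticAI version used in this tutorial, which exposes result_type and .data; the default selector model is an assumption and can be swapped for any supported model:

import asyncio
from typing import Dict, List

from pydantic import BaseModel, Field
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel

# Default model for the routing selector (an assumption; any supported model works).
model = OpenAIModel(model_name='gpt-4o-mini')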

# Structured output returned by the routing selector.
class RouteSelection(BaseModel):
    reasoning: str = Field(..., description='Brief explanation why the query should be routed to a specific team, considering key terms, user intent, and urgency.')
    selection: str = Field(..., description='Name of the selected team')

The core routing function receives the user input and a dictionary of routes, uses an LLM to choose the best option, and then forwards the input to the selected branch.

async def route(input: str, routes: Dict[str, Dict[str, str]]) -> str:
    """Use LLM reasoning to route the input to a specialized prompt."""
    print(f"Available routes: {list(routes.keys())}")
    # Selector: an LLM that picks the best branch based on each route's prompt.
    routing_agent = Agent(
        model,
        system_prompt=(
            'Analyze the input question and choose the most appropriate support team from the following options:\n'
            + "\n".join([f"{key}: {route['prompt']}" for key, route in routes.items()])
            + '\n'
        ),
        result_type=RouteSelection,
    )
    route_response = await routing_agent.run(input)
    reasoning = route_response.data.reasoning
    route_key = route_response.data.selection.strip().lower()
    print(reasoning)
    print(f"Chosen route: {route_key}")
    # Option: forward the original input to the selected branch's own agent.
    selected_route = routes[route_key]
    worker_agent = Agent(
        model=selected_route['model'],
        system_prompt=selected_route['prompt'],
        tools=selected_route.get('tools', [])
    )
    return (await worker_agent.run(input)).data
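
Note that the selector's selection is lowercased and used directly as a dictionary key, so an unexpected answer from the LLM makes routes[route_key] raise a KeyError. A small guard keeps the router robust (a sketch, assuming the others key from the configuration below as the default branch):

selected_route = routes.get(route_key) or routes.get('others')
if selected_route is None:
    raise ValueError(f"No route matches selection '{route_key}'")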

An example configuration with three branches (pre-sale consultation, after-sales support, and a human fallback) demonstrates how each branch can have its own model, prompt, and optional tools.
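
The configuration references two tools, tool_query_productinfo and tool_search, which the original post does not define. PydanticAI accepts plain functions as tools, so hypothetical placeholder stubs (shown only so the example runs) might look like:

def tool_query_productinfo(product_name: str) -> str:
    """Placeholder tool: look up basic product information by name."""
    return f"No catalog entry found for '{product_name}'."

def tool_search(query: str) -> str:
    """Placeholder tool: search for supporting background information."""
    return f"No search results found for '{query}'."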

support_routes = {
    "consult": {
        "name": "Pre-sale Consultation",
        "prompt": """You are a pre-sale consultation expert. Answer inquiries about product information and keep your responses professional yet friendly. Always begin with \"Product inquiry reply:\". Use the tool to retrieve product information.""",
        "model": OpenAIModel(model_name='gpt-4o-mini'),
        "tools": [tool_query_productinfo]
    },
    "service": {
        "name": "After-sales Support",
        "prompt": """You are an after-sales service expert. Answer questions about problems encountered while using the product and keep your responses professional yet friendly. Always begin with \"After-sales service reply:\".""",
        "model": OpenAIModel(model_name='gpt-4o-mini'),
        "tools": []
    },
    "others": {
        "name": "Human Support",
        "prompt": """You are a support representative simulating human service. Handle questions outside pre-sale consultation and after-sales support. Always begin with \"Human support reply:\". Use the search tool to gather supporting information.""",
        "model": OpenAIModel(model_name='gpt-4o-mini'),
        "tools": [tool_search]
    }
}

questions = [
    "When will the iPhone 17 be released, and at what price?",
    "My phone from your company keeps restarting automatically. What should I do?",
    "Is there any recent news about Xiaomi's cars?"
]

async def process_questions():
    print("Processing support tickets...\n")
    for i, question in enumerate(questions, 1):
        print(f"\n-----------------------------\nQuestion {i}: {question}\n")
        response = await route(question, support_routes)
        print(f"\nResponse {i}: {response}\n")

asyncio.run(process_questions())

The output shows the LLM’s reasoning, the selected route, and the final response generated by the chosen team.

Parallel Pattern

Parallel execution runs the same or different tasks on multiple LLM instances simultaneously, then aggregates the results. Two typical use cases are:

Splitting a large task (e.g., document translation) into independent subtasks that run in parallel to improve latency.

Running the same task on several models and selecting the best answer via a voting or decision mechanism.

Key requirements for parallelism are that the tasks are independent (no ordering dependencies) and that the number of tasks is known beforehand.

async def parallel(system_prompt: str, tasks: List[Dict]) -> List[str]:
    """Execute multiple tasks in parallel, each with its own model, prompt, and input."""
    async def run_task(task: Dict) -> str:
        # Each task may supply its own model; default to gpt-4o-mini.
        model = task.get('model', OpenAIModel(model_name='gpt-4o-mini'))
        input_data = task['input']
        agent = Agent(model, system_prompt=system_prompt)
        result = await agent.run(input_data)
        return result.data
    # Fan out all tasks concurrently; gather preserves task order in the results.
    results = await asyncio.gather(*[run_task(task) for task in tasks])
    return results

Example tasks illustrate how different stakeholder prompts can be processed in parallel.

tasks = [
    {"input": """Customers:
- Price sensitive
- Want better technology
- Environmentally conscious""", "model": OpenAIModel(model_name='gpt-4o-mini')},
    {"input": """Employees:
- Job security concerns
- Need new skills
- Want clear direction""", "model": OpenAIModel(model_name='gpt-4o-mini')},
    {"input": """Investors:
- Expect growth
- Want cost control
- Concerned about risk""", "model": OpenAIModel(model_name='gpt-4o-mini')}
]

async def main():
    system_prompt = """Analyze how the market changes will affect this stakeholder group. Provide specific impacts and recommended actions. Format the output with clear sections and priorities."""
    impact_results = await parallel(system_prompt, tasks)
    for task, result in zip(tasks, impact_results):
        # Label each analysis with its stakeholder group (the text before the colon).
        print(f"{task['input'].split(':')[0]}: {result}")

asyncio.run(main())

If a decision mechanism such as voting is required, additional aggregation logic can be added after the parallel calls.
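
For instance, the same question could be fanned out to several models and the candidate answers reduced by a judge agent; a minimal sketch (the judge prompt and model choice are assumptions):

async def vote(question: str, candidates: List[str]) -> str:
    """Ask a judge agent to pick the best of several parallel answers."""
    judge = Agent(
        OpenAIModel(model_name='gpt-4o-mini'),
        system_prompt=(
            'You are given several candidate answers to the same question. '
            'Return the single best answer verbatim.'
        ),
    )
    numbered = "\n\n".join(f"Answer {i}: {c}" for i, c in enumerate(candidates, 1))
    result = await judge.run(f"Question: {question}\n\n{numbered}")
    return result.data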

The next article will cover the more complex orchestrator‑worker and evaluator‑optimizer patterns.

Tags: Python, LLM, routing, parallelism, PydanticAI
Written by

AI Large Model Application Practice

Focused on deep research and development of large-model applications. Authors of "RAG Application Development and Optimization Based on Large Models" and "MCP Principles Unveiled and Development Guide". Primarily B2B, with B2C as a supplement.
