
How Hermes’s Three‑Way Adapter Unifies Anthropic, Gemini, and Codex APIs

This article explains how Hermes uses three dedicated adapters—anthropic_adapter.py, gemini_native_adapter.py, and codex_responses_adapter.py—to translate the wildly different request and response schemas of Anthropic Messages, Gemini generateContent, and Codex Responses into a single OpenAI‑style chat.completions interface, covering message formats, system prompts, tool calls, reasoning signatures, lazy SDK loading, pure‑function design, and defensive validation.

James' Growth Diary

Jun 9, 2026


Why a Three‑Way Adapter?

In 2026, no single API dominates the LLM market. OpenAI uses the familiar chat/completions schema. Anthropic's Messages API lays out its parameters differently and takes the system prompt as a separate system field. Gemini exposes a generateContent REST endpoint that wraps content in parts and uses systemInstruction. Codex uses the separate Responses schema, built from input items, instructions, and reasoning items.

If the Hermes Agent core tried to handle all these quirks directly, run_agent.py would become a tangled maze of thousands of if‑else branches. Hermes therefore follows a clear design philosophy: compress every protocol difference into an adapter layer and keep the Agent core aware of only one interface.

The unified interface is the OpenAI chat.completions.create(**kwargs) call, whose response is read through response.choices[0].message. Whether the underlying provider is Anthropic, Gemini, or Codex, the Agent loop always works with this single shape.

Deep Comparison of the Three APIs

Message format: OpenAI uses {role, content} arrays; Anthropic also uses {role, content} arrays but treats system as a separate field; Gemini uses contents[] plus systemInstruction; Codex uses input[] with instructions.

Text content: OpenAI – content: string; Anthropic – content: block[] (each block has a type); Gemini – parts: [{text}]; Codex – content: input_text/output_text.

Multimodal: OpenAI – [{type:"image_url"...}]; Anthropic – source: {type, media_type, data}; Gemini – inlineData: {mimeType, data}; Codex – input_image: {image_url}.

System prompt: OpenAI – first role:system message; Anthropic – independent system parameter; Gemini – systemInstruction; Codex – instructions field.

Empty messages: OpenAI allows empty content; Anthropic rejects them (400 error); Gemini rejects empty parts; Codex allows empty content.

Tool‑call definitions also differ:

OpenAI – {type:function, function:{name, description, parameters}}

Anthropic – {name, description, input_schema}

Gemini – {functionDeclarations: [{name, description, parameters}]}

Codex – {type:function, name, description, parameters, strict}

Calling formats, result formats, and ID schemes vary as well, and each provider requires its own way of preserving reasoning state across turns (signatures, thoughtSignature, or encrypted_content).
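The four tool-definition shapes listed above differ only in nesting and key names, so the translation is mechanical. The following is a minimal sketch of that mapping, not Hermes's actual code: the function names are hypothetical, and the empty-object fallback schema is an assumption for tools that declare no parameters.

```python
from typing import Any


def _parts(tool: dict[str, Any]) -> tuple[str, str, dict[str, Any]]:
    """Pull name, description, and JSON schema out of an OpenAI-style tool."""
    fn = tool["function"]
    # Assumption: fall back to an empty object schema when none is given.
    schema = fn.get("parameters") or {"type": "object", "properties": {}}
    return fn["name"], fn.get("description", ""), schema


def openai_tool_to_anthropic(tool: dict[str, Any]) -> dict[str, Any]:
    # Anthropic flattens the definition and renames parameters -> input_schema.
    name, description, schema = _parts(tool)
    return {"name": name, "description": description, "input_schema": schema}


def openai_tools_to_gemini(tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Gemini groups all declarations under one functionDeclarations entry.
    declarations = []
    for tool in tools:
        name, description, schema = _parts(tool)
        declarations.append(
            {"name": name, "description": description, "parameters": schema}
        )
    return [{"functionDeclarations": declarations}] if declarations else []


def openai_tool_to_codex(tool: dict[str, Any], strict: bool = False) -> dict[str, Any]:
    # Codex keeps type:function but hoists the fields to the top level.
    name, description, schema = _parts(tool)
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": schema,
        "strict": strict,
    }
```

Keeping one small function per provider means the Agent core can go on describing tools in the OpenAI shape only.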

Design Conclusions

Principle 1 – Unified External Interface, Divergent Internal Implementations

All callers use the same OpenAI‑style call:

# Python‑style unified call
response = client.chat.completions.create(
    model="...",
    messages=[...],
    tools=[...],
    max_tokens=4096,
)
text = response.choices[0].message.content
tool_calls = response.choices[0].message.tool_calls

In TypeScript the same shape is expressed as:

interface ChatCompletionResponse {
  choices: Array<{
    message: {
      role: "assistant";
      content: string | null;
      tool_calls?: ToolCall[];
      reasoning_content?: string;
    };
    finish_reason: string;
  }>;
}

Regardless of whether the underlying provider is Anthropic, Gemini, or Codex, the Agent receives this uniform structure.

Principle 2 – Bidirectional Translation

Each adapter performs a request translation (OpenAI → Provider native) and a response translation (Provider native → OpenAI). The mapping is summarized below:

Anthropic: request messages[] → Anthropic blocks + extracted system; response Messages → choices[0].message.

Gemini: request messages[] → contents[] + systemInstruction; response generateContent → OpenAI shape.

Codex: request messages[] → input[] + instructions; response Responses → OpenAI shape.
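To make the response direction concrete, here is a minimal sketch of turning an Anthropic Messages response into the OpenAI shape. It assumes the response has already been converted to a plain dict; the function name and the stop-reason table are illustrative and are not taken from anthropic_adapter.py.

```python
import json
from types import SimpleNamespace

# Anthropic stop_reason values mapped onto OpenAI finish_reason values.
_FINISH_REASONS = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def anthropic_to_openai_shape(resp: dict) -> SimpleNamespace:
    """Build an object that answers response.choices[0].message."""
    text_parts = []
    tool_calls = []
    for block in resp.get("content", []):
        if block.get("type") == "text":
            text_parts.append(block["text"])
        elif block.get("type") == "tool_use":
            tool_calls.append(
                SimpleNamespace(
                    id=block["id"],
                    type="function",
                    function=SimpleNamespace(
                        name=block["name"],
                        # OpenAI carries arguments as a JSON string.
                        arguments=json.dumps(block["input"]),
                    ),
                )
            )
    message = SimpleNamespace(
        role="assistant",
        content="".join(text_parts) or None,
        tool_calls=tool_calls or None,
    )
    finish_reason = _FINISH_REASONS.get(resp.get("stop_reason"), "stop")
    return SimpleNamespace(
        choices=[SimpleNamespace(message=message, finish_reason=finish_reason)]
    )
```

The Agent loop can then read `.choices[0].message.content` and `.tool_calls` without knowing which provider produced them.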

Principle 3 – Thinking‑Chain Signature Management

Cross‑turn reasoning data must be preserved:

# Python: Anthropic thinking block signature handling
# Only the last assistant thinking block is kept; earlier ones are stripped
# to avoid "Invalid signature in thinking block" (400)

// TypeScript: Gemini thoughtSignature stored in tool_call.extra_content
interface GeminiToolCallExtra {
  google?: { thought_signature: string }; // must survive to next round
}

// TypeScript: Codex reasoning item encrypted_content must be replayed unchanged
interface CodexReasoningItem {
  type: "reasoning";
  encrypted_content: string; // ← must be sent back verbatim
  id?: string;
}
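The Anthropic rule in the first comment above can be sketched as a pure function. This is one reading of that rule, under the assumption that "last" means the final assistant message in the history; the function name is hypothetical, and the "(empty message)" placeholder simply mirrors the one the article describes for empty payloads.

```python
from typing import Any

_THINKING_TYPES = {"thinking", "redacted_thinking"}


def strip_stale_thinking(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy where only the last assistant message keeps thinking blocks."""
    last_assistant = max(
        (i for i, m in enumerate(messages) if m.get("role") == "assistant"),
        default=-1,
    )
    cleaned = []
    for i, message in enumerate(messages):
        content = message.get("content")
        is_stale = (
            message.get("role") == "assistant"
            and i != last_assistant
            and isinstance(content, list)
        )
        if is_stale:
            kept = [b for b in content if b.get("type") not in _THINKING_TYPES]
            # Never emit an empty content list; Anthropic rejects it.
            kept = kept or [{"type": "text", "text": "(empty message)"}]
            message = {**message, "content": kept}
        cleaned.append(message)
    return cleaned
```

Because the function builds new dicts instead of editing the input, the original conversation history stays available for other providers.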

Implementation Details

4.1 Anthropic Adapter – Message‑Structure Translation

This is the most complex of the three adapters (≈1990 lines). Its core functions are convert_messages_to_anthropic() and build_anthropic_kwargs():

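import json

def convert_messages_to_anthropic(messages, base_url=None, model=None):
    # sanitize_tool_id() is an adapter helper defined elsewhere.
    system = None
    result = []
    for m in messages:
        role = m.get("role", "user")
        content = m.get("content") or ""
        if role == "system":
            # Anthropic takes the system prompt as a separate parameter.
            system = content
            continue
        if role == "assistant":
            blocks = []
            if content:
                blocks.append({"type": "text", "text": content})
            # tool_calls may be missing or None on plain text replies.
            for tc in m.get("tool_calls") or []:
                blocks.append({
                    "type": "tool_use",
                    "id": sanitize_tool_id(tc["id"]),
                    "name": tc["function"]["name"],
                    # OpenAI stores arguments as a JSON string.
                    "input": json.loads(tc["function"]["arguments"] or "{}"),
                })
            result.append({"role": "assistant", "content": blocks})
        elif role == "tool":
            # Tool results travel back to Anthropic inside a user message.
            tool_result = {
                "type": "tool_result",
                "tool_use_id": sanitize_tool_id(m["tool_call_id"]),
                "content": content,
            }
            result.append({"role": "user", "content": [tool_result]})
        else:
            # User messages pass through (this branch is missing from the
            # original excerpt, which would otherwise drop them).
            result.append({"role": "user", "content": content})
    return system, result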

Boundary‑case handling includes:

Detecting empty messages – Anthropic rejects them, so a placeholder "(empty message)" is injected.

Removing orphan tool_use blocks without matching tool_result.

Removing orphan tool_result blocks without a preceding tool_use.

Enforcing strict role alternation by merging consecutive messages of the same role.

Keeping only the last assistant thinking block’s signature.

Truncating image blocks to the three most recent computer_use screenshots.
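Three of the boundary cases listed above (empty messages, orphan tool_result blocks, and role alternation) can be handled in a single pass. The sketch below is an illustration under stated assumptions, not the adapter's real code: the function names are hypothetical, and it does not cover orphan tool_use removal, which needs a look-ahead over later messages.

```python
from typing import Any

PLACEHOLDER = {"type": "text", "text": "(empty message)"}


def _as_blocks(content: Any) -> list[dict[str, Any]]:
    """Normalise string or list content into a fresh list of blocks."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}] if content.strip() else []
    return list(content or [])


def normalize_for_anthropic(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    seen_tool_use_ids: set[str] = set()
    merged: list[dict[str, Any]] = []
    for message in messages:
        blocks = []
        for block in _as_blocks(message.get("content")):
            if block.get("type") == "tool_use":
                seen_tool_use_ids.add(block["id"])
            # Drop tool results that answer no earlier tool_use.
            if (
                block.get("type") == "tool_result"
                and block.get("tool_use_id") not in seen_tool_use_ids
            ):
                continue
            blocks.append(block)
        if not blocks:
            # Anthropic rejects empty content, so inject a placeholder.
            blocks = [dict(PLACEHOLDER)]
        if merged and merged[-1]["role"] == message["role"]:
            # Same role twice in a row: merge to keep strict alternation.
            merged[-1]["content"].extend(blocks)
        else:
            merged.append({"role": message["role"], "content": blocks})
    return merged
```

Each output message owns a newly built content list, so merging never mutates the caller's history.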

Model‑parameter adaptation is also performed: newer Claude models receive an adaptive thinking config, older models get a manual budget, and models that forbid sampling parameters have those fields removed.

4.2 Gemini Adapter – Native API Preference

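Gemini’s native REST API is relatively close to OpenAI’s schema, and Google also offers an OpenAI‑compatible endpoint at /v1beta/openai. Hermes deliberately bypasses that compatibility layer and calls the native API instead, because the layer mishandles multi‑turn tool calls and signature management.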

def build_gemini_request(*, messages, tools, tool_choice, max_tokens, **kwargs):  # **kwargs stands in for parameters elided in the original
    contents, system_instruction = _build_gemini_contents(messages)
    request = {"contents": contents}
    if system_instruction:
        request["systemInstruction"] = system_instruction
    gemini_tools = _translate_tools_to_gemini(tools)
    if gemini_tools:
        request["tools"] = gemini_tools
    tool_config = _translate_tool_choice_to_gemini(tool_choice)
    if tool_config:
        request["toolConfig"] = tool_config
    return request

Response translation extracts text, thought, and functionCall parts from the Gemini candidates[0].content.parts array and assembles a SimpleNamespace that mimics OpenAI’s chat.completion shape.
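A minimal sketch of that response translation follows. It works on the JSON body as a dict and uses the public generateContent field names (candidates, parts, thought, functionCall, thoughtSignature, finishReason). The function name, the synthetic call_N IDs, and the finish-reason mapping are assumptions made for illustration.

```python
import json
from types import SimpleNamespace


def gemini_to_openai_shape(resp: dict) -> SimpleNamespace:
    candidate = (resp.get("candidates") or [{}])[0]
    parts = candidate.get("content", {}).get("parts", [])
    text, thoughts, tool_calls = [], [], []
    for part in parts:
        if "functionCall" in part:
            call = part["functionCall"]
            signature = part.get("thoughtSignature")
            tool_calls.append(
                SimpleNamespace(
                    # Assumption: synthesise an ID, since the call may lack one.
                    id=f"call_{len(tool_calls)}",
                    type="function",
                    function=SimpleNamespace(
                        name=call["name"],
                        arguments=json.dumps(call.get("args", {})),
                    ),
                    # Carried along so the signature survives to the next turn.
                    extra_content=(
                        {"google": {"thought_signature": signature}}
                        if signature
                        else {}
                    ),
                )
            )
        elif part.get("thought"):
            thoughts.append(part.get("text", ""))
        elif "text" in part:
            text.append(part["text"])
    if tool_calls:
        finish_reason = "tool_calls"
    elif candidate.get("finishReason") == "MAX_TOKENS":
        finish_reason = "length"
    else:
        finish_reason = "stop"
    message = SimpleNamespace(
        role="assistant",
        content="".join(text) or None,
        tool_calls=tool_calls or None,
        reasoning_content="".join(thoughts) or None,
    )
    return SimpleNamespace(
        choices=[SimpleNamespace(message=message, finish_reason=finish_reason)]
    )
```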

Free‑tier detection sends a minimal request and inspects the x‑ratelimit‑limit‑requests‑per‑day header; if the limit is ≤ 1000 the adapter treats the key as free and returns a helpful 429‑handling hint.
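The header check itself is small. The sketch below assumes the probe response's headers are available as a mapping; the header name and the 1000-request threshold come from the description above, while the function and constant names are hypothetical.

```python
from collections.abc import Mapping

FREE_TIER_DAILY_LIMIT = 1000
DAILY_LIMIT_HEADER = "x-ratelimit-limit-requests-per-day"


def looks_like_free_tier(headers: Mapping[str, str]) -> bool:
    """Guess whether a key is on the free tier from its daily request limit."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get(DAILY_LIMIT_HEADER)
    if raw is None:
        return False  # No header: make no claim about the tier.
    try:
        return int(raw) <= FREE_TIER_DAILY_LIMIT
    except ValueError:
        return False  # Unparseable value: treat as unknown.
```

Returning False on a missing or malformed header keeps the hint conservative: the adapter only labels a key as free when the evidence is explicit.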

4.3 Codex Adapter – Full‑Scale Protocol Conversion

Codex’s Responses API differs from chat.completions in every dimension, so this adapter has the most protocol conversion to do.

def _chat_messages_to_responses_input(messages, *, is_xai_responses=False):
    # is_xai_responses is accepted for parity with the real signature;
    # this excerpt does not use it.
    items = []
    for msg in messages:
        if msg["role"] == "system":
            continue  # system merged into instructions later
        content_text = msg.get("content") or ""
        if msg["role"] == "assistant":
            # 1️⃣ replay previous reasoning items (encrypted_content)
            codex_reasoning = msg.get("codex_reasoning_items")
            if codex_reasoning:
                for ri in codex_reasoning:
                    items.append({k: v for k, v in ri.items() if k != "id"})
            # 2️⃣ replay message items (preserve id/phase for cache)
            codex_message_items = msg.get("codex_message_items")
            if codex_message_items:
                for mi in codex_message_items:
                    items.append(mi)
            else:
                items.append({"role": "assistant", "content": content_text})
            # 3️⃣ function call items
            for tc in msg.get("tool_calls") or []:
                items.append({
                    "type": "function_call",
                    "call_id": tc["id"],
                    "name": tc["function"]["name"],
                    "arguments": tc["function"]["arguments"],
                })
        elif msg["role"] == "tool":
            items.append({
                "type": "function_call_output",
                "call_id": msg["tool_call_id"],
                "output": content_text,
            })
        else:
            # User messages pass through (this branch is missing from the
            # original excerpt, which would otherwise drop them).
            items.append({"role": "user", "content": content_text})
    return items

Key Codex‑specific requirements:

The instructions field aggregates all system messages, because the API has no native system concept.

store=false is mandatory; otherwise the API misbehaves.

encrypted_content must be passed back unchanged to preserve the reasoning chain.

Call‑ID conversion: Codex uses fc_XXXX while OpenAI expects call_XXXX; the helper _split_responses_tool_id() normalises them.

Phase preservation ( phase: "commentary" or "final_answer") is required for prefix‑cache hits.

Tool‑call leakage detection flags malformed function_call strings as incomplete so the Agent can retry.
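Two of the requirements above, aggregating system messages into instructions and forcing store=false, fit in a few lines, as does the fc_ to call_ prefix swap. This is a sketch under assumptions: build_codex_kwargs and to_chat_call_id are hypothetical names (the real helper is _split_responses_tool_id(), whose signature the article does not show), the blank-line separator is a guess, and the include entry is how the public Responses API returns encrypted reasoning when nothing is stored server-side.

```python
from typing import Any


def build_codex_kwargs(
    messages: list[dict[str, Any]],
    input_items: list[dict[str, Any]],
    model: str,
) -> dict[str, Any]:
    """Assemble Responses-style kwargs from chat-style messages."""
    system_texts = [
        m["content"]
        for m in messages
        if m.get("role") == "system" and m.get("content")
    ]
    return {
        "model": model,
        # All system messages collapse into the single instructions field.
        "instructions": "\n\n".join(system_texts),
        "input": input_items,
        # Mandatory per the article: nothing is stored server-side.
        "store": False,
        # Assumption: ask for encrypted reasoning so it can be replayed.
        "include": ["reasoning.encrypted_content"],
    }


def to_chat_call_id(item_id: str) -> str:
    """Map an fc_-prefixed ID onto the call_ prefix; leave others alone."""
    if item_id.startswith("fc_"):
        return "call_" + item_id[len("fc_"):]
    return item_id
```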

Response translation extracts message text, reasoning text, and tool calls, while storing raw items for the next turn:

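from types import SimpleNamespace

def _normalize_codex_response(response):
    # _extract_message_text(), _extract_reasoning_text() and item_to_dict()
    # are adapter helpers defined elsewhere.
    content_parts, reasoning_parts, tool_calls = [], [], []
    message_items_raw, reasoning_items_raw = [], []
    for item in response.output:
        if item.type == "message":
            content_parts.append(_extract_message_text(item))
            message_items_raw.append(item_to_dict(item))
        elif item.type == "reasoning":
            reasoning_parts.append(_extract_reasoning_text(item))
            # Keep the encrypted payload verbatim for replay next turn.
            reasoning_items_raw.append({"type": "reasoning", "encrypted_content": item.encrypted_content})
        elif item.type == "function_call":
            tool_calls.append(SimpleNamespace(
                # Assumption: the excerpt leaves call_id undefined; the
                # item's own call_id attribute is used here.
                id=item.call_id,
                function=SimpleNamespace(name=item.name, arguments=item.arguments),
            ))
    # Assumption: the excerpt does not show how these values are derived.
    final_text = "".join(content_parts) or None
    reasoning_text = "\n".join(reasoning_parts) or None
    finish_reason = "tool_calls" if tool_calls else "stop"
    return SimpleNamespace(
        content=final_text,
        tool_calls=tool_calls,
        reasoning=reasoning_text,
        codex_reasoning_items=reasoning_items_raw,
        codex_message_items=message_items_raw,
    ), finish_reason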

A related quirk belongs to the Anthropic side: when a user reaches the model via Claude Code’s OAuth route, the adapter rewrites the Hermes Agent identifier to Claude Code and prefixes tool names with mcp_ so that Anthropic’s routing logic works correctly.

Cross‑Adapter Unified Design Patterns

Lazy‑load SDKs: each adapter imports its provider SDK only when needed, saving ~220 ms of cold‑start time.

Pure functions + no state: conversion functions take input, return output, and never mutate globals, enabling safe concurrent use.

Defensive validation: adapters raise ValueError on malformed inputs (empty messages, orphan tools, invalid IDs) instead of silently degrading; the higher‑level auxiliary_client.py decides how to fall back.

Semantic field mapping: every field is translated with meaning preservation (e.g., OpenAI reasoning_effort ↔ Anthropic thinking.enabled ↔ Gemini thinkingConfig ↔ Codex reasoning.effort).
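The lazy-loading and defensive-validation patterns are easy to show together. The sketch below is illustrative only: load_sdk and validate_messages are hypothetical names, the specific checks are assumptions, and the standard-library json module stands in for a provider SDK so the example runs anywhere.

```python
import importlib
from functools import lru_cache
from types import ModuleType
from typing import Any

_KNOWN_ROLES = {"system", "user", "assistant", "tool"}


@lru_cache(maxsize=None)
def load_sdk(module_name: str) -> ModuleType:
    """Import a provider SDK on first use and cache the module object."""
    return importlib.import_module(module_name)


def validate_messages(messages: list[dict[str, Any]]) -> None:
    """Raise ValueError on malformed input instead of guessing a repair."""
    if not messages:
        raise ValueError("messages must not be empty")
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in _KNOWN_ROLES:
            raise ValueError(f"message {index}: unknown role {role!r}")
        if role == "tool" and not message.get("tool_call_id"):
            raise ValueError(f"message {index}: tool message lacks tool_call_id")


def send(messages: list[dict[str, Any]], sdk_module: str = "json") -> ModuleType:
    # Validate first so bad input never triggers the import cost.
    validate_messages(messages)
    # "json" is a stand-in for a real provider SDK module name.
    return load_sdk(sdk_module)
```

Neither function touches module-level state beyond the import cache, so callers can use them from several threads; the caller catches ValueError and decides whether to retry or switch providers.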

Conclusion

The three‑way adapter stack solves the hardest part of Hermes’s “model‑agnostic” goal: it lets three fundamentally different LLM APIs be driven by a single Agent loop. By enforcing a unified interface, performing bidirectional translation, handling every edge case (empty payloads, signature lifecycles, tool‑call IDs, free‑tier limits), and adopting lazy loading plus pure‑function design, the adapters remain maintainable and performant. Although the codebase exceeds 4 000 lines, the investment pays off because adding a new provider only requires a new adapter while the Agent core stays untouched.

Next up (part 16) is the Thinking/Budget system, where we discuss why expanding Anthropic’s thinking budget from 1 024 to 32 000 tokens dramatically improves Agent performance and how Hermes synchronises budgeting across all three providers.


Written by

James' Growth Diary

I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools. Helping you build a solid foundation in the AI era.
