Inside Google gemini-cli: Turning the Terminal into an AI Agent with ReAct Architecture

This article systematically dissects Google's open-source gemini-cli, showing how it turns a traditional command-line terminal into an AI-driven collaborative interface. It details the ReAct loop, tool-calling mechanisms, context management, and extensible architecture, offering a reference for building similar terminal agents.

Tencent Technical Engineering

Introduction

Google's open-source gemini-cli integrates large language models (LLMs) into the command-line terminal, enabling AI-driven code analysis, batch file processing, and multimodal interactions.

Repository: https://github.com/google-gemini/gemini-cli

Core Capability Demonstrations

Scenario 1 – Batch File Processing

Context: A directory containing JPG images.

User input:

gemini > Convert all images in this directory to PNG format, and rename them based on the capture date in each photo's EXIF data.

The agent creates a four‑step plan, generates a convert_images.py script, installs the Pillow library (switching to pip3 if needed), rewrites the script when EXIF data is missing, and validates the result with ReadFolder.

Scenario 2 – Project Understanding

Context: The cloned gemini-cli source code.

User input:

gemini > This directory contains the gemini-cli source code. Help me map out the overall architecture of gemini-cli. What are the key directories, and what does each one do?

The agent first calls ReadFolder to map the repository, then reads docs/architecture.md, merges the information with the directory map, and returns a structured architecture overview.

Scenario 3 – Multimodal Creativity

Context: A login‑page mockup image login-mockup.png.

User input:

gemini > This is our login-page design mockup (login-mockup.png). Generate the corresponding HTML and CSS for it, and open the result in a browser to preview it.

The agent parses the image, generates index.html and style.css, writes the files, and executes an open command to preview the page in a browser.

Architecture and Core Source Analysis

Core Architecture and Workflow

gemini-cli is organized into three layers:

User‑interaction layer – terminal UI and command parsing.

Logic layer – the ReAct loop implemented in the cli package.

Model‑service / tool layer – communication with the Gemini model and execution of registered tools.

ReAct Loop Details

From User Input to Model Reasoning

The entry point is submitQuery in packages/cli/src/ui/hooks/useGeminiStream.ts. It preprocesses the query, handles special commands, and streams the request to the Gemini model via geminiClient.sendMessageStream().
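To make the shape of this stage concrete, here is a minimal TypeScript sketch of splitting one streamed model turn into plain content and tool-call requests. The event and field names below are illustrative stand-ins, not the actual types from packages/core:

```typescript
// Hypothetical event union modeled on the stream described above; the
// real gemini-cli types differ in naming and detail.
type StreamEvent =
  | { type: "content"; text: string }
  | { type: "tool_call_request"; name: string; args: Record<string, unknown> };

// Separate one model turn into rendered text and pending tool calls,
// roughly what processGeminiStreamEvents does with ToolCallRequests.
function collectTurn(events: StreamEvent[]) {
  const text: string[] = [];
  const toolCalls: { name: string; args: Record<string, unknown> }[] = [];
  for (const ev of events) {
    if (ev.type === "content") text.push(ev.text);
    else toolCalls.push({ name: ev.name, args: ev.args });
  }
  return { text: text.join(""), toolCalls };
}
```

The key design point is that content can be rendered to the terminal immediately while tool-call requests are buffered for the scheduler.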

From Action Plan to Tool Execution

When the model returns a ToolCallRequest, processGeminiStreamEvents collects the requests and forwards them to scheduleToolCalls (implemented in useReactToolScheduler.ts). The scheduler creates a CoreToolScheduler instance from packages/core/src/core/coreToolScheduler.ts, which validates, queues, optionally confirms risky tools, and finally calls each tool’s execute() method.
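The validate / confirm / execute pipeline can be sketched as follows. This is a simplified stand-in for CoreToolScheduler, with the confirmation step injected as a callback so a UI layer can ask the user; all names are hypothetical:

```typescript
type Tool = {
  name: string;
  requiresConfirmation: boolean;
  execute(args: Record<string, unknown>): string;
};

// Hypothetical scheduler mirroring the validate -> (optionally confirm)
// -> execute flow described above.
function scheduleToolCall(
  tools: Map<string, Tool>,
  name: string,
  args: Record<string, unknown>,
  confirm: (toolName: string) => boolean
): string {
  const tool = tools.get(name);
  if (!tool) throw new Error(`unknown tool: ${name}`); // validation step
  if (tool.requiresConfirmation && !confirm(name)) {
    return `cancelled: ${name}`; // user declined a risky tool
  }
  return tool.execute(args); // normal execution path
}
```

Separating confirmation out as a parameter keeps the scheduler UI-agnostic, which matches the core/cli package split described later in the article.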

Observation and Recursion

After all tools finish, onAllToolCallsComplete aggregates the results, passes them to handleCompletedTools in the UI hook, which repackages the output and recursively invokes submitQuery for the next reasoning step.
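The overall cycle — reason, act, observe, reason again — can be condensed into a short loop. Model and tools are stubs here, and the turn cap is an assumption added for safety, not something taken from the gemini-cli source:

```typescript
// One model step yields either a final answer or a list of tool calls.
type Step = { answer?: string; toolCalls?: string[] };

// Minimal ReAct loop: tool observations become the next model input,
// mirroring the recursive submitQuery call described above.
function reactLoop(
  model: (input: string) => Step,
  runTool: (call: string) => string,
  query: string,
  maxTurns = 8 // hypothetical cap to guarantee termination
): string {
  let input = query;
  for (let i = 0; i < maxTurns; i++) {
    const step = model(input);
    if (step.answer !== undefined) return step.answer; // no more tools: done
    const observations = (step.toolCalls ?? []).map(runTool).join("\n");
    input = observations; // observation feeds the next reasoning step
  }
  return "max turns reached";
}
```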

Tool Declaration, Registration, and Safety

Each tool (e.g., read_file) declares its schema in a JSON‑compatible object, enabling the model to understand its capabilities. Tools are instantiated and registered in ToolRegistry at startup. High‑risk tools such as run_shell_command trigger shouldConfirmExecute(), prompting the UI layer for user approval before execution.
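A declaration in this style might look like the sketch below. It follows the general Gemini function-calling shape (name, description, JSON-schema-like parameters), but the exact fields and descriptions are illustrative, not copied from the gemini-cli source:

```typescript
// Hypothetical schema for a read_file tool, in the JSON-compatible
// declaration style the model reads to learn the tool's capabilities.
const readFileDeclaration = {
  name: "read_file",
  description: "Read a file from the local workspace and return its contents.",
  parameters: {
    type: "object",
    properties: {
      path: { type: "string", description: "Absolute path of the file to read" },
    },
    required: ["path"],
  },
};

// A registry keyed by tool name, as ToolRegistry builds at startup.
const registry = new Map<string, typeof readFileDeclaration>();
registry.set(readFileDeclaration.name, readFileDeclaration);
```

Because the declaration is plain data, the same object can be serialized into the model request and used locally to validate incoming arguments.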

Context Management

Short‑term session memory is maintained via useGeminiStream. Long‑term knowledge is injected through system prompts. User‑provided context files are loaded via loadCliConfig and merged with the base system prompt in getCoreSystemPrompt. The @ syntax allows on‑the‑fly file inclusion, which the UI hook parses and feeds to the model.
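As an illustration of the @ syntax handling, here is a hypothetical parser that pulls @-prefixed paths out of a prompt so their contents can be loaded and attached; the real gemini-cli parsing is more involved (escaping, globbing, ignore rules):

```typescript
// Extract @path references from a prompt, returning the referenced
// paths plus the prompt with the @ markers stripped.
function extractAtPaths(prompt: string): { cleaned: string; paths: string[] } {
  const paths: string[] = [];
  const cleaned = prompt.replace(/@(\S+)/g, (_match, p: string) => {
    paths.push(p);
    return p; // keep the bare path visible in the prompt text
  });
  return { cleaned, paths };
}
```

The returned paths would then be read and injected alongside the conversation history before the request is streamed to the model.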

Architectural Ideas and New Paradigm

Reusable Agent Kernel

The packages/core module encapsulates all AI‑related logic (Gemini client, tool registry, scheduler, logger) without any UI code, while packages/cli adapts this kernel to a terminal environment. This separation yields a portable SDK that can be embedded in web services, desktop apps, or other agents.
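The portability claim can be sketched in a few lines: if the kernel exposes a UI-free API, any host can adapt it. The interface and handler below are hypothetical, not the actual packages/core API:

```typescript
// Hypothetical UI-free kernel surface, in the spirit of packages/core.
interface AgentKernel {
  ask(query: string): Promise<string>;
}

// A non-terminal host reusing the same kernel: an HTTP-style handler,
// playing the role packages/cli plays for the terminal.
function makeHandler(kernel: AgentKernel) {
  return async (body: { query: string }) => ({
    status: 200,
    answer: await kernel.ask(body.query),
  });
}
```

Nothing in the handler knows about terminals, streams, or rendering, which is exactly what makes the kernel embeddable in web services or desktop apps.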

LLM as Dynamic Scheduler

Instead of hard‑coded control flow, the LLM generates execution plans at runtime, deciding which tools to invoke. Developers only need to register atomic tools; the model orchestrates them, enabling emergent capabilities and reducing code complexity.

Human‑in‑the‑Loop Safety

Risky actions are paused for explicit user confirmation, implementing a HITL pattern that balances AI autonomy with developer control.

Open Protocol Ecosystem

gemini-cli supports the Model Context Protocol (MCP) and the Agent-to-Agent (A2A) protocol, allowing external tool services to be discovered, registered, and invoked, and exposing its own capabilities to downstream agents.
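As one illustration, external MCP servers are typically declared in the CLI's settings file so their tools can be discovered at startup. The snippet below is a sketch of that configuration style; the server name and command are invented, and the exact schema may differ between versions:

```json
{
  "mcpServers": {
    "my-tools": {
      "command": "npx",
      "args": ["-y", "my-mcp-server"]
    }
  }
}
```

Once registered this way, the server's tools appear in the registry alongside the built-in ones and can be invoked by the model like any local tool.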

Conclusion and Outlook

The analysis shows that gemini-cli is a concrete reference implementation for building AI agents that combine ReAct reasoning, extensible tool calling, context‑aware interaction, and safety mechanisms. Future directions include tighter OS integration, larger long‑term memory via vector stores, and distributed multi‑agent collaboration.

Tags: CLI, LLM, AI Agent, Tool Calling, Gemini CLI

Written by Tencent Technical Engineering, the official account of Tencent Technology: a platform for publishing and analyzing Tencent's technological innovations and cutting-edge developments.