Google Strikes Back: Gemini’s New Features Take on Claude Code
The article reviews Google Gemini's three-pronged rollout: a Mac desktop app with global shortcuts and window sharing, a Gemini CLI enhanced with Subagents that keep context clean and enable parallel expert tasks, and the Gemini 3.1 Flash TTS model with Audio Tags. It compares each to competitors and highlights practical use cases and limitations.
Gemini Mac Desktop App
Google released a native Gemini client for macOS, written in Swift and built with the Antigravity team in a few days. Core features:
Global shortcuts : Option + Space opens a mini chat window on any screen; Option + Shift + Space opens the full chat UI. Both shortcuts are configurable.
Window sharing : The app can capture the current window’s content (documents, code, spreadsheets) and answer context‑aware questions such as “What bug does this Python script have?” without copying text.
Creative capabilities : Built‑in image generation (Nano Banana) and video generation (Veo) turn the desktop into a creation workstation.
Multi‑device sync : Chat history and memory sync across devices under the same Google account.
System requirements:
macOS Sequoia (15.0) or later
Apple Silicon (M‑series) only
8 GB RAM or more
200 MB free disk space
Stable internet connection
Free to use
Download: https://gemini.google/mac
Gemini CLI Subagents
When Gemini CLI handles complex tasks, the main Agent's context window grows large, which degrades response quality. Subagents address this by giving the main Agent a team of specialized experts, each with an isolated context window, a dedicated system prompt, its own tool set, and a separate MCP server.
Built‑in Subagents
generalist : inherits all tools; suited for bulk refactoring, high‑output tasks.
codebase_investigator : focuses on code‑base exploration, architecture analysis, dependency tracing, and bug root‑cause identification.
cli_help : answers configuration, command, and usage questions.
Experimental browser_agent
Can automate browser actions (form filling, button clicking). It requires Chrome 144+ and must be enabled in settings.json.
Custom Subagent definition
A custom Subagent is defined by a single Markdown file placed in .gemini/agents/ (project‑level) or ~/.gemini/agents/ (global). Example definition for a frontend specialist:
---
name: frontend-specialist
description: Frontend specialist in building high-performance, accessible, and scalable web applications.
tools:
- read_file
- grep_search
- glob
- list_directory
- web_fetch
- google_web_search
model: inherit
---
You are a Senior Frontend Specialist and UI/UX Architect.
Your goal is to design and implement exceptional, production‑grade user interfaces.
### Core Principles:
- Architecture & Scalability
- Performance & Optimization
- Accessibility (A11y)
Configuration fields (all optional unless noted):
name: unique identifier used with the @ syntax.
description: description that the main Agent uses to decide when to dispatch the Subagent.
tools: list of authorized tools; supports wildcards such as * (all) or mcp_* (all MCP tools).
model: model to use; default inherit (inherits the main Agent's model).
temperature: sampling temperature, range 0-2.
max_turns: maximum dialogue turns, default 30.
timeout_mins: timeout in minutes, default 10.
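To make the configuration fields concrete, here is a minimal Python sketch that mirrors them in a dataclass, checks the documented temperature range, and writes a definition file into .gemini/agents/. The class name SubagentConfig, the helper write_subagent, the `<name>.md` file naming, and the "must be positive" check are assumptions made for illustration; only the field names, defaults, and directory come from the article. Values are written without YAML quoting, so a description containing a colon would need escaping.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class SubagentConfig:
    """Hypothetical in-memory mirror of the frontmatter fields listed above."""
    name: str
    description: str
    tools: List[str] = field(default_factory=list)
    model: str = "inherit"               # default stated in the article
    temperature: Optional[float] = None  # article only gives the range 0-2
    max_turns: int = 30                  # default stated in the article
    timeout_mins: int = 10               # default stated in the article

    def validate(self) -> None:
        if not self.name:
            raise ValueError("name must not be empty")
        if self.temperature is not None and not 0 <= self.temperature <= 2:
            raise ValueError("temperature must be within 0-2")
        # Assumption: non-positive limits make no sense, so reject them.
        if self.max_turns < 1 or self.timeout_mins < 1:
            raise ValueError("max_turns and timeout_mins must be positive")

    def to_markdown(self, system_prompt: str) -> str:
        """Render the frontmatter block followed by the system prompt body."""
        self.validate()
        lines = ["---", f"name: {self.name}", f"description: {self.description}"]
        if self.tools:
            lines.append("tools:")
            lines.extend(f"- {tool}" for tool in self.tools)
        lines.append(f"model: {self.model}")
        if self.temperature is not None:
            lines.append(f"temperature: {self.temperature}")
        lines.append(f"max_turns: {self.max_turns}")
        lines.append(f"timeout_mins: {self.timeout_mins}")
        lines.append("---")
        return "\n".join(lines) + "\n" + system_prompt.strip() + "\n"


def write_subagent(config: SubagentConfig, system_prompt: str, root: Path) -> Path:
    """Write <root>/.gemini/agents/<name>.md (file naming is an assumption)."""
    agents_dir = root / ".gemini" / "agents"
    agents_dir.mkdir(parents=True, exist_ok=True)
    target = agents_dir / f"{config.name}.md"
    target.write_text(config.to_markdown(system_prompt), encoding="utf-8")
    return target
```

Passing the project root as `root` yields a project-level Subagent; passing `Path.home()` would target the global ~/.gemini/agents/ directory instead.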
Parallel execution
Subagents run in parallel; total execution time approximates the slowest Subagent. Parallelism is ideal for read‑only tasks (analysis, research, testing) because concurrent file edits can conflict.
Example invocation: @codebase_investigator Help me trace the call chain of the authentication module
Example batch: @generalist Update the License header in every file in the project
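The claim that total time approximates the slowest Subagent can be illustrated with a small simulation. This is not how Gemini CLI schedules Subagents internally; it is a sketch in which hypothetical stand-in tasks sleep for a fixed duration and run concurrently under asyncio. The function names and durations are invented for the example.

```python
import asyncio
import time
from typing import Dict


async def run_subagent(name: str, seconds: float) -> str:
    """Stand-in for a Subagent call; sleeps instead of doing real work."""
    await asyncio.sleep(seconds)
    return f"{name} finished"


async def run_in_parallel(durations: Dict[str, float]) -> float:
    """Run all simulated Subagents concurrently and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(run_subagent(n, s) for n, s in durations.items()))
    return time.perf_counter() - start


if __name__ == "__main__":
    durations = {"codebase_investigator": 0.3, "generalist": 0.1, "cli_help": 0.2}
    elapsed = asyncio.run(run_in_parallel(durations))
    print(
        f"elapsed {elapsed:.2f}s, "
        f"slowest {max(durations.values()):.2f}s, "
        f"sequential sum {sum(durations.values()):.2f}s"
    )
```

The elapsed time should land close to the slowest task rather than the sum of all three, which is the benefit the article describes for read-only work.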
Security mechanisms
Tool isolation: each Subagent can only use explicitly authorized tools.
Recursive protection: Subagents cannot call other Subagents, preventing infinite loops and token explosion.
Policy Engine (optional): fine‑grained permission control, e.g., allow only git push for a specific Subagent.
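Tool isolation and recursive protection can be pictured as two simple checks. The sketch below is an assumption about the behavior, not Gemini CLI's actual implementation: it matches a requested tool against a Subagent's tools list using shell-style wildcards (which covers the * and mcp_* patterns mentioned earlier) and refuses dispatch when the caller is itself a Subagent. The function names and the MCP tool name in the usage lines are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_tool_allowed(tool: str, allowed_patterns: Iterable[str]) -> bool:
    """Return True if `tool` matches any pattern in the Subagent's tools list."""
    return any(fnmatchcase(tool, pattern) for pattern in allowed_patterns)


def can_dispatch(caller_is_subagent: bool) -> bool:
    """Recursion guard: only the main Agent may dispatch a Subagent."""
    return not caller_is_subagent


if __name__ == "__main__":
    read_only = ["read_file", "grep_search", "mcp_*"]
    print(is_tool_allowed("read_file", read_only))            # explicitly listed
    print(is_tool_allowed("mcp_example_lookup", read_only))   # hypothetical MCP tool
    print(is_tool_allowed("write_file", read_only))           # not authorized
    print(can_dispatch(caller_is_subagent=True))              # Subagents cannot nest
```

An empty tools list authorizes nothing in this sketch; whether the real CLI treats an omitted list the same way is not stated in the article.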
Current Subagents can be listed with the /agents command.
Gemini 3.1 Flash TTS
The latest text‑to‑speech model scores Elo 1211 on the Artificial Analysis TTS leaderboard, placing it in the “high quality, low price” quadrant.
Key innovations
Audio Tags : embed directives in text to control speaking style, scene direction, and speaker‑level specifics (tone, speed, accent).
Scene Direction : set environment and dialogue instructions, e.g., “a late‑night broadcast with a warm, low voice.”
Speaker‑level Specificity : assign independent audio profiles to each role; inline tags can switch profiles mid‑sentence.
Seamless Export : after tuning parameters in Google AI Studio, export directly to Gemini API code for reuse.
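As a rough illustration of how scene direction, per-speaker lines, and inline tags might be combined into one prompt, the sketch below assembles a script as plain text. The square-bracket tag syntax, the "Scene:" prefix, and the names Line and build_script are assumptions made for this example; the article does not show the exact Audio Tags format, so check the model documentation before relying on it. No API call is made.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    speaker: str
    text: str
    tag: Optional[str] = None  # e.g. "warm"; the bracket syntax is assumed


def build_script(scene: str, lines: List[Line]) -> str:
    """Compose scene direction plus tagged dialogue into one prompt string."""
    parts = [f"Scene: {scene}", ""]
    for line in lines:
        prefix = f"[{line.tag}] " if line.tag else ""
        parts.append(f"{line.speaker}: {prefix}{line.text}")
    return "\n".join(parts)


if __name__ == "__main__":
    script = build_script(
        "a late-night broadcast with a warm, low voice",
        [
            Line("Host", "Welcome back to the show.", tag="warm"),
            Line("Guest", "Glad to be here."),
        ],
    )
    print(script)
```

Keeping the script as structured data makes it easy to swap a speaker's tag without retyping the dialogue, which matches the per-role profiles described above.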
Additional highlights:
Supports 70+ languages, including Chinese.
Native multi‑role dialogue enables podcast and audiobook creation.
SynthID watermarks mark generated audio as AI‑created.
Model card: https://deepmind.google/models/model-cards/gemini-3-1-flash-audio/
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
