
Google I/O 2026 Unveils Gemini Omni and Gemini 3.5 Flash – A Leap in Multimodal AI

At Google I/O 2026, the company introduced Gemini Omni, a multimodal model that can ingest any combination of text, image, audio, or video and generate high‑quality content, and Gemini 3.5 Flash, which outperforms Gemini 3.1 Pro across major benchmarks and is reported to run over four times faster than competing frontier models. Also announced were the Antigravity 2.0 agent platform and the Gemini Spark personal AI assistant.

IT Services Circle

May 20, 2026

Google I/O 2026 showcased a series of breakthroughs under the Gemini brand. The headline announcement was Gemini Omni, described as a "full‑capability" model that accepts arbitrary inputs (images, audio, video, text) and produces any form of output, including high‑resolution video generated from a single prompt.

Demonstrations highlighted three of Omni’s abilities: creating a scientifically accurate protein‑folding animation from the prompt "use clay animation to explain protein folding"; mapping each letter of the English alphabet to a distinct object (e.g., C → capybara, D → disco ball, L → lava lamp); and editing video frames seamlessly while preserving character consistency and physical logic. Omni can also generate a personalized avatar that speaks in the user’s voice and performs actions the user has never performed.

Following Omni, Google unveiled Gemini 3.5 Flash, a new flagship model that "crushes" the previous Gemini 3.1 Pro in virtually every benchmark. Reported scores include 76.2% on Terminal‑Bench 2.1 (coding), 1656 Elo on GDPval‑AA (real‑world agent tasks), 83.6% on MCP Atlas (large‑scale tool use), and 84.2% on CharXiv Reasoning (multimodal understanding). In speed tests, Flash processes 289 tokens/s, over four times faster than competing frontier models such as GPT‑5.5 and Claude Opus 4.7.
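To put the reported throughput in concrete terms, the following is a minimal Python sketch. The 289 tokens/s figure and the "over four times" ratio are taken from the numbers reported above; the 1,000‑token response length is an arbitrary assumption, and the competitor rate is only the upper bound implied by that ratio, not a measured value.

```python
# Back-of-the-envelope latency comparison based on the reported figures.
FLASH_TOKENS_PER_S = 289.0  # reported throughput for Gemini 3.5 Flash
MIN_SPEEDUP = 4.0           # "over four times faster" -> lower bound on the ratio


def seconds_to_generate(num_tokens: int, tokens_per_s: float) -> float:
    """Time needed to emit num_tokens at a constant rate."""
    return num_tokens / tokens_per_s


# If Flash is more than 4x faster, competitors run at less than this rate.
implied_competitor_rate = FLASH_TOKENS_PER_S / MIN_SPEEDUP

if __name__ == "__main__":
    response_tokens = 1000  # assumed response length, for illustration only
    flash_s = seconds_to_generate(response_tokens, FLASH_TOKENS_PER_S)
    other_s = seconds_to_generate(response_tokens, implied_competitor_rate)
    print(f"Flash: {flash_s:.2f} s, implied competitor: more than {other_s:.2f} s")
```

Under these assumptions, a 1,000‑token response takes about 3.5 seconds on Flash and more than roughly 13.8 seconds on a model at the implied rate.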

The event also introduced Antigravity 2.0, an upgraded agent‑development platform that moves from an IDE to a standalone desktop application and embraces an agent‑first design. New features include dynamic sub‑agent generation, asynchronous task management, scheduled tasks, and slash commands like /goal, /grill‑me, and /browser. These capabilities were demonstrated by building a complete operating‑system kernel from scratch using 93 parallel agents, processing 2.6 billion tokens in 12 hours with less than $1,000 in API costs, and then running the classic game DOOM on the AI‑generated OS.
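The figures quoted for the kernel demo can be turned into rough rates. The sketch below only rearranges the reported numbers (2.6 billion tokens, 12 hours, 93 agents, under $1,000); it assumes the load was spread evenly across agents and over the full 12 hours, which the demo did not state.

```python
# Rough arithmetic on the reported kernel-build demo figures.
TOTAL_TOKENS = 2.6e9     # tokens processed, as reported
WALL_CLOCK_HOURS = 12    # reported duration
NUM_AGENTS = 93          # reported number of parallel agents
MAX_COST_USD = 1000.0    # reported upper bound on API cost

seconds = WALL_CLOCK_HOURS * 3600

# Aggregate throughput across all agents.
aggregate_tokens_per_s = TOTAL_TOKENS / seconds

# Per-agent average, assuming an even split (an assumption, not a reported fact).
per_agent_tokens_per_s = aggregate_tokens_per_s / NUM_AGENTS

# Upper bound on blended cost per million tokens, since cost was "less than" $1,000.
max_cost_per_million = MAX_COST_USD / (TOTAL_TOKENS / 1e6)

if __name__ == "__main__":
    print(f"aggregate: {aggregate_tokens_per_s:,.0f} tokens/s")
    print(f"per agent: {per_agent_tokens_per_s:,.0f} tokens/s")
    print(f"cost: under ${max_cost_per_million:.3f} per million tokens")
```

This works out to roughly 60,000 tokens/s in aggregate, about 650 tokens/s per agent, and under $0.39 per million tokens. The per‑agent figure is not directly comparable to a generation speed, because "tokens processed" presumably counts input as well as output tokens.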

Google also announced Gemini Spark, an always‑on (24/7) personal AI agent that integrates tightly with Google Workspace. In a live demo, Spark drafted a team email summarizing a week of Gemini Live updates, automatically pulled relevant Gmail, Docs, and chat data, and applied a custom "ghostwriter" skill to match the presenter’s tone. In a separate scenario, Spark organized a neighborhood block party: it created a Google Sheets RSVP tracker, sent personalized invitations, generated a Google Slides deck with venue details, and performed all steps without the user opening any app. Voice input was demonstrated by issuing three tasks in a single spoken command, which Spark split into parallel threads and executed autonomously.
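The "one command, several parallel tasks" pattern from the voice demo can be illustrated with a small Python sketch. This is not Spark's API or implementation: `split_command`, `run_task`, and `run_all` are hypothetical names, the splitter is a naive regular expression over an already transcribed command, and each task is a placeholder that merely sleeps.

```python
import asyncio
import re


def split_command(command: str) -> list[str]:
    """Naively split a transcribed command on commas and the word 'and'."""
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", command)
    return [p.strip() for p in parts if p.strip()]


async def run_task(name: str, delay_s: float = 0.01) -> str:
    """Placeholder for real work (drafting an email, building a sheet, ...)."""
    await asyncio.sleep(delay_s)
    return f"{name}: done"


async def run_all(command: str) -> list[str]:
    """Run every sub-task concurrently; results keep the original task order."""
    tasks = split_command(command)
    return await asyncio.gather(*(run_task(t) for t in tasks))


if __name__ == "__main__":
    spoken = "create an RSVP sheet, send invitations, and build a slide deck"
    for line in asyncio.run(run_all(spoken)):
        print(line)
```

A production agent would need a language model rather than a regular expression to separate tasks, plus handling for dependencies between them; the sketch only shows the concurrent fan‑out step.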

Pricing details were also disclosed: the AI Ultra subscription now offers a $100‑per‑month tier for Spark beta access, and the top‑tier Ultra plan was reduced from $250 to $200 per month. Gemini Spark will be available to US AI Ultra users next week, while Gemini Omni, Flash, and Spark are being rolled out through the Gemini app, Google Flow, and YouTube Shorts, with developer access via the Gemini API and Google AI Studio.

Overall, the announcements show Google combining multimodal understanding, multimodal generation, and always‑on agents into a single ecosystem. On this reading, the barrier to artificial superintelligence is no longer "technical impossibility" but the speed of engineering and deployment.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
