
Hard‑Core Cloud Foundations Power Agentic AI: Highlights from re:Invent 2025 Peter & Dave Keynote

At re:Invent 2025, AWS executives Peter DeSantis and Dave Brown detailed a series of hardware and service innovations—including Graviton5, Trainium3/4, Lambda Managed Instances, Project Mantle, and S3 Vectors—showcasing how security, availability, elasticity, cost, and agility are becoming even more critical for the AI era, with concrete performance benchmarks from customers such as Airbnb, Apple, and Twelve Labs.

Amazon Cloud Developers

Keynote Overview

On December 5, 2025 (Beijing time), AWS senior vice presidents Peter DeSantis and Dave Brown delivered a technical keynote titled “Infrastructure Innovation” at re:Invent 2025, outlining the core value of cloud infrastructure for the AI era.

Core Infrastructure Themes

The speakers emphasized that AI is reshaping application development, but the fundamental attributes of cloud computing—security, availability, elasticity, cost efficiency, and agility—are becoming even more essential. Security must be a top priority because AI also empowers attackers; availability must withstand unprecedented AI workloads; elasticity must match the scale of AI services; cost control is critical given the high expense of AI training and inference; and agility is required to rapidly start, optimize, and adjust AI workloads.

Hardware Innovations

AWS Graviton5 packs 192 cores into a single processor with fast, uniform memory access, a 5× larger L3 cache, and a direct‑to‑silicon cooling design that cuts fan power by 33 %. M9g instances built on Graviton5 deliver 25 % higher performance than the previous‑generation M8g, which AWS describes as the best price‑performance in EC2.

Early customer benchmarks demonstrate the impact: Airbnb achieved a 25 % performance uplift, Atlassian reduced latency by 20 %, Honeycomb saw a 36 % per‑core performance increase, and SAP HANA experienced a 60 % boost in OLTP query performance.

Apple Swift on Graviton – Apple’s cloud platform team rewrote core services in Swift and migrated them to Graviton, achieving a 40 % performance improvement and a 30 % cost reduction. Apple, which open‑sourced Swift in 2015, also collaborated with AWS to provide the first official Swift toolchain for Amazon Linux.

Serverless Evolution

Dave Brown recounted the origin of AWS Lambda in 2013, when a small team set out to let developers submit code without managing servers. The service grew from the S3 team’s need to generate image thumbnails into a core serverless offering. The newly announced Lambda Managed Instances extend that model by running functions on EC2 instances: customers choose the instance type and hardware, while Lambda continues to manage configuration, caching, availability, and scaling. This hybrid model opens serverless to workloads such as video processing, ML preprocessing, and high‑throughput analytics.
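The original thumbnail use case maps directly onto Lambda’s handler model: S3 emits an event when an object is created, and Lambda invokes a function with that event, with no server code in sight. The sketch below assumes the function is subscribed to S3 ObjectCreated notifications; the `thumbnails/` prefix and the size list are hypothetical, and for brevity it only plans the output keys instead of downloading and resizing the image.

```python
"""Minimal sketch of an S3-triggered Lambda handler (Python runtime).

Assumptions: the function receives S3 ObjectCreated event notifications.
THUMBNAIL_PREFIX and THUMBNAIL_SIZES are hypothetical; a real function would
fetch the object and resize it (e.g. with Pillow) before uploading results.
"""
from urllib.parse import unquote_plus

THUMBNAIL_PREFIX = "thumbnails/"  # hypothetical output prefix
THUMBNAIL_SIZES = (128, 512)      # hypothetical target widths in pixels


def lambda_handler(event, context):
    planned = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications ('+' for spaces).
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(THUMBNAIL_PREFIX):
            continue  # skip our own outputs so the function does not retrigger itself
        for width in THUMBNAIL_SIZES:
            planned.append({
                "bucket": bucket,
                "source_key": key,
                "thumbnail_key": f"{THUMBNAIL_PREFIX}{width}/{key}",
            })
    return {"thumbnails": planned}
```

Writing outputs under a separate prefix and skipping that prefix in the handler is a common guard against a function triggering itself in a loop.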

Inference Engine – Project Mantle

Project Mantle is a purpose‑built inference engine that processes each request in four stages with distinct resource profiles: tokenization (CPU‑bound), pre‑fill (GPU‑bound), decoding (memory‑bandwidth‑bound), and detokenization (latency‑sensitive). It exposes three priority channels (Priority, Standard, and Flex) that isolate customer queues, so one customer’s traffic spike does not affect others. A journal built on DynamoDB and S3 records request state for fault‑tolerant recovery, and the scheduler can pause long‑running jobs during traffic spikes and resume them later.
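The queueing behavior described for Project Mantle can be illustrated with a toy scheduler. This is a minimal sketch under stated assumptions, not Mantle’s design: it assumes strict priority between the three tiers, round‑robin across per‑customer queues within a tier, and an in‑memory list standing in for the DynamoDB/S3 journal. `ToyScheduler`, `Request`, and their methods are hypothetical names.

```python
"""Toy tiered scheduler with per-customer queues and pause/resume.

Hypothetical sketch: strict priority across tiers, round-robin across
customers within a tier, and a plain list as a stand-in for a durable journal.
"""
from collections import OrderedDict, deque
from dataclasses import dataclass

TIERS = ("priority", "standard", "flex")  # highest to lowest


@dataclass
class Request:
    customer: str
    tier: str
    request_id: str
    tokens_done: int = 0  # decoding progress recorded when a job is paused


class ToyScheduler:
    def __init__(self):
        # tier -> OrderedDict(customer -> deque of that customer's requests)
        self.queues = {tier: OrderedDict() for tier in TIERS}
        self.journal = []  # stand-in for a durable journal used for recovery

    def _enqueue(self, req):
        self.queues[req.tier].setdefault(req.customer, deque()).append(req)

    def submit(self, req):
        self.journal.append(("submitted", req.request_id, req.tokens_done))
        self._enqueue(req)

    def pause(self, req, tokens_done):
        """Preempt a long-running job: journal its progress, then requeue it."""
        req.tokens_done = tokens_done
        self.journal.append(("paused", req.request_id, tokens_done))
        self._enqueue(req)

    def next_request(self):
        for tier in TIERS:  # strict priority: lower tiers wait for higher ones
            customers = self.queues[tier]
            if customers:
                # Take the customer whose turn is oldest, serve one request...
                customer, queue = customers.popitem(last=False)
                req = queue.popleft()
                if queue:
                    customers[customer] = queue  # ...and move them to the back.
                return req
        return None
```

Because each customer gets its own queue, a burst from one customer only lengthens that customer’s queue; others still get a turn every round. Strict priority is the simplest choice for a sketch, but it can starve the Flex tier indefinitely, so a production scheduler would more likely use weighted sharing.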

Vector Search and Multimodal Embeddings

Peter DeSantis explained that vectors let computers reason about physical attributes, expressions, and relationships in a way loosely analogous to the human brain, using high‑dimensional spaces (often more than 3,000 dimensions). AWS launched Amazon Nova Multimodal Embeddings, an embedding model that supports text, documents, images, video, and audio, and has integrated vector capabilities across its data services.
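At query time, vector search reduces to finding the stored embeddings that point in nearly the same direction as the query embedding. A minimal sketch of exact (brute‑force) cosine‑similarity search in plain Python; the keys and 4‑dimensional vectors are made up for illustration, and production systems typically use approximate nearest‑neighbor indexes rather than scanning every vector.

```python
"""Exact top-k search by cosine similarity over a small in-memory index.

The index contents are made-up 4-dimensional vectors; real embeddings have
hundreds or thousands of dimensions.
"""
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k=2):
    """Return the k keys whose vectors are most similar to the query."""
    return sorted(index, key=lambda key: cosine(query, index[key]), reverse=True)[:k]


index = {
    "video:surfing": [0.9, 0.1, 0.0, 0.2],
    "video:skiing": [0.7, 0.0, 0.6, 0.1],
    "doc:tax-form": [0.0, 0.9, 0.1, 0.0],
}
print(top_k([0.8, 0.05, 0.1, 0.2], index))  # → ['video:surfing', 'video:skiing']
```

Because a multimodal model places text, images, and video in the same space, the same comparison works whether the query embedding came from a sentence or a video clip.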

Amazon S3 Vectors stores billions of embedding vectors directly in S3 buckets, delivering sub‑100 ms query latency at massive scale. Customer case: Twelve Labs uses S3 Vectors to power its Marengo and Pegasus models, processing millions of video hours without data migration, dramatically improving unit economics. Arc XP leverages the same embeddings to quickly locate relevant video segments for news story creation.

AI Accelerators – Trainium

Trainium3 powers the Amazon EC2 Trn3 UltraServers, featuring 144 Trainium3 chips across two racks to form a single AI supercomputer delivering 360 PFLOPS of FP8 compute, 4.4× the performance of Trn2 UltraServers. The servers provide 20 TB of high‑bandwidth memory with 700 TB/s bandwidth (3.9× the previous generation) and produce more than five times as many output tokens per megawatt as Trainium2 on GPT‑OSS‑120B.
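Dividing the UltraServer totals by the chip count gives approximate per‑chip figures, and dividing compute by bandwidth gives the arithmetic intensity a kernel needs before it stops being limited by memory bandwidth. This is plain arithmetic on the numbers quoted above (decimal units, rounded), not an official per‑chip specification.

```latex
\frac{360\ \text{PFLOPS}}{144} = 2.5\ \text{PFLOPS (FP8) per chip}, \qquad
\frac{20\ \text{TB}}{144} \approx 139\ \text{GB HBM per chip}, \qquad
\frac{700\ \text{TB/s}}{144} \approx 4.9\ \text{TB/s per chip}

\frac{360 \times 10^{15}\ \text{FLOP/s}}{700 \times 10^{12}\ \text{B/s}} \approx 514\ \text{FLOP per byte of HBM traffic}
```

A kernel performing fewer than roughly 514 FP8 operations per byte it moves is bandwidth‑bound on this system, which is why memory‑bandwidth‑bound stages such as decoding benefit directly from the 3.9× bandwidth increase.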

System‑level innovations include the first integration of Trainium, Graviton, and Nitro chips on a single board, robot‑ready modular components, a dedicated neuron switch for full‑duplex bandwidth and ultra‑low latency, and Elastic Fabric Adapter enabling direct memory sharing among thousands of Trainium servers.

Micro‑architectural optimizations—such as micro‑scaling, accelerated Softmax, tensor dereferencing, background transposition, traffic shaping, memory‑add‑write, and memory‑scatter—are not listed in official specs but significantly improve real‑world workloads.
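Micro‑scaling refers to block‑scaled number formats, in which a small block of values shares one scale factor so that low‑precision elements keep useful dynamic range. The sketch below shows the general idea in the spirit of the OCP Microscaling (MX) formats, assuming 8‑bit signed integer elements and a shared power‑of‑two scale per 32‑value block. It is a simplification that is not bit‑exact to the MX specification and does not describe how Trainium3 implements the feature.

```python
"""Simplified block-scaled (micro-scaling style) int8 quantization.

Assumptions: 32-element blocks, one shared power-of-two scale per block,
elements stored as integers in [-127, 127]. Not bit-exact to OCP MX.
"""
import math

BLOCK = 32


def quantize_block(values):
    """Return (shared exponent, int codes) for one block of floats."""
    amax = max((abs(v) for v in values), default=0.0)
    if amax == 0.0:
        return 0, [0] * len(values)
    # Smallest power-of-two scale that maps the block's largest magnitude into 127.
    exp = math.ceil(math.log2(amax / 127))
    scale = 2.0 ** exp
    # Clamp guards against float rounding in log2 pushing a code to 128.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return exp, codes


def quantize(values, block=BLOCK):
    return [quantize_block(values[i:i + block]) for i in range(0, len(values), block)]


def dequantize(blocks):
    out = []
    for exp, codes in blocks:
        scale = 2.0 ** exp
        out.extend(code * scale for code in codes)
    return out
```

Per‑block scales matter when magnitudes vary across a tensor: with a single scale for a tensor whose values range from about 0.001 to 100, the step size would be 1 and the small values would all round to zero, while a per‑block scale keeps each block’s error within half of its own step.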

Roadmap: Trainium4 is under development and promises 6× the FP4 compute performance, 4× memory bandwidth, and 2× HBM capacity compared with Trainium3, securing AWS’s leadership in AI chips.

Developer Tools

Upcoming releases include NKI (Neuron Kernel Interface), a full‑stack open‑source design slated for Q1 2026 that combines the simplicity of matrix operations with instruction‑level hardware access; Neuron Profiler, a hardware‑based performance analyzer that runs without impacting production code; Neuron Explorer, an interactive UI that visualizes profiling data, automatically detects bottlenecks, and suggests optimizations; and native PyTorch support for Trainium, expected in early 2026, which lets a model run on Trainium with a simple .to("neuron") call.

Conclusion

Peter DeSantis concluded that AI makes the foundational properties of cloud infrastructure more important than ever. AWS’s continuous investment, from the Nitro System to Graviton to Trainium, not only solves past technical pain points but also prepares the platform for the coming agentic AI era. The announcements demonstrate AWS’s dominant position in cloud infrastructure and its commitment to enabling new AI possibilities.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
