Why AWS Bedrock AgentCore Signals a New Era for Agentic AI Infrastructure
The article analyzes AWS Bedrock AgentCore and related hardware and software requirements for Agentic AI, covering runtime isolation with microVMs, memory architectures, identity and gateway design, zero‑trust networking, and the challenges of multi‑tenant KVCache and context engineering.
TL;DR
Recent work on Agentic AI applications reveals a growing need for specialized infrastructure, and AWS’s announcement of Bedrock AgentCore, S3 vector buckets, and the custom GB200 instance at the 2025 New York Summit highlights the direction of that evolution.
1. AWS Bedrock AgentCore Overview
AWS introduced a full Agentic AI stack called Bedrock AgentCore, which addresses security isolation, identity, access control, observability, code execution, and tool integration. The service consists of seven core components:
1.1 AgentCore Runtime
AgentCore Runtime is a secure serverless environment built on MicroVMs (Firecracker) that provides session‑level isolation, built‑in authentication, and fast cold starts. It can run any open‑source framework (e.g., CrewAI, LangGraph, AWS Strands) and supports arbitrary protocols and models, allowing dynamic expansion of an Agent’s tool‑calling capabilities.
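The session-level isolation model can be illustrated with a minimal sketch: each session ID is bound to its own sandbox, and no two sessions ever share state. The `MicroVM` and `SessionRouter` classes below are hypothetical stand-ins for illustration only, not the Firecracker or AgentCore API.

```python
import uuid

class MicroVM:
    """Hypothetical stand-in for a Firecracker microVM sandbox."""
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.state: dict = {}  # per-session state, never shared

    def run(self, tool_call: str) -> str:
        # A real microVM executes this inside a hardware-isolated guest.
        self.state.setdefault("calls", []).append(tool_call)
        return f"[{self.session_id[:8]}] executed {tool_call}"

class SessionRouter:
    """One microVM per session: the isolation granularity AgentCore Runtime targets."""
    def __init__(self):
        self._vms: dict = {}

    def invoke(self, session_id: str, tool_call: str) -> str:
        vm = self._vms.setdefault(session_id, MicroVM(session_id))
        return vm.run(tool_call)

router = SessionRouter()
s1, s2 = str(uuid.uuid4()), str(uuid.uuid4())
router.invoke(s1, "search_web")
router.invoke(s2, "run_code")
assert router._vms[s1].state != router._vms[s2].state  # no state bleed across sessions
```

The point of the sketch is the routing invariant, not the sandboxing itself: a container runtime that pools workers across sessions cannot make the final assertion hold by construction.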
1.2 AgentCore Memory
The memory layer manages both short‑term and long‑term context. Short‑term memory tracks immediate conversational events, while long‑term memory stores user preferences, summaries, and semantic facts. It leverages vector databases and the newly announced Amazon S3 Vectors for storage and retrieval.
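The short-term/long-term split can be sketched as a recency buffer plus a vector-searchable fact store. This is a toy model under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, and `AgentMemory` is not the AgentCore Memory API.

```python
from collections import deque
import math

def embed(text: str) -> dict:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    words = text.lower().split()
    return {w: words.count(w) / len(words) for w in set(words)}

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class AgentMemory:
    def __init__(self, short_term_limit: int = 8):
        self.short_term = deque(maxlen=short_term_limit)  # recent conversational turns
        self.long_term = []                               # (fact, vector) pairs

    def observe(self, turn: str):
        self.short_term.append(turn)

    def remember(self, fact: str):
        self.long_term.append((fact, embed(fact)))

    def recall(self, query: str, k: int = 2) -> list:
        q = embed(query)
        ranked = sorted(self.long_term, key=lambda f: cosine(q, f[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]

mem = AgentMemory()
mem.observe("user: book a flight to Tokyo")
mem.remember("user prefers aisle seats")
mem.remember("user is vegetarian")
print(mem.recall("aisle seats", k=1))  # most similar long-term fact
```

In production the `long_term` list is exactly what gets swapped out for a vector database or S3 Vectors: same interface, different storage and index.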
1.3 AgentCore Identity
Identity enables Agents to integrate seamlessly with AWS services and third‑party tools such as GitHub, Salesforce, and Slack, using user‑based or pre‑authorized credentials to access resources securely.
1.4 Code Interpreter
This component provides an isolated code‑execution sandbox that Agents can invoke for arbitrary computations.
1.5 Browser Tool
A VM‑level sandboxed browser offers full auditability and session‑level isolation, enabling Agents to interact with web content safely.
1.6 AgentCore Gateway
The Gateway converts APIs (REST, GraphQL, etc.) and Lambda functions into Agent‑compatible tools (MCP), simplifying large‑scale tool deployment and discovery.
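The conversion the Gateway performs can be sketched as wrapping a REST endpoint in a tool descriptor the agent can discover and invoke. The schema below follows MCP's general shape but is a simplified assumption, and the endpoint URL is a placeholder.

```python
import json
import urllib.request

def rest_to_mcp_tool(name: str, base_url: str, path: str, params: dict) -> dict:
    """Wrap a GET endpoint as an MCP-style tool descriptor (simplified schema)."""
    return {
        "name": name,
        "description": f"GET {path} on {base_url}",
        "inputSchema": {
            "type": "object",
            "properties": {p: {"type": t} for p, t in params.items()},
            "required": list(params),
        },
        # The callable the runtime invokes when the model selects this tool.
        "handler": lambda args: urllib.request.urlopen(
            base_url + path + "?" + "&".join(f"{k}={v}" for k, v in args.items())
        ).read(),
    }

tool = rest_to_mcp_tool(
    name="get_weather",
    base_url="https://api.example.com",  # placeholder endpoint
    path="/weather",
    params={"city": "string"},
)
print(json.dumps({k: v for k, v in tool.items() if k != "handler"}, indent=2))
```

The value of doing this at a gateway rather than in each agent is that the schema and the credentialed call are generated once per API, then discovered by any agent at runtime.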
1.7 AgentCore Observability
Observability integrates OpenTelemetry and CloudWatch to provide tracing, debugging, and performance monitoring for Agents in production.
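The core of such instrumentation is emitting a span (name, duration, status) around every tool call. The decorator below is a stdlib stand-in for what OpenTelemetry instrumentation does; a real deployment would export these spans to a collector and on to CloudWatch.

```python
import time
import functools

SPANS = []  # in a real deployment, spans flow to an OpenTelemetry collector

def traced(name: str):
    """Record a span for each call: the shape of tracing AgentCore Observability provides."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            status = "OK"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "ERROR"
                raise
            finally:
                SPANS.append({
                    "name": name,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                    "status": status,
                })
        return inner
    return wrap

@traced("tool.search")
def search(query: str) -> str:
    return f"results for {query}"

search("agentic ai infra")
print(SPANS[0]["name"], SPANS[0]["status"])
```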
2. Agentic AI Infrastructure Requirements
2.1 Runtime
The runtime layer must provide session‑granular isolation, which traditional container runtimes lack; this is why AWS adopts Firecracker microVMs. Runtime VMs need to reside inside VPCs to support zero‑trust networking, and must also cover computer‑use scenarios such as virtual desktops and mobile automation. Observability logs are captured via MCP, with Alibaba Cloud's AgentBay cited as a comparable implementation.
2.2 Memory
Memory is the most challenging infrastructure component. It must handle massive Context Engineering workloads and multi‑Agent conversation state. AWS's short‑term/long‑term memory abstraction maps onto a hot/cold storage hierarchy: hot caches (e.g., OSS) for immediate access and vector‑search‑capable stores (S3 Vectors) for longer‑term retrieval. Multi‑tenant isolation and compute‑storage separation for KVCache are critical, yet CXL is absent from the GPU Scale‑Up roadmap, which complicates that separation.
Two possible paths for KVCache support are:
Enable KVCache on Scale‑Out microVMs placed in VPCs.
Enable KVCache on Scale‑Up architectures.
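Either path ends up managing the same two invariants: a hot tier with bounded capacity spilling to a cheaper cold tier, and strict per-tenant separation. A toy model of that policy (not any AWS API; values stand in for serialized KV blocks):

```python
from collections import OrderedDict

class TieredKVCache:
    """Hot/cold tiered KVCache with per-tenant namespacing."""
    def __init__(self, hot_capacity: int = 4):
        self.hot = OrderedDict()   # fast tier (think HBM/DRAM), LRU-ordered
        self.cold = {}             # capacity tier (think object storage)
        self.hot_capacity = hot_capacity

    def put(self, tenant: str, key: str, value: bytes):
        k = (tenant, key)          # tenant baked into the key: no cross-tenant hits
        self.hot[k] = value
        self.hot.move_to_end(k)
        while len(self.hot) > self.hot_capacity:
            old_k, old_v = self.hot.popitem(last=False)  # LRU spill to cold tier
            self.cold[old_k] = old_v

    def get(self, tenant: str, key: str):
        k = (tenant, key)
        if k in self.hot:
            self.hot.move_to_end(k)
            return self.hot[k]
        if k in self.cold:         # cold hit: promote back to the hot tier
            self.put(tenant, key, self.cold.pop(k))
            return self.hot[k]
        return None                # miss: the prefill must be recomputed

cache = TieredKVCache(hot_capacity=2)
cache.put("tenant-a", "prefix-1", b"kv-blob-1")
cache.put("tenant-a", "prefix-2", b"kv-blob-2")
cache.put("tenant-b", "prefix-1", b"kv-blob-3")  # spills tenant-a/prefix-1 to cold
assert cache.get("tenant-a", "prefix-1") == b"kv-blob-1"  # served from cold, promoted
assert cache.get("tenant-b", "prefix-2") is None          # no cross-tenant leakage
```

What differs between the Scale-Out and Scale-Up paths is only where the hot tier lives and how a promotion moves bytes (network copy into the microVM versus LD/ST over the fabric); the eviction and isolation logic is the same.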
From a GPU perspective, direct LD/ST semantics are preferred to avoid the latency of explicit KVCache transfers. AWS currently routes data through a Grace node and then over NVLink C2C to Blackwell, which requires careful congestion control and priority scheduling.
CPU support for UALink is essential for building memory services that can scale across tenants. AMD’s chiplet architecture (e.g., Turin) could replace PCIe lanes with UALink, offering a more efficient inter‑connect for multi‑tenant VM pools.
2.3 Identity & Gateway
The identity and gateway layer functions as a zero‑trust network architecture (ZTNA). Agents act as SDP clients, while the gateway connects inference platforms and internal VPC applications, mirroring traditional SDP enterprise deployments.
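The ZTNA posture reduces to two checks on every request: verify the caller's identity cryptographically, then consult a default-deny policy; network location confers no trust. A minimal sketch, where the shared-secret HMAC token is a stand-in for the mTLS or signed-token exchange a real SDP deployment would use:

```python
import hmac
import hashlib

SECRET = b"demo-shared-secret"  # placeholder; real SDP uses mTLS / signed tokens

def sign(agent_id: str, resource: str) -> str:
    """Token an authenticated agent presents for a specific resource."""
    return hmac.new(SECRET, f"{agent_id}:{resource}".encode(), hashlib.sha256).hexdigest()

POLICY = {"agent-42": {"crm.read"}}  # default-deny allowlist per agent

def gateway_authorize(agent_id: str, resource: str, token: str) -> bool:
    """Zero-trust check: authenticate every request, then apply least privilege."""
    if not hmac.compare_digest(token, sign(agent_id, resource)):
        return False               # identity not proven
    return resource in POLICY.get(agent_id, set())

assert gateway_authorize("agent-42", "crm.read", sign("agent-42", "crm.read"))
assert not gateway_authorize("agent-42", "crm.write", sign("agent-42", "crm.write"))
```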
3. Model Requirements from a Context‑Engineering Viewpoint
Effective Agentic AI must manage KVCache hit rates and context length limits. Analogous to virtual memory paging, long‑running agents (e.g., quantitative finance agents processing hundreds of stocks) need strategies such as section‑based paging to keep context within model limits. The article references Manus's work on KVCache design and speculates on the massive‑context handling likely used by OpenAI or DeepMind.
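Section-based paging can be sketched as a greedy packer: walk the sections in order and start a new page whenever the next section would overflow the token budget. The whitespace token counter below is an approximation for illustration, not a real tokenizer.

```python
def page_context(sections, budget_tokens, tokens=lambda s: len(s.split())):
    """Greedily split ordered context sections into pages that each fit the
    model window — the virtual-memory-paging analogy from the text."""
    pages, current, used = [], [], 0
    for sec in sections:
        cost = tokens(sec)
        if current and used + cost > budget_tokens:
            pages.append(current)      # "page out" the full page
            current, used = [], 0
        current.append(sec)
        used += cost
    if current:
        pages.append(current)
    return pages

# e.g. a quant agent paging per-stock research notes through a small window
notes = [f"stock {i}: earnings summary and ratios" for i in range(6)]
pages = page_context(notes, budget_tokens=12)
print(len(pages), "pages")
```

The agent then processes one page per model call and carries a running summary between pages, trading extra calls for a bounded context and a higher KVCache hit rate on the shared prefix.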
4. Conclusion
The piece outlines the hardware (X86 CPUs with UALink), runtime (MicroVM‑based isolation), zero‑trust networking, and layered memory designs required for next‑generation Agentic AI. It aims to spark discussion and provide a reference point for industry practitioners.
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.
