
How AI Is Redefining Cloud Computing: From Scale‑Up to Serverless

The talk explores how the rise of large AI models is transforming cloud computing architecture, workloads, and services—shifting from traditional virtualization to heterogeneous compute, massive scaling, serverless infrastructures, and new networking designs that together enable agile AI‑native applications.

Baidu Intelligent Cloud Tech Hub


In the AI‑native era, cloud computing undergoes fundamental changes. The speaker reviews three decades of evolution across applications, AI techniques, and IT infrastructure, noting the parallel development of desktop, mobile, and cloud platforms and the impact of deep learning since AlexNet.

In 2006, AWS introduced S3 and EC2, launching the classic cloud era, while cloud‑native practices after 2017 dramatically improved mobile development efficiency.

Large‑model breakthroughs in 2022 merged applications, AI, and infrastructure, prompting a new wave of cloud product and technology innovation.

Key Changes in the AI‑Native Era

Workloads shift from general‑purpose to heterogeneous compute, driving architectural innovation.

Big data and large models cause a surge in compute demand.

Digital transformation accelerates, demanding more agile and innovative applications.

The talk analyzes each of these trends in depth.

General‑Purpose Compute Evolution

KVM was merged into the Linux kernel in 2007, OpenStack debuted in 2010, and Open vSwitch entered the kernel in 2012. Heterogeneous compute rose with AlexNet (2012) and accelerated after the 2017 Transformer paper.

Since 2017, innovations such as AWS Nitro (hardware‑offloaded virtualization) and NVIDIA Volta V100 (first Tensor Core) illustrate the move toward soft‑hard co‑design.

Impact of Large Models

Model‑training compute grew 215× every two years after the Transformer, with GPT‑3 requiring 314 ZFLOPs (≈32 years on a single A100) and LLaMA‑65B training in 21 days on 2048 GPUs.

Scaling laws show that larger parameter counts yield better model performance, driving the need for massive clusters with high‑speed interconnects and mixed‑precision distributed training.
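The "≈32 years on a single A100" figure can be checked with back‑of‑envelope arithmetic. A minimal sketch, assuming the A100's dense BF16 peak of roughly 312 TFLOPS and perfect utilization (real utilization is far lower, which is exactly why large clusters are needed):

```python
# Back-of-envelope check of the GPT-3 single-GPU training-time estimate.
GPT3_FLOPS = 314e21          # 314 ZFLOPs total training compute
A100_PEAK = 312e12           # assumed A100 BF16 dense peak, FLOP/s
SECONDS_PER_YEAR = 365 * 24 * 3600

years = GPT3_FLOPS / A100_PEAK / SECONDS_PER_YEAR
print(f"{years:.1f} years")  # roughly 32 years at perfect utilization
```

The same arithmetic, divided across 2048 GPUs at realistic utilization, lands in the weeks‑not‑decades range that the LLaMA‑65B figure illustrates.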

AI‑Native Application Stack

AI‑native apps consist of code, data, and model. DevOps, DataOps, and MLOps toolchains improve productivity but increase developers’ mental load.

Serverless cloud infrastructure abstracts these toolchains, letting developers focus on business logic.
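The "focus on business logic" point can be made concrete with a toy function‑as‑a‑service handler. This is a hypothetical sketch, not any specific provider's API: the `handler(event, context)` shape is an assumption, and in a real platform the provisioning, scaling, and invocation around it are owned by the cloud.

```python
# Hypothetical FaaS handler: the developer writes only this function;
# the platform owns provisioning, scaling, and the surrounding toolchain.
import json

def handler(event: dict, context: dict) -> dict:
    """Pure business logic; infrastructure is abstracted away."""
    name = event.get("name", "world")
    return {"statusCode": 200,
            "body": json.dumps({"greeting": f"hello {name}"})}

# Local invocation for testing; in production the platform calls handler().
resp = handler({"name": "AI-native app"}, {})
print(resp["statusCode"])  # 200
```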

Three Core Cloud Technologies

Scale‑up through soft‑hard co‑design of architectures.

Scale‑out via high‑speed interconnects for distributed compute.

Full serverless cloud products delivering elastic, plug‑and‑play experiences.

Scale‑up and scale‑out provide elastic compute, the foundation for serverless services that accelerate application innovation.

Hardware Advances

GPU evolution: V100 (Volta, 2017) introduced Tensor Cores; A100 (Ampere) doubled performance with BF16 and Multi‑Instance GPU; H100 (Hopper) added FP8, NVLink‑3, and SHARP for in‑network aggregation, boosting bandwidth by ~30%.
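Each Tensor Core generation centers on a lower‑precision format (FP16 → BF16 → FP8). A minimal NumPy sketch of why low precision alone is not enough, using FP16 as a stand‑in since NumPy has no BF16/FP8 types: small updates vanish in half precision, which is why mixed‑precision training keeps an FP32 master copy of the weights.

```python
import numpy as np

# FP16 has a 10-bit mantissa, so a tiny gradient update is rounded away
# when the weight itself stays in half precision.
w16 = np.float16(1.0)
grad = np.float16(1e-4)
print(w16 + grad == w16)   # True: the update is lost in FP16

# Mixed-precision recipe: apply updates to an FP32 master weight, then
# cast back to the low-precision format for the fast Tensor Core math.
w32 = np.float32(w16)
w32 += np.float32(grad)
print(w32 > 1.0)           # True: the FP32 master copy retains the update
```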

Distributed Training Strategies

Optimal strategies combine data parallelism, pipeline parallelism, tensor parallelism, and parameter‑group slicing, chosen based on model compute/communication cost models and cluster topology.

Fault tolerance, checkpoint optimization, and fast I/O raise effective training time to 95% of theoretical capacity.
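The parallelism degrees above compose multiplicatively into the cluster size. A sketch using the 2048‑GPU LLaMA‑65B figure from the talk; the individual degrees below are illustrative assumptions, not the actual training configuration:

```python
# How parallelism degrees compose into a world size. These specific
# degrees are hypothetical; many factorizations reach 2048 GPUs.
tensor_parallel = 8      # split each layer's matmuls across 8 GPUs
pipeline_parallel = 4    # split the layer stack into 4 pipeline stages
data_parallel = 64       # replicate the whole pipeline 64 times

world_size = tensor_parallel * pipeline_parallel * data_parallel
print(world_size)  # 2048
```

Choosing the degrees is the optimization problem the talk describes: tensor parallelism wants the fastest links (e.g., NVLink within a node), pipeline and data parallelism tolerate slower inter‑node networks.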

Serverless Evolution

Serverless abstracts infrastructure for compute, storage, and increasingly AI services, reducing developer overhead.

Modern data stacks (e.g., Snowflake, Aurora) adopt storage‑compute separation, leveraging shared storage, RDMA acceleration, and local SSD caching to mitigate performance loss.
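The local‑SSD caching idea can be sketched as a read‑through cache in front of remote shared storage. A minimal illustration under stated assumptions: `backend` is a hypothetical remote‑storage reader, and a real system would cache on SSD rather than in memory.

```python
from collections import OrderedDict

class ReadThroughCache:
    """Minimal LRU read-through cache, a stand-in for the local-SSD
    cache that fronts remote shared storage in disaggregated designs."""

    def __init__(self, backend, capacity=2):
        self.backend = backend        # hypothetical remote-storage reader
        self.capacity = capacity
        self.cache = OrderedDict()
        self.remote_reads = 0         # counts slow-path reads

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)       # refresh LRU position
            return self.cache[key]
        value = self.backend(key)             # slow path: remote read
        self.remote_reads += 1
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
        return value

store = ReadThroughCache(backend=lambda k: f"page:{k}")
store.get("a"); store.get("a"); store.get("b")
print(store.remote_reads)  # 2: the second read of "a" hit the cache
```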

AI‑Native Application Stack Layers

Data: high‑quality domain data.

Model: fine‑tuned large models.

Prompt: optimal prompt engineering.

Chain: orchestrated model calls with memory components.

Agents: autonomous chains for complex tasks.
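The Chain layer above can be sketched as a toy orchestrator: each call carries a memory of prior turns into the next prompt. This is a hypothetical illustration; `fake_model` is a stand‑in for a real large‑model call, not any platform's API.

```python
# Toy sketch of the Chain layer: orchestrated model calls with memory.
def fake_model(prompt: str) -> str:
    # Stand-in for an LLM call; echoes the last line of the prompt.
    return f"answer to: {prompt.splitlines()[-1]}"

class Chain:
    def __init__(self, model):
        self.model = model
        self.memory = []              # prior (question, answer) turns

    def run(self, question: str) -> str:
        # Fold the conversation memory into the prompt before each call.
        context = "\n".join(f"Q: {q} A: {a}" for q, a in self.memory)
        prompt = context + "\n" + question if context else question
        answer = self.model(prompt)
        self.memory.append((question, answer))
        return answer

chain = Chain(fake_model)
chain.run("what is serverless?")
print(len(chain.memory))  # 1 turn stored for the next call
```

An agent, in this picture, is a chain that decides for itself which calls (and tools) to orchestrate next.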

Baidu’s Qianfan platform integrates these layers, offering data management, model training (including RLHF), evaluation, deployment, and plugins for AI‑native workloads.

Essence of Cloud Computing

Beyond elasticity, the core essence for providers is scale: scaling compute to meet large‑model demands, enabling serverless platforms, and driving agile innovation.

Tags: serverless, cloud computing, Distributed Training, Hardware Acceleration, AI-native
Written by

Baidu Intelligent Cloud Tech Hub

We share the cloud tech topics you care about. Feel free to leave a message and tell us what you'd like to learn.
