Cloud Native 14 min read

Baidu Jarvis2.0: Building One of the Industry’s Most Complex Microservice Systems with Cloud‑Native Multi‑Runtime Architecture

The article details how Baidu’s Jarvis2.0 platform evolved from a few modules to over a thousand microservices, introducing a cloud‑native multi‑runtime architecture, automated deployment pipelines, and an xDS‑based control plane that dramatically improve deployment speed, governance efficiency, and system stability across dozens of product lines.

DataFunTalk

Baidu’s advertising platform needed to scale from dozens of modules to more than 1,000 microservices. That growth prompted the creation of Jarvis2.0, a cloud‑native control center that unifies service governance, configuration management, tracing, capacity planning, high availability, and data‑driven operations.

Jarvis2.0 aims to provide self‑service tools for the entire microservice lifecycle, consolidating over ten governance components (JVMTI probes, Launchers, traffic recording, log collection, diagnostics, etc.) and starter kits (ConfigStarter, StarlightStarter, JdbcStarter, ActuatorStarter, RedisStarter, …) while supporting a multi‑runtime architecture.

The multi‑runtime design introduces the Moonlight sidecar runtime, separating data‑plane capabilities from the application runtime; Moonlight is compiled with GraalVM native image, achieving sub‑5‑second startup and a 128 MiB memory footprint.

On the deployment side, Jarvis2.0 leverages Elastic Kubernetes Service (EKS), OpenKruise CloneSet/Rollout, and KubeVela OAM to automate Docker image layering for SpringBoot, static web, Node, and Go applications, enable multi‑cluster gray‑scale releases, and perform sidecar injection and in‑place upgrades.
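As a rough illustration of the gray‑scale release step, the sketch below computes how many pods stay on the old revision after each batch. OpenKruise CloneSet exposes a `partition` field with that meaning, but the function name `partition_schedule`, the batch percentages, and the round‑up rule are assumptions made for this example, not details taken from Jarvis2.0.

```python
from typing import List


def partition_schedule(replicas: int, batch_percents: List[int]) -> List[int]:
    """Return, for each batch, the number of pods left on the old revision.

    batch_percents holds the cumulative share of pods that should run the
    new revision after each batch, for example [10, 50, 100].
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    schedule = []
    previous = 0
    for percent in batch_percents:
        if not previous <= percent <= 100:
            raise ValueError("percentages must be non-decreasing and at most 100")
        # Assumption: round the updated pod count up, using integer arithmetic.
        updated = (replicas * percent + 99) // 100
        schedule.append(replicas - updated)
        previous = percent
    return schedule


# Hypothetical rollout: 10 replicas released in three batches.
print(partition_schedule(10, [10, 50, 100]))  # [9, 5, 0]
```

Each value could be written to a workload's partition setting in turn, with a health check between batches before the next value is applied.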

The Gravity control plane integrates registration, configuration, and routing using the xDS protocol, delivering zero‑downtime service discovery, automatic unhealthy‑node removal, and dynamic governance features such as rate‑limiting, circuit‑breaking, log‑level changes, and runtime diagnostics.
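To make the idea of automatic unhealthy‑node removal concrete, here is a minimal sketch of a registry that evicts endpoints whose heartbeats have stopped and bumps a version number whenever membership changes, loosely mirroring how an xDS control plane pushes versioned endpoint snapshots. The class `EndpointRegistry`, the 10‑second timeout, and the service name are hypothetical; the source does not describe Gravity's internals.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EndpointRegistry:
    heartbeat_timeout: float = 10.0  # seconds; illustrative value only
    version: int = 0                 # bumped on every membership change
    _last_seen: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def heartbeat(self, service: str, address: str, now: float) -> None:
        """Register an endpoint or refresh its last-seen timestamp."""
        key = (service, address)
        if key not in self._last_seen:
            self.version += 1  # a new endpoint joined
        self._last_seen[key] = now

    def evict_unhealthy(self, now: float) -> List[Tuple[str, str]]:
        """Remove endpoints whose last heartbeat is older than the timeout."""
        stale = [key for key, seen in self._last_seen.items()
                 if now - seen > self.heartbeat_timeout]
        for key in stale:
            del self._last_seen[key]
        if stale:
            self.version += 1  # subscribers should receive a new snapshot
        return stale

    def snapshot(self, service: str) -> Tuple[int, List[str]]:
        """Return the current version and the sorted healthy addresses."""
        addresses = sorted(a for (s, a) in self._last_seen if s == service)
        return self.version, addresses


registry = EndpointRegistry(heartbeat_timeout=10.0)
registry.heartbeat("ad-search", "10.0.0.1:8080", now=0.0)
registry.heartbeat("ad-search", "10.0.0.2:8080", now=0.0)
registry.heartbeat("ad-search", "10.0.0.1:8080", now=8.0)  # only node 1 stays alive
print(registry.evict_unhealthy(now=12.0))  # [('ad-search', '10.0.0.2:8080')]
print(registry.snapshot("ad-search"))      # (3, ['10.0.0.1:8080'])
```

Passing `now` explicitly keeps the sketch deterministic; a real control plane would read a clock and stream the new snapshot to subscribed clients rather than wait to be polled.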

Jarvis2.0 now serves more than 60 product lines and over 40,000 instances (more than 2 million CPU cores). It has cut full‑version deployment time from 126 minutes to 36 minutes, reduced configuration hot‑update time from 30 minutes to 1 minute, and markedly improved stability, scalability, and multi‑language web‑service coverage.
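The reported timings can be turned into speedup factors and percentage reductions with a few lines of arithmetic. The helper name `improvement` is introduced here for illustration only; the input figures are the ones quoted above.

```python
def improvement(before_minutes: float, after_minutes: float):
    """Return (speedup factor, percentage of time saved)."""
    speedup = before_minutes / after_minutes
    reduction_percent = (1 - after_minutes / before_minutes) * 100
    return speedup, reduction_percent


for label, before, after in [
    ("full-version deployment", 126, 36),
    ("configuration hot update", 30, 1),
]:
    speedup, reduction = improvement(before, after)
    print(f"{label}: {speedup:.1f}x faster, {reduction:.1f}% less time")
# full-version deployment: 3.5x faster, 71.4% less time
# configuration hot update: 30.0x faster, 96.7% less time
```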

Written by

DataFunTalk
