
How to Build a Cloud‑Native Streaming Compute PaaS on Kubernetes

This article examines the growing demand for real‑time data processing and the high development, operational, and scalability costs of traditional streaming systems. It then presents a Kubernetes‑based cloud‑native PaaS that automates resource management, offers configuration‑driven development, and delivers observable, elastic, service‑oriented streaming capabilities.

Baidu Tech Salon

Background

Real‑time data processing needs are exploding, but traditional streaming architectures suffer from high development complexity, costly operations, and poor scalability. The article uses a concrete business scenario to illustrate these pain points and proposes a cloud‑native, Kubernetes‑based PaaS platform to encapsulate low‑level resource, self‑healing, and state‑management complexities into automated services.

Streaming Compute Overview

Streaming compute processes unbounded data streams continuously, unlike batch processing which stores data first. It enables near‑real‑time results, turning post‑mortem analysis into in‑process intervention, prediction, and control for use cases such as monitoring, risk control, and recommendation.

Challenges of Traditional Streaming

High development barrier: Developers must master event‑time handling, window mechanisms, and state management.

High operational cost: Fault tolerance, monitoring, and performance tuning require extensive manpower.

Poor scalability: Tight coupling of compute logic with resources makes rapid business iteration and elastic scaling difficult.

Cloud‑Native PaaS Solution

The solution fuses streaming compute with cloud‑native principles, building a PaaS that abstracts complex concepts (event time, windows, state) behind declarative configuration. Users declare data sources, processing logic, and output targets, and the platform generates runnable jobs, dramatically lowering the entry threshold.
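As a sketch, a declarative job under this model might look like the following (the field names and schema are illustrative, not the platform's actual specification):

```yaml
# Hypothetical declarative job spec: the user states the data source,
# processing logic, and output target; the platform generates the runnable job.
job:
  name: click-aggregation
  source:
    type: kafka
    topic: user-clicks
  transform:
    - filter: "event.type == 'click'"
    - window:
        type: tumbling
        size: 1m
    - aggregate: "count(*) group by page_id"
  sink:
    type: redis
    key: "clicks:{page_id}"
```

Event-time handling, windowing, and state management stay behind this declaration, which is what lowers the entry threshold.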

Platform Architecture

The platform consists of four collaborative layers:

Hardware Resource Layer: Multi‑region, multi‑datacenter server clusters provide scalable compute capacity and disaster‑recovery.

Kubernetes Orchestration Layer: K8s master and nodes handle resource scheduling, task orchestration, and elastic scaling.

Containerized Engine Layer: Pods run the internally developed streaming framework (TM) as containerized operators, enabling horizontal scaling and environment consistency.

Observability Layer: Integrated Prometheus, Grafana, Elasticsearch, and Jaeger collect and visualize metrics, logs, and traces for full‑stack visibility.

Kubernetes Orchestration Layer

Kubernetes acts as the intelligent brain of the platform. User‑submitted task specifications (CPU, memory, etc.) are translated into custom resources. The platform’s streaming‑task operators watch these resources and drive execution.

Declarative deployment & self‑healing: Tasks are materialized as Deployments (stateless) or StatefulSets (stateful). Failed Pods are automatically recreated within seconds.

Efficient operations & elastic scaling: Adjusting a replica count in the configuration triggers automatic scaling, providing minute‑level rollout and rollback.

Resource isolation & utilization: Namespaces and ResourceQuotas isolate teams, while K8s bin‑packing maximizes cluster efficiency.
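A custom resource for such a streaming task could look like this (the `StreamingJob` kind and its fields are illustrative; the actual CRD is internal to the platform):

```yaml
# Hypothetical custom resource watched by the platform's streaming-task operator.
apiVersion: streaming.example.com/v1
kind: StreamingJob
metadata:
  name: push-dedup
  namespace: team-push        # Namespace + ResourceQuota isolate teams
spec:
  replicas: 4                 # editing this value triggers automatic scaling
  image: registry.example.com/streaming/tm:1.8
  stateful: true              # materialized as a StatefulSet instead of a Deployment
  resources:
    requests: { cpu: "2", memory: 4Gi }
    limits:   { cpu: "2", memory: 4Gi }
```

The operator reconciles this spec into Deployments or StatefulSets, so failed Pods are recreated and replica changes roll out without manual intervention.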

Containerized Engine Layer

Containers eliminate environment drift. Standardized base images embed runtime, monitoring agents, and log collectors, ensuring identical development, testing, and production environments.

Unified image standards: All streaming jobs share a common image with pre‑installed dependencies.

Sidecar pattern: Each Pod runs a main container for the operator and a sidecar for logging, metrics, and hot‑config updates.

Resource limits: Precise CPU/memory control via resources.requests/limits prevents noisy‑neighbor issues.
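The sidecar pattern and resource limits described above can be sketched as a Pod spec (image names and the sidecar's responsibilities are assumptions based on the description):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: streaming-operator
spec:
  containers:
    - name: operator              # main container: runs the TM streaming operator
      image: registry.example.com/streaming/tm:1.8
      resources:
        requests: { cpu: "1", memory: 2Gi }
        limits:   { cpu: "1", memory: 2Gi }   # hard cap prevents noisy neighbors
      volumeMounts:
        - { name: logs, mountPath: /var/log/app }
    - name: agent-sidecar         # sidecar: log shipping, metrics, hot-config updates
      image: registry.example.com/streaming/agent:1.2
      volumeMounts:
        - { name: logs, mountPath: /var/log/app, readOnly: true }
  volumes:
    - name: logs
      emptyDir: {}                # shared volume lets the sidecar read the operator's logs
```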

Observability Layer

The platform builds a three‑dimensional monitoring stack:

Metrics (Prometheus): Automatic collection of records/s, process_latency, back‑pressure flags, CPU/memory, visualized in Grafana dashboards.

Logs (Elasticsearch): Unified collection of container stdout/stderr enables rapid root‑cause analysis.

Traces (Jaeger): End‑to‑end tracing of data through the DAG reveals performance bottlenecks.
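With Prometheus in place, back-pressure flags can feed alerting. A minimal sketch using the Prometheus Operator's `PrometheusRule` resource (the metric name is illustrative, not the platform's actual metric):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: streaming-alerts
spec:
  groups:
    - name: streaming
      rules:
        - alert: BackPressureDetected
          expr: streaming_backpressure_flag == 1   # hypothetical metric name
          for: 5m                                  # sustained, not transient, back-pressure
          labels: { severity: warning }
          annotations:
            summary: "Job {{ $labels.job }} has been back-pressured for 5 minutes"
```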

Configuration‑Driven Development

The platform shifts development from imperative code to declarative configuration, letting users specify “what” instead of “how”.

Imperative vs Declarative: Traditional code requires detailed implementation of event‑time handling, windowing, and state; the new model lets the engine handle these details.

Operator library: Import, Map/Filter, Aggregate, Sink, and Checkpoint operators are exposed as configurable blocks; users assemble pipelines via UI or SQL without writing Java/Scala.

Time & fault‑tolerance layer: Watermark progress and Exactly‑Once state management are fully managed; users only set storage paths.
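Assembling the operator blocks above into a pipeline might look like this (an illustrative configuration format; operator names follow the library described, everything else is assumed):

```yaml
pipeline:
  operators:
    - import:  { source: kafka, topic: orders, timestamp-field: event_time }
    - map:     { expr: "parse_json(value)" }
    - filter:  { expr: "amount > 0" }
    - aggregate:
        window: { type: sliding, size: 5m, slide: 1m }
        expr: "sum(amount) group by region"
    - sink:    { target: mysql, table: region_revenue }
  watermark:
    max-out-of-orderness: 30s    # engine advances event time; user never codes watermarks
  checkpoint:
    mode: exactly-once
    storage-path: hdfs:///checkpoints/region-revenue   # the only state setting users touch
```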

Practical Use Case: Push Business

The real‑time Push service was migrated to the platform, shortening the development‑test‑deploy cycle, reducing environment‑related friction, and enabling rapid iteration of business logic. This demonstrated tangible cost savings and faster time‑to‑market.

Benefits

Significant reduction in development effort and manpower.

Improved operational efficiency and system stability through standardized templates.

Optimized resource utilization via declarative scaling.

Accelerated business agility; simple configuration changes replace lengthy release processes.

Future Outlook

Elastic intelligence: Leverage fine‑grained metrics to drive custom HPA policies for cost‑effective scaling.

Autonomous operations: Apply RAG and large‑model techniques to build self‑healing operational agents.

Serverless experience: Evolve toward a streaming‑compute FaaS where users submit only a function or SQL and the platform handles all provisioning and lifecycle management.
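The "elastic intelligence" direction could build on the standard Kubernetes HPA with custom metrics. A sketch (the metric name and target object are hypothetical):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: streaming-job-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: streaming-job          # hypothetical workload name
  minReplicas: 2
  maxReplicas: 16
  metrics:
    - type: Pods
      pods:
        metric:
          name: records_processed_per_second   # illustrative custom metric
        target:
          type: AverageValue
          averageValue: "50000"  # scale out when per-Pod throughput exceeds this
```

Driving scaling from per-Pod throughput rather than CPU alone is what makes the policy cost-effective for streaming workloads, where CPU can stay low while lag grows.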

Written by

Baidu Tech Salon

Baidu Tech Salon, organized by Baidu's Technology Management Department, is a monthly offline event that shares cutting‑edge tech trends from Baidu and the industry, providing a free platform for mid‑to‑senior engineers to exchange ideas.
