
Middleware Containerization and Cloud‑Native Transformation at OPPO

OPPO transformed its sprawling, manually provisioned middleware clusters into a cloud-native, containerized platform by building custom Kubernetes controllers, IP-preserving StatefulSets, resource-isolated containers, and automated monitoring and self-healing workflows, enabling rapid provisioning, efficient utilization, fault-tolerant scaling, and future serverless and service-mesh integration.

OPPO Kernel Craftsman

Background: OPPO, a global technology company, has rapidly expanded its business lines, leading to a massive increase in middleware clusters. This growth has caused numerous issues in middleware deployment, usage, and operation.

Key problems: Middleware and business applications share physical machines, making resource isolation difficult. Cluster provisioning cycles are long, with heavy manual intervention and complex interaction processes. Overall CPU, memory, and disk utilization of middleware clusters is low. Quantifying middleware resource consumption for cost accounting is hard. Stability is insufficient, and fault recovery is slow and cumbersome. Traditional physical-machine deployments cannot provide strong resource elasticity.

Pre-containerization workflow: Business teams request middleware clusters via email to operations, which then coordinates with a resource-interaction group to allocate physical machines. Operators manually install agents on the machines, enter the machine lists into a PaaS platform, and finally deploy middleware through those agents.

Containerization challenges: keeping state (fixed IP and data) when a stateful middleware container fails and is rescheduled; providing proper IO isolation for IO-intensive middleware; quickly detecting performance or low-utilization issues and performing horizontal or vertical scaling (HPA, VPA); achieving fault self-healing that is transparent to users and does not impact the business; and preparing for future cloud-native evolution.

State preservation: OPPO currently uses local disks with FlexVolume+LVM; shared block and file storage is being introduced. When a host crashes, the IP can be retained but local data is lost, so high-availability (master-slave) mechanisms are required at the middleware layer. Custom components such as LocationController, StatefulSetController, ResitorController, NetHouse, an extended scheduler, and modifications to Kubelet and NodeController have been developed.

Kubelet reads pod annotations to decide whether a fixed IP is needed. When creating a sandbox container, it adds a label with the reserved IP. Upon pod deletion, the IP reservation is released via the NetHouse API.
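The decision logic described above can be sketched as a pair of pure functions. This is a minimal illustration, not OPPO's actual Kubelet patch; the annotation key and label name are assumptions for the example.

```go
package main

import "fmt"

// Hypothetical annotation key; the real key used by OPPO's modified
// Kubelet is not documented in the article.
const fixedIPAnnotation = "oppo.com/fixed-ip"

// needsFixedIP mimics the Kubelet check: a pod opts into IP preservation
// via an annotation on its metadata.
func needsFixedIP(annotations map[string]string) bool {
	return annotations[fixedIPAnnotation] == "true"
}

// sandboxLabels builds the labels attached to the sandbox container,
// carrying the reserved IP so the network layer (NetHouse) can reuse it.
func sandboxLabels(annotations map[string]string, reservedIP string) map[string]string {
	labels := map[string]string{}
	if needsFixedIP(annotations) && reservedIP != "" {
		labels["reserved-ip"] = reservedIP
	}
	return labels
}

func main() {
	ann := map[string]string{fixedIPAnnotation: "true"}
	fmt.Println(sandboxLabels(ann, "10.0.1.15")["reserved-ip"])
}
```

On pod deletion the inverse path runs: the Kubelet calls the NetHouse API to release the reservation keyed by the same label.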

The SchedulerExtender fetches cached Location CRDs. If a pod requests IP preservation, the extender checks for an existing Location CRD; if one is found, it returns only the nodes that satisfy the stored networkzone. Otherwise, it returns all suitable nodes. The controller creates or updates the Location CRD accordingly.
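The extender's filter step reduces to a small pure function. The sketch below assumes simplified `Location` and `Node` types keyed by a pod's namespace/name; the real CRD schema is not shown in the article.

```go
package main

import "fmt"

// Location mirrors the (assumed) Location CRD spec: the network zone
// recorded when the pod's IP was first allocated.
type Location struct {
	NetworkZone string
}

// Node pairs a node name with the network zone it belongs to.
type Node struct {
	Name string
	Zone string
}

// filterNodes implements the extender's Filter step: if a Location CRD
// exists for the pod, only nodes in the stored networkzone may host it;
// otherwise all candidate nodes pass through unchanged.
func filterNodes(podKey string, locations map[string]Location, candidates []Node) []Node {
	loc, ok := locations[podKey]
	if !ok {
		return candidates // no prior placement; any suitable node works
	}
	var out []Node
	for _, n := range candidates {
		if n.Zone == loc.NetworkZone {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	locs := map[string]Location{"default/redis-0": {NetworkZone: "zone-a"}}
	nodes := []Node{{"node1", "zone-a"}, {"node2", "zone-b"}}
	fmt.Println(len(filterNodes("default/redis-0", locs, nodes)))
}
```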

LocationController watches add/update/delete events. If the deletionTimestamp is set and the associated pod is terminated or pending, it releases the reserved IP via NetHouse, removes the finalizer, and lets the API server delete the CRD. Failures are re‑queued.
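The finalizer-driven cleanup can be expressed as a reconcile function that reports whether the work item should be re-queued. Field names below are illustrative stand-ins for the CRD and pod objects, and `releaseIP` stands in for the NetHouse API call.

```go
package main

import "fmt"

// Simplified views of the objects the controller works with (field names
// are illustrative, not the actual CRD schema).
type Location struct {
	DeletionTimestampSet bool
	Finalizers           []string
}

type Pod struct {
	Phase string // "Running", "Pending", "Succeeded", "Failed", ...
}

// reconcile mirrors the LocationController loop: once the CRD is marked
// for deletion and its pod is terminated or pending, release the IP and
// drop the finalizer so the API server can delete the object. Any other
// state, or a failed NetHouse call, means "requeue and retry".
func reconcile(loc *Location, pod Pod, releaseIP func() error) (requeue bool) {
	if !loc.DeletionTimestampSet {
		return false // nothing to clean up yet
	}
	done := pod.Phase == "Succeeded" || pod.Phase == "Failed" || pod.Phase == "Pending"
	if !done {
		return true // pod still running; keep the reservation, requeue
	}
	if err := releaseIP(); err != nil {
		return true // NetHouse call failed; requeue and retry
	}
	loc.Finalizers = nil // finalizer removed; API server may now delete the CRD
	return false
}

func main() {
	loc := &Location{DeletionTimestampSet: true, Finalizers: []string{"nethouse"}}
	requeue := reconcile(loc, Pod{Phase: "Failed"}, func() error { return nil })
	fmt.Println(requeue, len(loc.Finalizers))
}
```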

Container isolation: Resource isolation: OPPO uses Linux cgroup v1 (CPU, memory, network). Cgroup v1 can limit IOPS for Direct IO but not for Buffered IO, which middleware such as Kafka, RocketMQ, and Elasticsearch rely on. OPPO currently separates resource pools (general, memory-intensive, IO-intensive) and plans to adopt Kata Containers for stronger isolation. View isolation: Inside a container, commands like top read host-level data from /proc. OPPO mounts cgroup-derived resource files into the container's /proc using lxcfs (a FUSE-based filesystem) so that tools report the container's actual limits.

Post-containerization workflow: Business teams select machine specs, storage type, and instance count via the OPPO cloud portal. An admission webhook checks quota and resource availability before creating the Kubernetes resources. Custom middleware images contain the installation package, agent, and bastion. The middleware service calls back a provided URL for each successful container creation, ensuring idempotent operations. If deployment fails after quota verification, retry logic is invoked. Successful clusters automatically receive recommended monitoring thresholds and integrate with the middleware monitoring system.
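The quota gate in the admission webhook amounts to a validation function that runs before any Kubernetes objects are created. This is a minimal sketch with assumed field names, not OPPO's actual webhook handler.

```go
package main

import "fmt"

// Request captures the resources one middleware cluster order would consume.
type Request struct {
	CPU, MemGiB, Instances int // per-instance CPU/memory, plus instance count
}

// Quota is the tenant's remaining allowance (illustrative fields).
type Quota struct {
	CPU, MemGiB, Instances int
}

// admit mirrors the validating-webhook check: the order is rejected
// up front if it would exceed the tenant's quota.
func admit(req Request, q Quota) (allowed bool, reason string) {
	switch {
	case req.CPU*req.Instances > q.CPU:
		return false, "cpu quota exceeded"
	case req.MemGiB*req.Instances > q.MemGiB:
		return false, "memory quota exceeded"
	case req.Instances > q.Instances:
		return false, "instance quota exceeded"
	}
	return true, ""
}

func main() {
	ok, reason := admit(Request{CPU: 8, MemGiB: 8, Instances: 3}, Quota{CPU: 16, MemGiB: 32, Instances: 5})
	fmt.Println(ok, reason)
}
```

In a real webhook this logic would run inside the AdmissionReview handler and return a deny response with the reason string.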

Monitoring and alerting architecture: Metadata Service – receives cluster metadata and alert thresholds. Collector – periodically pulls metadata, generates monitoring tasks, and pushes them to RocketMQ; tasks are processed by a high-performance DataFlow engine. Processor – evaluates thresholds, sends alerts (email, SMS), and sinks results to Elasticsearch, Redis, MySQL, and a custom time-series DB. Query Service – provides dashboards for middleware, containers, and hosts.
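The Processor's threshold evaluation reduces to comparing each sample against its configured rule. The types below are assumptions for illustration; the real pipeline consumes tasks from RocketMQ rather than in-memory slices.

```go
package main

import "fmt"

// Sample is one collected data point for a middleware instance.
type Sample struct {
	Metric string
	Value  float64
}

// Threshold is the alert rule configured for a metric: fire when the
// observed value reaches or exceeds Limit.
type Threshold struct {
	Limit float64
}

// evaluate mirrors the Processor step: compare each sample against its
// threshold and collect the metrics that should raise an alert.
func evaluate(samples []Sample, rules map[string]Threshold) []string {
	var alerts []string
	for _, s := range samples {
		if rule, ok := rules[s.Metric]; ok && s.Value >= rule.Limit {
			alerts = append(alerts, s.Metric)
		}
	}
	return alerts
}

func main() {
	rules := map[string]Threshold{"cpu_usage": {Limit: 80}, "disk_usage": {Limit: 90}}
	samples := []Sample{{"cpu_usage", 85}, {"disk_usage", 40}}
	fmt.Println(evaluate(samples, rules))
}
```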

Fault self-healing platform: Monitors metrics, classifies events (healing, HPA, VPA), and invokes middleware-specific APIs to restart instances, rebuild fixed IPs, or scale horizontally or vertically. Fault detection uses a 30-second collection interval; three consecutive failures (≈90 s) trigger healing. For stateful services (ZK, ES, MQ, Redis), data is restored via master-slave synchronization after IP reconstruction.

Domain-based access: Middleware services are accessed entirely through domain names. For example, a Redis Sentinel cluster is reached via a dedicated domain name. If a host fails, the platform rebuilds the original IP, and high-availability mechanisms ensure data continuity.

Cloud-native transformation: Operator model: Encapsulates domain knowledge in CRDs and controllers. Controllers use Informers and workqueues to reconcile desired vs. actual state. Unified kernel: Plans to abstract common middleware functions (e.g., messaging) into a shared kernel. Storage separation: Prefer shared file storage; adapt middleware to use Direct IO where necessary. Serverless: Auto-scale logical resources (queues, partitions) and physical resources (CPU, memory, disk) on demand. Service Mesh: Sidecar-based mesh provides service discovery, traffic management, observability, and fault injection for middleware.

Operator implementation details: A controller watches CRD events, uses Informers to cache resources, places changed objects into a workqueue, and processes them with client-go to drive the cluster toward the desired state.
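The control flow above can be sketched with nothing but a channel standing in for the workqueue and maps standing in for cluster state. Real controllers use client-go's SharedInformer and workqueue packages; this stdlib-only skeleton only shows the event → enqueue → reconcile shape.

```go
package main

import "fmt"

// runWorker drains a workqueue of object keys and reconciles each toward
// its desired replica count, returning the resulting actual state.
func runWorker(queue <-chan string, desired, actual map[string]int) map[string]int {
	for key := range queue {
		// Reconcile: compare desired vs. actual and act on the difference.
		for actual[key] < desired[key] {
			actual[key]++ // e.g. create one more replica
		}
		for actual[key] > desired[key] {
			actual[key]-- // e.g. delete a surplus replica
		}
	}
	return actual
}

func main() {
	queue := make(chan string, 16) // workqueue fed by informer event handlers
	queue <- "redis-cluster-a"     // an add/update event enqueues the object key
	close(queue)
	state := runWorker(queue, map[string]int{"redis-cluster-a": 3}, map[string]int{"redis-cluster-a": 1})
	fmt.Println(state["redis-cluster-a"])
}
```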

Example deployment: Deploy a 3-shard RedisCluster using StatefulSets with pod anti-affinity, a scheduler extender to limit master placement, ConfigMaps for configuration, Secrets for passwords, sidecar exporters for per-instance metrics, and an additional pod for cluster-level monitoring. All services are exposed via cluster and pod domain names registered in the internal DNS.
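The anti-affinity portion of such a StatefulSet might look like the manifest fragment below. Names, image, and replica count are illustrative, not OPPO's actual manifests; the `topologyKey` spreads Redis pods across hosts so one host failure cannot take down a shard's master and replica together.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 6   # 3 shards, one master and one replica each
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: redis-cluster
            topologyKey: kubernetes.io/hostname
      containers:
      - name: redis
        image: redis:6.2
        ports:
        - containerPort: 6379
```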

Monitoring changes with Operator: After an Operator creates a cluster, instance IPs and domain names are stored in the middleware management platform, which forwards them to a unified monitoring system. The system periodically scrapes metrics, stores them in a time-series DB, and triggers alerts based on thresholds.

Future outlook: a unified kernel for heterogeneous middleware; further storage separation and shared-file-system adoption; a serverless execution model for on-demand scaling; and full Service Mesh integration for transparent networking and observability.

