
Middleware PaaS on Kubernetes: Architecture, Benefits, and IP Reservation Challenges

This article explains how the New Oriental architecture team migrated middleware services like Redis, Kafka, and RocketMQ to Kubernetes, detailing the benefits over traditional PaaS, the Capo IP reservation solution for network stability, and the resulting operational, observability, and resource utilization improvements.

New Oriental Technology

Traditional Operations Challenges and Cloud‑Native Advantages

To reduce cost and increase efficiency, many companies manage middleware services through PaaS platforms. Traditional PaaS platforms manipulate VMs or physical machines directly, which incurs high development cost for resource management, availability, observability, and utilization. Introducing Kubernetes addresses these problems: 100% of the team's middleware services (Redis, Kafka, RocketMQ) now run in Kubernetes, with second-level cluster delivery and integrated monitoring and alerting.

Middleware PaaS Product Ecosystem and Features

The architecture team provides a set of PaaS products for Redis, Kafka, and RocketMQ, built on a mix of custom and open-source Operators. These products offer automated lifecycle management, multi-tenant authentication, high availability, fault self-healing, and integrated monitoring (Prometheus, Grafana, Alertmanager), backed by a proprietary XLSS storage system for stateful workloads.

Kafka PaaS Features

Automated lifecycle management of Kafka clusters

Multi‑tenant authentication and authorization

Support for the SCRAM and PLAIN SASL mechanisms over SSL or PLAINTEXT listeners

Fault self‑healing

Integrated monitoring and alerting via Zookeeper/Kafka exporters
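To make the multi-tenant authentication concrete, here is a sketch of how a client might assemble its settings for a SCRAM-secured tenant. The broker address and credentials are illustrative placeholders, and the resulting dict targets the `kafka-python` client; the article does not specify which client library tenants use.

```python
# Sketch: building client settings for a SCRAM-secured Kafka tenant.
# Broker address, username, and password are illustrative placeholders.

def kafka_client_config(bootstrap, username, password,
                        mechanism="SCRAM-SHA-256", use_tls=True):
    """Return kwargs suitable for kafka-python's KafkaProducer/KafkaConsumer."""
    if mechanism not in ("SCRAM-SHA-256", "SCRAM-SHA-512", "PLAIN"):
        raise ValueError(f"unsupported SASL mechanism: {mechanism}")
    return {
        "bootstrap_servers": bootstrap,
        "security_protocol": "SASL_SSL" if use_tls else "SASL_PLAINTEXT",
        "sasl_mechanism": mechanism,
        "sasl_plain_username": username,
        "sasl_plain_password": password,
    }

config = kafka_client_config("kafka-tenant-a:9093", "tenant-a", "secret")
# e.g. producer = KafkaProducer(**config)
```

Keeping the security settings in one builder function makes it easy to switch a tenant between SASL mechanisms without touching producer or consumer code.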

RocketMQ PaaS Features

Automated lifecycle management of RocketMQ clusters

Master‑slave and DLedger high‑availability clusters

ACL support for authentication/authorization

Scalable broker and NameServer instances

Support for RocketMQ versions 4.7 and 4.8

Built‑in monitoring and console management
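As an illustration of the DLedger (Raft-based) high-availability mode, the snippet below renders a minimal broker.conf for one member of a DLedger group. Group name, broker name, and peer addresses are made up; the key names follow RocketMQ's DLedger documentation (note the historical "DLeger" spelling in the keys), not any configuration shown in this article.

```python
# Sketch: rendering a minimal broker.conf for a RocketMQ DLedger (Raft) group.
# Names and peer addresses are illustrative.

def dledger_broker_conf(group, peers, self_id, broker_name):
    """peers: mapping of DLedger node id -> 'host:port'."""
    peer_str = ";".join(f"{nid}-{addr}" for nid, addr in sorted(peers.items()))
    lines = [
        f"brokerName={broker_name}",
        "enableDLegerCommitLog=true",   # switch the commit log to DLedger
        f"dLegerGroup={group}",
        f"dLegerPeers={peer_str}",
        f"dLegerSelfId={self_id}",      # which peer this broker is
    ]
    return "\n".join(lines)

conf = dledger_broker_conf(
    "RaftNode00",
    {"n0": "broker-0:40911", "n1": "broker-1:40911", "n2": "broker-2:40911"},
    "n0",
    "broker-a",
)
```

An Operator would typically template one such file per replica, varying only `dLegerSelfId`.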

Redis PaaS Features

Automated lifecycle management of Redis clusters

Customizable master count and replication factor

Even distribution of masters and replicas

Custom cluster configuration

Automatic fault recovery

Integrated monitoring and alerting
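The "even distribution" feature can be sketched as a toy placement routine: masters are spread round-robin across nodes, and each master's replicas are placed on nodes other than the master's own, so a node failure never takes out a master together with its replica. This is an illustrative stand-in, not the Operator's actual scheduling logic.

```python
# Toy placement: spread Redis masters round-robin across nodes, then place
# each master's replicas on nodes other than the master's own node.
# Illustrative only -- not the Operator's real scheduler.

def place_redis_cluster(nodes, masters, replicas_per_master):
    placement = {}  # pod name -> node
    for m in range(masters):
        master_node = nodes[m % len(nodes)]
        placement[f"master-{m}"] = master_node
        # replica candidates exclude the master's node
        candidates = [n for n in nodes if n != master_node]
        for r in range(replicas_per_master):
            placement[f"replica-{m}-{r}"] = candidates[r % len(candidates)]
    return placement

p = place_redis_cluster(["node-1", "node-2", "node-3"],
                        masters=3, replicas_per_master=1)
```

In a real cluster the same intent is usually expressed with Pod anti-affinity rules rather than explicit placement.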

Middleware Containerization Challenges

In Kubernetes, managing container networking is critical. Calico provides network connectivity, but Pod IPs are dynamically allocated and may change on Pod recreation, risking topology chaos and client reconnection failures. The team created the open‑source Capo project, implementing an "IP reservation with delayed release" strategy to keep IPs stable during short‑term Pod rebuilds.

Zookeeper Issue

ElasticJob uses a containerized Zookeeper service. When all Zookeeper Pod IPs change, Curator fails to reconnect: its ensemble tracker has replaced the original connect string with the now-stale Pod IPs, causing job failures. The solution was to expose Curator's ensembleTracker parameter in ElasticJob, set it to false, and force Curator to connect via the stable Service ClusterIP.
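The failure mode can be illustrated with a toy simulation. This is a Python stand-in for the Java Curator behavior, with made-up addresses: a client with ensemble tracking enabled rewrites its connect string to the Pod IPs it learned, while a client with tracking disabled keeps dialing the stable Service ClusterIP.

```python
# Toy model of the Curator reconnection problem. With ensemble tracking on,
# the client replaces its connect string with the Pod IPs learned from the
# ensemble; when every Pod IP changes at once, those addresses are all dead.
# With tracking off, the client keeps dialing the stable Service ClusterIP.

class ZkClient:
    def __init__(self, connect_string, ensemble_tracker=True):
        self.connect_string = connect_string
        self.ensemble_tracker = ensemble_tracker

    def learn_ensemble(self, pod_ips):
        # models Curator's EnsembleTracker rewriting the connect string
        if self.ensemble_tracker:
            self.connect_string = ",".join(pod_ips)

    def can_reconnect(self, live_addresses):
        return any(a in live_addresses for a in self.connect_string.split(","))

SERVICE_IP = "10.96.0.20:2181"  # stable ClusterIP (illustrative)
old_pods = ["10.1.0.5:2181", "10.1.0.6:2181", "10.1.0.7:2181"]

tracking = ZkClient(SERVICE_IP, ensemble_tracker=True)
pinned = ZkClient(SERVICE_IP, ensemble_tracker=False)
for c in (tracking, pinned):
    c.learn_ensemble(old_pods)

# All three Pods are rebuilt and come back with new IPs.
live = {SERVICE_IP, "10.1.0.8:2181", "10.1.0.9:2181", "10.1.0.10:2181"}
```

After the rebuild, only the pinned client can still reach a live address, which is why disabling ensemble tracking restores reconnection.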

Redis Cluster IP Swap Problem

When multiple Redis Pods are recreated, IP addresses may swap, leading to two problems: (1) Gossip communication may merge distinct clusters into a "big cluster"; (2) Clients continue to use stale IPs, connecting to the wrong cluster and causing retries. The solution leverages Calico's IPReservations resource to reserve IPs during Pod deletion, preventing immediate reuse.
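A toy IPAM shows how the swap arises and how reservation prevents it. In this simplified model (my own illustration, not Calico's actual allocator), freed IPs return to the head of the free list, so recreated Pods can pick up each other's old addresses unless freed IPs are held back, as Capo does via Calico's IPReservations.

```python
# Toy IPAM demonstrating the IP swap. Freed IPs return to the head of the
# free list; without reservation, a recreated Pod can receive another Pod's
# old address. Reserving freed IPs prevents the swap.

class Ipam:
    def __init__(self, pool, reserve_on_release=False):
        self.free = list(pool)
        self.reserved = set()
        self.reserve_on_release = reserve_on_release

    def allocate(self):
        for ip in self.free:
            if ip not in self.reserved:
                self.free.remove(ip)
                return ip
        raise RuntimeError("pool exhausted")

    def release(self, ip):
        self.free.insert(0, ip)          # freed IPs are reused first
        if self.reserve_on_release:
            self.reserved.add(ip)

def churn(ipam):
    """Allocate two Pods, delete both, recreate them; return old/new IPs."""
    old_a, old_b = ipam.allocate(), ipam.allocate()
    ipam.release(old_a)
    ipam.release(old_b)
    return (old_a, old_b), (ipam.allocate(), ipam.allocate())

pool = ["10.1.0.1", "10.1.0.2", "10.1.0.3", "10.1.0.4"]
old, new = churn(Ipam(pool))                          # no reservation: swap
old_r, new_r = churn(Ipam(pool, reserve_on_release=True))  # reserved: fresh IPs
```

Without reservation the two Pods come back with each other's addresses; with reservation they receive fresh IPs from the pool.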

Capo IP Reservation Workflow

Delete or drain request reaches the Kube‑APIServer.

If the Pod belongs to a StatefulSet, the request is forwarded to the Capo webhook.

Capo records the Pod IP in an IPReservations object and stores metadata in a ConfigMap.

The delete request is then allowed to proceed, and Kubelet releases the IP via Calico CNI.

During subsequent Pod recreation, Calico allocates a different IP from the pool, avoiding swaps.
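The steps above can be condensed into a control-flow sketch. The real Capo webhook speaks the Kubernetes admission API and persists state in a ConfigMap and Calico IPReservations objects; this in-memory Python version, with made-up Pod data, only mirrors the flow.

```python
# Condensed sketch of the Capo flow: a hook intercepts the delete, records
# the Pod's IP as reserved, then lets the delete proceed; later allocations
# skip reserved IPs. In-memory stand-in for ConfigMap + IPReservations.

import time

reservations = {}   # ip -> metadata

def on_pod_delete(pod):
    """Admission step: reserve the IP of StatefulSet Pods before deletion."""
    if pod.get("owner_kind") == "StatefulSet":
        reservations[pod["ip"]] = {
            "pod": pod["name"],
            "reserved_at": time.time(),
        }
    return "allowed"     # the delete always proceeds

def allocate_ip(pool):
    """CNI step: hand out the first pool IP that is not reserved."""
    for ip in pool:
        if ip not in reservations:
            return ip
    raise RuntimeError("no unreserved IP available")

on_pod_delete({"name": "redis-0", "owner_kind": "StatefulSet", "ip": "10.1.0.5"})
```

Note that only StatefulSet Pods are intercepted, matching step 2 of the workflow; Deployments and other workloads pass straight through.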

IP Release Mechanism

Capo periodically (every 5 seconds) checks two release conditions: (1) a reserved IP exceeds a time threshold; (2) the number of reserved IPs exceeds a count threshold (typically 1.2 × the node's maxPods). When a condition is met, Capo removes the IP entries from both the ConfigMap and the IPReservations object, allowing Calico to reuse them.
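The release check can be sketched as follows. This is a simplified model of the logic described above, not Capo's actual code; the age threshold and timestamps are illustrative, and the count ceiling follows the 1.2 × maxPods rule from the text.

```python
# Sketch of the release check Capo runs on its periodic tick. The count
# ceiling follows the article's 1.2 x maxPods rule; values are illustrative.

def release_expired(reservations, now, max_age, max_pods):
    """reservations: ip -> reserved_at timestamp. Returns released IPs."""
    ceiling = int(1.2 * max_pods)
    # Condition 1: reservation older than the time threshold.
    released = [ip for ip, t in reservations.items() if now - t > max_age]
    # Condition 2: still over the count ceiling -> drop the oldest first.
    remaining = {ip: t for ip, t in reservations.items() if ip not in released}
    if len(remaining) > ceiling:
        overflow = len(remaining) - ceiling
        released.extend(sorted(remaining, key=remaining.get)[:overflow])
    for ip in released:
        del reservations[ip]
    return released

res = {"10.1.0.1": 100.0, "10.1.0.2": 500.0, "10.1.0.3": 900.0}
freed = release_expired(res, now=1000.0, max_age=600.0, max_pods=2)
```

Here only the oldest reservation has aged past the threshold, and the remaining two fit under the ceiling, so a single IP is handed back to Calico.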

Summary

After two years of middleware containerization, the team has deployed over 140 middleware instances, achieving rapid deployment, fine‑grained resource isolation, improved fault tolerance, enhanced security, and high portability. The next article will dive deeper into Capo's design and open‑source journey.

Tags: cloud native, observability, Kubernetes, middleware, network, PaaS
Written by New Oriental Technology

Practical internet development experience, tech sharing, knowledge consolidation, and forward-thinking insights.