Middleware PaaS on Kubernetes: Architecture, Benefits, and IP Reservation Challenges
This article explains how the New Oriental architecture team migrated middleware services like Redis, Kafka, and RocketMQ to Kubernetes, detailing the benefits over traditional PaaS, the Capo IP reservation solution for network stability, and the resulting operational, observability, and resource utilization improvements.
Traditional Operations Challenges and Cloud‑Native Advantages
To reduce cost and increase efficiency, many companies manage middleware services through PaaS platforms. A traditional PaaS manipulates VMs or physical machines directly, which makes resource management, availability, observability, and utilization expensive to build and maintain. Introducing Kubernetes addresses these problems: 100% of the team's middleware services (Redis, Kafka, RocketMQ) now run in a Kubernetes environment, with cluster delivery in seconds and integrated monitoring and alerting.
Middleware PaaS Product Ecosystem and Features
The architecture team provides a set of PaaS products for Redis, Kafka, and RocketMQ, built on custom Operators and open‑source Operators. These products offer automated lifecycle management, multi‑tenant authentication, high‑availability, fault‑self‑healing, and integrated monitoring (Prometheus, Grafana, Alertmanager) with a proprietary XLSS storage system for stateful workloads.
Kafka PaaS Features
Automated lifecycle management of Kafka clusters
Multi‑tenant authentication and authorization
Support for SASL/SCRAM, SASL/PLAIN, SSL, and PLAINTEXT listeners
Fault self‑healing
Integrated monitoring and alerting via Zookeeper/Kafka exporters
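To make the multi-tenant authentication feature concrete, here is a minimal sketch of the client-side settings for a SASL/SCRAM-secured Kafka listener, in the style of the kafka-python library. The endpoint, tenant name, and credentials are illustrative placeholders, not values from the article.

```python
# Hedged sketch: client config for a Kafka PaaS listener secured with
# SASL/SCRAM over TLS. All concrete values below are illustrative.
def scram_client_config(bootstrap, username, password):
    """Build a kafka-python style config dict for a SASL_SSL + SCRAM listener."""
    return {
        "bootstrap_servers": bootstrap,
        "security_protocol": "SASL_SSL",    # TLS-encrypted SASL listener
        "sasl_mechanism": "SCRAM-SHA-512",  # one of the supported mechanisms
        "sasl_plain_username": username,    # per-tenant principal
        "sasl_plain_password": password,
    }

cfg = scram_client_config("kafka-bootstrap.paas.svc:9093", "tenant-a", "s3cret")
# With kafka-python installed, this would be passed as KafkaProducer(**cfg).
```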
RocketMQ PaaS Features
Automated lifecycle management of RocketMQ clusters
Master‑slave and DLedger high‑availability clusters
ACL support for authentication/authorization
Scalable broker and NameServer instances
Support for RocketMQ versions 4.7 and 4.8
Built‑in monitoring and console management
Redis PaaS Features
Automated lifecycle management of Redis clusters
Customizable master count and replica factor
Even distribution of masters and replicas
Custom cluster configuration
Automatic fault recovery
Integrated monitoring and alerting
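The list above mentions even distribution of masters and replicas. A minimal sketch of one way to compute such a placement follows; this illustrates the distribution goal only, not the operator's actual scheduler, and the round-robin rule and node names are assumptions.

```python
# Hedged sketch: spread Redis masters round-robin across nodes, and place
# each replica on a node other than its master's. Illustrative only.
def place_redis_cluster(nodes, masters, replicas_per_master):
    """Return {shard_index: {"master": node, "replicas": [nodes]}}."""
    layout = {}
    for m in range(masters):
        master_node = nodes[m % len(nodes)]
        replicas, i = [], 1
        while len(replicas) < replicas_per_master:
            candidate = nodes[(m + i) % len(nodes)]
            if candidate != master_node:  # never co-locate replica with master
                replicas.append(candidate)
            i += 1
        layout[m] = {"master": master_node, "replicas": replicas}
    return layout

# Three masters with one replica each across three nodes: every node hosts
# exactly one master and one replica of a different shard.
print(place_redis_cluster(["n1", "n2", "n3"], masters=3, replicas_per_master=1))
```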
Middleware Containerization Challenges
In Kubernetes, managing container networking is critical. Calico provides network connectivity, but Pod IPs are dynamically allocated and may change on Pod recreation, risking topology chaos and client reconnection failures. The team created the open‑source Capo project, implementing an "IP reservation with delayed release" strategy to keep IPs stable during short‑term Pod rebuilds.
Zookeeper Issue
ElasticJob uses a containerized Zookeeper service. When all Zookeeper Pod IPs change, Curator fails to reconnect because its ensemble tracker resolves the cluster configuration to the new Pod IPs, causing job failures. The fix was to expose the ensembleTracker parameter in ElasticJob and set it to false, forcing Curator to keep connecting through the stable Service ClusterIP.
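The contrast between the fragile and stable connection targets can be sketched as follows. The IPs and the Service DNS name are illustrative, not values from the article.

```python
# Hedged sketch: why connecting through a Kubernetes Service survives Pod IP
# churn. A connect string built from Pod IPs breaks on rebuild; one built
# from the Service DNS name does not.
def zk_connect_string(endpoints):
    """Join Zookeeper endpoints into a Curator/Kazoo-style connect string."""
    return ",".join(endpoints)

# Fragile: direct Pod IPs change whenever the Pods are rebuilt.
pod_ips = ["10.244.1.5:2181", "10.244.2.7:2181", "10.244.3.9:2181"]

# Stable: a single Service name fronting all three Pods.
service_endpoint = ["zk-client.middleware.svc.cluster.local:2181"]

print(zk_connect_string(pod_ips))
print(zk_connect_string(service_endpoint))
```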
Redis Cluster IP Swap Problem
When multiple Redis Pods are recreated, their IP addresses may swap, leading to two problems: (1) Gossip communication may merge distinct clusters into one "big cluster"; (2) clients keep using stale IPs, connect to the wrong cluster, and fall into retries. The solution leverages Calico's IPReservation resource to reserve IPs during Pod deletion, preventing immediate reuse.
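For reference, a reservation roughly takes this shape. The field names follow Calico's IPReservation resource (a `spec.reservedCIDRs` list); the object name and address below are illustrative, and the manifest is expressed here as a Python dict rather than YAML.

```python
# Hedged sketch: the shape of a Calico IPReservation object that a controller
# like Capo might write to hold a deleted Pod's address. Values illustrative.
import json

def build_ip_reservation(name, pod_ips):
    """Return an IPReservation manifest reserving each Pod IP as a /32."""
    return {
        "apiVersion": "projectcalico.org/v3",
        "kind": "IPReservation",
        "metadata": {"name": name},
        "spec": {"reservedCIDRs": [f"{ip}/32" for ip in pod_ips]},
    }

print(json.dumps(build_ip_reservation("capo-reserved", ["10.244.1.5"]), indent=2))
```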
Capo IP Reservation Workflow
Delete or drain request reaches the Kube‑APIServer.
If the Pod belongs to a StatefulSet, the request is forwarded to the Capo webhook.
Capo records the Pod IP in an IPReservation object and stores metadata in a ConfigMap.
The delete request is then allowed to proceed, and the kubelet releases the IP via the Calico CNI.
During subsequent Pod recreation, Calico allocates a different IP from the pool, avoiding swaps.
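The workflow above can be sketched as a small simulation. The data structures stand in for the real IPReservation object and ConfigMap, and the function name and shapes are assumptions for illustration.

```python
# Hedged sketch of the reservation step: before a StatefulSet Pod's delete
# request is admitted, its IP is recorded so the IPAM cannot hand it out
# again immediately. Plain dict/set stand-ins for the real API objects.
import time

reserved_cidrs = set()     # stands in for the IPReservation object
reservation_metadata = {}  # stands in for the Capo ConfigMap

def admit_pod_delete(pod_name, pod_ip, owner_kind):
    """Admission-webhook style hook: reserve the IP, then allow deletion."""
    if owner_kind == "StatefulSet":
        reserved_cidrs.add(f"{pod_ip}/32")
        reservation_metadata[pod_name] = {"ip": pod_ip, "reserved_at": time.time()}
    return True  # the delete request is always allowed to proceed

admit_pod_delete("redis-cluster-0", "10.244.1.5", "StatefulSet")
print(sorted(reserved_cidrs))
```

Pods owned by anything other than a StatefulSet pass through without a reservation, matching step 2 of the workflow.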
IP Release Mechanism
Capo checks two release conditions every 5 seconds: (1) a reserved IP has been held longer than a time threshold; (2) the number of reserved IPs exceeds a count threshold (typically 1.2 × the node's maxPods). When either condition is met, Capo removes the affected entries from both the ConfigMap and the IPReservation object, allowing Calico to reuse those IPs.
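The two conditions can be sketched as a pure function over the reservation table. The thresholds and the oldest-first eviction order are illustrative assumptions, not Capo's published implementation.

```python
# Hedged sketch of the periodic release check: release an IP once it has been
# reserved longer than max_age_s, and if the table still exceeds the count
# threshold (1.2 x maxPods), release the oldest survivors as well.
def ips_to_release(reservations, now, max_age_s, max_pods):
    """reservations: {ip: reserved_at_timestamp}; returns the set of IPs to free."""
    count_threshold = int(1.2 * max_pods)
    release = {ip for ip, ts in reservations.items() if now - ts > max_age_s}
    # If still over the count threshold, evict the oldest entries first
    # (eviction order is an assumption made for this sketch).
    overflow = len(reservations) - len(release) - count_threshold
    if overflow > 0:
        survivors = sorted(
            (ip for ip in reservations if ip not in release),
            key=lambda ip: reservations[ip],
        )
        release.update(survivors[:overflow])
    return release
```

A caller would run this every 5 seconds and delete the returned IPs from both the ConfigMap and the reservation object.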
Summary
After two years of middleware containerization, the team has deployed over 140 middleware instances, achieving rapid deployment, fine‑grained resource isolation, improved fault tolerance, enhanced security, and high portability. The next article will dive deeper into Capo's design and open‑source journey.
New Oriental Technology