How a Unified White‑Screen Ops Platform Transformed Multi‑Cloud Middleware Management
This article details the challenges of traditional middleware operations, explains how Kubernetes and Operators were leveraged to build a unified, visual, and automated platform that standardizes, automates, and visualizes multi‑cloud resource management, and reports the significant efficiency, cost, and safety gains achieved across dozens of clusters.
Project Background
Traditional middleware operations suffered from scattered management tools, high operational costs, and reliance on opaque command‑line scripts ("black‑screen" operations). The team identified Kubernetes and its Operator framework as a way to provide a unified, declarative, and automated management layer.
Why Kubernetes & Operator?
Standardization: Operations can be expressed as Custom Resources (CR) and handled uniformly.
Automation: Reduces manual steps and human error.
Visualization: A UI can drive the underlying Kubernetes actions, lowering operational complexity.
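As a rough illustration of the declarative idea behind these points, an operation expressed as a Custom Resource comes down to comparing a declared state with the observed one and deriving the actions needed to converge. The sketch below is a toy model, not the platform's Operator code; `ClusterState` and `plan_actions` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterState:
    """Desired or observed state of a middleware cluster (hypothetical shape)."""
    replicas: int


def plan_actions(desired: ClusterState, observed: ClusterState) -> List[str]:
    """Return the actions needed to move the observed state toward the desired one."""
    if observed.replicas < desired.replicas:
        return [f"scale-up:{desired.replicas - observed.replicas}"]
    if observed.replicas > desired.replicas:
        return [f"scale-down:{observed.replicas - desired.replicas}"]
    return []  # already converged, nothing to do
```

Calling `plan_actions(ClusterState(5), ClusterState(3))` yields a single `scale-up:2` action, and an empty list once the two states match. A UI only has to edit the desired state; the same planning logic runs regardless of who changed it.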
Core Goals of the Platform
Standardization: Consolidate middleware operational procedures into reusable best‑practice workflows.
Automation: Eliminate dependence on manual scripts and enable end‑to‑end automated actions.
Visualization: Provide a white‑screen UI that makes complex tasks intuitive.
Architecture Overview
The platform consists of several layers:
Multi‑cloud Management Service: Unified hosting of Kubernetes clusters from different cloud providers, offering resource visualization and cross‑cloud scheduling.
Middleware Operations Service: Centralized deployment, scaling, and management of Kafka and Elasticsearch, with a visual interface to reduce SRE effort.
K8s Generic Resource Service: Unified handling of Nodes (labeling, taint management), PersistentVolumes, PVCs, Services, Pods, and CPU Burst, all via CRs.
YAML Management Service: Versioned YAML storage, change audit, and visual diff/rollback capabilities.
Operation Audit Service: Detailed logging of every platform action, integrated with DCheck for compliance checks.
Multi‑Cloud Management
Operators abstract away the need to switch kubeconfig files. Users can manage dozens of clusters from a single UI, avoiding the "kubeconfig switching hell".
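One way to picture the single entry point is a registry that maps each cluster name to its own client, so that an action is dispatched by name instead of by switching a global kubeconfig. This is a minimal sketch under that assumption; `ClusterRegistry`, `run_everywhere`, and `fake_client` are hypothetical, and the source does not describe how the platform actually routes requests.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ClusterRegistry:
    """Maps a cluster name to a callable that executes an action on that cluster."""
    clients: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, name: str, client: Callable[[str], str]) -> None:
        self.clients[name] = client

    def run_everywhere(self, action: str) -> Dict[str, str]:
        """Run one action on every registered cluster, ordered by cluster name."""
        return {name: self.clients[name](action) for name in sorted(self.clients)}


def fake_client(cluster: str) -> Callable[[str], str]:
    """Stand-in for a real per-cluster API client (assumption for the example)."""
    return lambda action: f"{cluster}: {action} ok"
```

Because every client is bound to its cluster at registration time, no process-wide state such as `KUBECONFIG` has to change between calls.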
Kafka Expansion – From Black‑Screen Script to White‑Screen UI
Traditional script example (simplified):
#!/bin/bash
export KUBECONFIG=/path/to/kubeconfig
kubectl get kafka -n kafka-namespace
kubectl patch kafka my-cluster -n kafka-namespace --type='merge' -p '{"spec":{"kafka":{"replicas":5}}}'
# loop to check pod status …
curl -X POST "http://cruise-control…/rebalance" -d "dryrun=false"
# wait for migration …
echo "Kafka expansion completed!"
The platform replaces this with a one-click UI: the user sets the desired replica count, and the system automatically patches the CR, monitors pod readiness, triggers Cruise Control data migration, and records the whole process for audit.
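The automated sequence described here (patch the CR, wait for pods, trigger the rebalance, record each step) can be sketched as a small workflow function. This is an illustration under assumptions, not the platform's implementation: the cluster interactions are injected as callables (`apply_patch`, `ready_pods`, `trigger_rebalance`, all hypothetical), and only the patch body is taken from the script above.

```python
import json
from typing import Callable, List


def build_replica_patch(replicas: int) -> str:
    """Build the JSON merge patch that the legacy script passed to kubectl patch."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return json.dumps({"spec": {"kafka": {"replicas": replicas}}})


def expand_kafka(
    replicas: int,
    apply_patch: Callable[[str], None],
    ready_pods: Callable[[], int],
    trigger_rebalance: Callable[[], None],
    audit: List[str],
    max_polls: int = 10,
) -> bool:
    """Patch the CR, poll for readiness, trigger the rebalance; audit every step."""
    patch = build_replica_patch(replicas)
    apply_patch(patch)
    audit.append(f"patched: {patch}")
    # A real implementation would sleep between polls; omitted to keep the sketch short.
    for _ in range(max_polls):
        if ready_pods() >= replicas:
            break
    else:
        audit.append("timeout waiting for pods")
        return False
    audit.append("pods ready")
    trigger_rebalance()
    audit.append("rebalance triggered")
    return True
```

Injecting the side effects keeps the ordering and the audit trail testable without a cluster, and the rebalance is never triggered if the pods do not become ready within the polling budget.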
Node Management – From Manual Scripts to Visual Dashboard
The legacy Java-based script scanned each node, parsed its CPU, memory, and disk type, and applied labels one node at a time, which was error-prone and slow (often more than 1 hour). The new service provides:
Real‑time visualization of node metrics (CPU, memory, disk, labels, taints).
Multi‑dimensional filtering (labels, taints, resources, zones).
Batch labeling and taint management via UI, reducing a 1‑hour task to ~3 minutes.
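The filtering and batch labeling listed above can be approximated with a few lines over in-memory node records. This is a minimal sketch, assuming a simplified `Node` record; the field names and the functions `filter_nodes` and `batch_label` are hypothetical, and taints are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Node:
    """Simplified node record (hypothetical fields, not the Kubernetes Node schema)."""
    name: str
    cpu_cores: int
    memory_gib: int
    zone: str
    labels: Dict[str, str] = field(default_factory=dict)


def filter_nodes(
    nodes: Iterable[Node],
    min_cpu: int = 0,
    zone: Optional[str] = None,
    required_labels: Optional[Dict[str, str]] = None,
) -> List[Node]:
    """Keep nodes that satisfy every given dimension (CPU, zone, labels)."""
    required = required_labels or {}
    return [
        n for n in nodes
        if n.cpu_cores >= min_cpu
        and (zone is None or n.zone == zone)
        and all(n.labels.get(k) == v for k, v in required.items())
    ]


def batch_label(nodes: Iterable[Node], new_labels: Dict[str, str]) -> List[str]:
    """Apply the labels to every selected node in place; return the names touched."""
    touched = []
    for n in nodes:
        n.labels.update(new_labels)
        touched.append(n.name)
    return touched
```

Selecting first and labeling second mirrors the UI flow: the operator narrows the list with filters, reviews it, and then applies one change to the whole selection.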
PV & Cloud Disk Management
When a middleware cluster is deleted, its PersistentVolumes remain, leaving orphaned cloud disks that cannot be traced back to owners. The platform introduces:
Visualization of PV‑to‑cloud‑disk mappings.
Automated detection of idle disks.
One-click release of cloud disks, cutting release time from more than 15 minutes to about 1 minute and saving more than 150,000 yuan per month.
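Idle-disk detection can be sketched by looking at the PersistentVolume phase, which in Kubernetes is one of Available, Bound, Released, or Failed. The example below assumes that a PV in the Released or Available phase is a release candidate; that rule, the `disk_id` field, and the function names are assumptions for illustration, since the source does not state the platform's actual criteria.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PersistentVolume:
    name: str
    phase: str      # Kubernetes PV phase: Available, Bound, Released, or Failed
    disk_id: str    # cloud disk backing this PV (hypothetical field)
    size_gib: int


IDLE_PHASES = {"Released", "Available"}  # assumed definition of "idle"


def find_idle_disks(pvs: Iterable[PersistentVolume]) -> List[PersistentVolume]:
    """Return PVs that no claim is currently using."""
    return [pv for pv in pvs if pv.phase in IDLE_PHASES]


def reclaimable_gib(pvs: Iterable[PersistentVolume]) -> int:
    """Total capacity that would be freed by releasing every idle disk."""
    return sum(pv.size_gib for pv in find_idle_disks(pvs))
```

Keeping the PV name next to the disk ID in each record is what makes the mapping traceable: a candidate disk can always be tied back to the volume it served.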
CPU Burst Management
During traffic spikes, CPU usage can hit 100% and pods can be throttled or even evicted. The platform's CPU Burst feature temporarily lifts CPU limits for critical pods, acting as an emergency reserve that keeps services alive during high-load events. It is already enabled in more than 10 Kubernetes clusters and more than 30 Elasticsearch clusters.
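The source does not say how CPU Burst is implemented. One common mechanism with this name is the Linux CFS bandwidth burst, where quota left unused in quiet periods is banked up to a cap and spent during a spike. The following is a simplified model of that idea only, with hypothetical parameter names; it is not the platform's code and ignores scheduler details.

```python
from typing import Iterable, List


def simulate_burst(demand_ms: Iterable[int], quota_ms: int, burst_ms: int) -> List[int]:
    """Return the CPU time granted in each period under a simplified burst model.

    Each period refills quota_ms of runtime. Unused runtime is banked, capped
    at burst_ms, and can be spent on top of the quota in later periods.
    """
    banked = 0
    granted = []
    for demand in demand_ms:
        available = quota_ms + banked
        used = min(demand, available)
        granted.append(used)
        banked = min(burst_ms, available - used)
    return granted
```

With a 100 ms quota and a 100 ms burst cap, demands of 50, 50, and 300 ms are granted 50, 50, and 200 ms; with the burst cap set to 0, the third period is capped at 100 ms. The spike still gets throttled eventually, but it can draw on the slack saved during the quiet periods.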
YAML Management Service
YAML files are the source of truth for Kubernetes resources, but manual edits are risky. The service offers:
Version control with add/modify/rollback and diff capabilities.
Full audit trails for every change.
Visual editor to reduce syntax errors.
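The versioning, diff, and rollback capabilities above can be sketched with Python's standard `difflib`. This is a minimal in-memory illustration; the `YamlHistory` class is hypothetical, and a real service would persist versions and attach an author and timestamp to each change for the audit trail.

```python
import difflib
from typing import List


class YamlHistory:
    """In-memory version history for one YAML document (illustrative only)."""

    def __init__(self, initial: str) -> None:
        self._versions: List[str] = [initial]

    @property
    def current(self) -> str:
        return self._versions[-1]

    def commit(self, text: str) -> int:
        """Store a new version and return its index."""
        self._versions.append(text)
        return len(self._versions) - 1

    def diff(self, old: int, new: int) -> List[str]:
        """Unified diff between two stored versions."""
        return list(difflib.unified_diff(
            self._versions[old].splitlines(),
            self._versions[new].splitlines(),
            fromfile=f"v{old}",
            tofile=f"v{new}",
            lineterm="",
        ))

    def rollback(self, version: int) -> int:
        """Roll back by re-committing an old version, so history is never rewritten."""
        return self.commit(self._versions[version])
```

Implementing rollback as a new commit rather than a deletion keeps every intermediate state available for audit, which fits the full audit trail requirement.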
Project Outcomes
After three development phases, the platform supports:
Standardized operations for Kafka, Elasticsearch, Node, PV, PVC, Service, and Pod.
Automation of >430 white‑screen operations across 300+ middleware clusters.
Node labeling time reduced from more than 1 hour to about 3 minutes; PV release time reduced from more than 15 minutes to about 1 minute.
Release of 675+ idle cloud disks, saving more than 150,000 yuan per month.
More than 1,020 audit log entries, with compliance checks via DCheck.
Scalable architecture that can incorporate new resources (Deployments, StatefulSets, Ingress, ConfigMaps, Secrets, custom resources like DMQ, Pulsar, ZK).
Experience & Reflections
Key lessons include the importance of standardization, tightly coupling tooling with processes, and embedding audit/compliance into every operation. Challenges remain in integrating with other platforms (e.g., KubeOne) and expanding test coverage for new scenarios.
Future Outlook
The team plans to extend white‑screen support to more Kubernetes resources, introduce AI‑driven fault‑auto‑healing, improve multi‑cloud integration, and continuously refine the user experience based on feedback.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
