Implementation and Practice of Karmada-Operator at vivo: Architecture, API Design, and CI/CD
vivo built an Ansible-based Karmada-Operator that declaratively manages multi-cluster deployments, etcd backup and restore, and control-plane upgrades through custom resources. Backed by a CI pipeline, it addresses the limitations of existing deployment tools and provides extensible, reliable, self-healing orchestration for large-scale Kubernetes environments.
Background
vivo's Internet server team migrated many services to Kubernetes, leading to rapid growth in cluster scale and number, which increased operational difficulty. After evaluating community projects, they selected Karmada, an open‑source cloud‑native multi‑cloud container orchestration project, for its unified multi‑cluster management, cross‑cluster elasticity, native Kubernetes API usage, disaster recovery capabilities, and extensibility.
Challenges with Existing Tools
The community offers several deployment tools (karmadactl, Karmada charts, binary deployment, hack scripts), but they share several drawbacks: too many overlapping options to choose from, defects in the scripts, no UI, no CI testing, limited etcd high-availability features, and complex dependency installation.
Goal
The article shares vivo's practice of building a Karmada‑Operator to address these issues, covering solution selection, API design, architecture, and CI pipeline.
Operator SDK Overview
The Operator Framework provides a way to automate the management of Kubernetes-native applications. Operator SDK, part of that framework, simplifies Operator development by offering high-level APIs, scaffolding, code generation, and extensions for common use cases.
Solution Options
Option 1: Go‑based Operator – suitable for Kubernetes‑native stateful services but limited for binary deployments and external etcd.
Option 2: Ansible‑based Operator – supports both Kubernetes‑based and non‑Kubernetes binary deployments, leveraging Ansible’s SSH and K8s modules.
Option 3: Hybrid Go + Ansible – combines capabilities of both.
After evaluation, vivo chose the Ansible‑based Operator (Option 2) because it provides comparable capabilities to the Go SDK, matches production requirements, is easy to learn, and offers strong extensibility.
API Design
The Operator defines CRDs such as KarmadaDeployment, EtcdBackup, and EtcdRestore. The watches.yaml file maps each of these resources to the Ansible role or playbook that carries out its reconcile logic. Together, these resources allow Karmada deployment, etcd backup, and etcd restore operations to be specified declaratively.
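To make the flow from custom resource to Ansible concrete, here is a minimal Python sketch. It assumes a hypothetical API group and hypothetical spec fields (deployMode, etcdReplicas, memberClusters) and hypothetical role names, none of which come from the article. It models the watches.yaml mapping as a dictionary and mimics the Ansible-based Operator convention of handing spec fields to the role as snake_case variables (top-level keys only in this sketch).

```python
import re

# Hypothetical API group, version, and spec fields, for illustration only.
karmada_deployment = {
    "apiVersion": "operator.example.io/v1alpha1",
    "kind": "KarmadaDeployment",
    "metadata": {"name": "karmada-demo"},
    "spec": {
        "deployMode": "container",
        "etcdReplicas": 3,
        "memberClusters": [{"name": "member1"}],
    },
}

# Stand-in for watches.yaml: each kind maps to the Ansible role that
# reconciles it. The role names are assumptions.
WATCHES = {
    "KarmadaDeployment": "karmada_deployment",
    "EtcdBackup": "etcd_backup",
    "EtcdRestore": "etcd_restore",
}


def to_snake_case(name):
    """Convert a camelCase key such as 'etcdReplicas' to 'etcd_replicas'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def role_and_vars(resource):
    """Return the role for a resource and its spec as snake_case variables."""
    role = WATCHES[resource["kind"]]
    extra_vars = {to_snake_case(k): v for k, v in resource["spec"].items()}
    return role, extra_vars


role, extra_vars = role_and_vars(karmada_deployment)
print(role, sorted(extra_vars))
```

Keeping the mapping in one table mirrors the appeal of the declarative approach: adding a new resource kind means adding one entry and one role, not new controller code.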
Architecture
The design supports both containerized and binary deployments. Containerized deployment uses only Kubernetes APIs, while binary deployment relies on SSH to manage the control plane. Member clusters are registered/unregistered via generated Ansible inventory files.
Control‑Plane Management
Standardized certificate management using OpenSSL.
External load‑balancer support for the Karmada API server.
Flexible upgrade strategies (component‑wise or full‑cluster).
Rich global variable definitions for future configuration changes.
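The difference between component-wise and full-cluster upgrades can be sketched as a small planning function. The component names below are standard Karmada control-plane components, but the ordering, the strategy names ("full", "component"), and the function itself are assumptions made for illustration, not the Operator's actual logic.

```python
# Assumed upgrade order; the ordering is illustrative, not taken from the article.
CONTROL_PLANE_ORDER = [
    "etcd",
    "karmada-apiserver",
    "karmada-aggregated-apiserver",
    "kube-controller-manager",
    "karmada-controller-manager",
    "karmada-scheduler",
    "karmada-webhook",
]


def upgrade_plan(strategy, components=None):
    """Return the ordered list of components to upgrade.

    strategy: "full" upgrades everything; "component" upgrades only the
    requested components, still in the canonical order.
    """
    if strategy == "full":
        return list(CONTROL_PLANE_ORDER)
    if strategy == "component":
        requested = set(components or [])
        unknown = requested - set(CONTROL_PLANE_ORDER)
        if unknown:
            raise ValueError(f"unknown components: {sorted(unknown)}")
        return [c for c in CONTROL_PLANE_ORDER if c in requested]
    raise ValueError(f"unsupported strategy: {strategy!r}")


print(upgrade_plan("component", ["karmada-scheduler", "karmada-apiserver"]))
```

Rejecting unknown component names up front keeps a typo in the spec from silently turning into a no-op upgrade.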
etcd Cluster Management
Custom Ansible plugins provide member addition/removal, backup (e.g., to CephFS), restoration, and health checks. Separate CRDs for EtcdBackup and EtcdRestore isolate etcd operations from the main Karmada deployment.
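As a rough illustration of what a backup or health-check step might run, the sketch below builds etcdctl command lines without executing them. The etcdctl subcommands and flags (snapshot save, endpoint health, --endpoints, --cacert, --cert, --key) are standard etcd v3 tooling; the endpoint, directory layout, certificate file names, and snapshot naming scheme are assumptions, and the article does not say that the custom plugins work this way.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def _tls_flags(endpoint, cert_dir):
    """Common connection flags; the certificate file names are assumed."""
    certs = PurePosixPath(cert_dir)
    return [
        f"--endpoints={endpoint}",
        f"--cacert={certs / 'ca.crt'}",
        f"--cert={certs / 'client.crt'}",
        f"--key={certs / 'client.key'}",
    ]


def snapshot_command(endpoint, backup_dir, cert_dir, now=None):
    """Build an 'etcdctl snapshot save' command with a timestamped target."""
    now = now or datetime.now(timezone.utc)
    target = PurePosixPath(backup_dir) / f"etcd-{now:%Y%m%dT%H%M%SZ}.db"
    return ["etcdctl", *_tls_flags(endpoint, cert_dir), "snapshot", "save", str(target)]


def health_command(endpoint, cert_dir):
    """Build an 'etcdctl endpoint health' command for the same endpoint."""
    return ["etcdctl", *_tls_flags(endpoint, cert_dir), "endpoint", "health"]


# Hypothetical endpoint and a CephFS mount point chosen for the example.
cmd = snapshot_command("https://10.0.0.1:2379", "/mnt/cephfs/etcd-backup", "/etc/etcd/pki")
print(" ".join(cmd))
```

Returning the command as a list rather than a shell string avoids quoting problems if it is later passed to a process runner or an Ansible command task.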
Member Cluster Management
Dynamic inventory plugins generate Ansible inventories from the KarmadaDeployment spec, enabling concurrent registration and deregistration of member clusters via add-member and del-member roles.
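The idea of deriving an inventory from the resource spec can be sketched in a few lines of Python. The output follows Ansible's JSON format for script-based dynamic inventories (groups with a hosts list, plus _meta.hostvars); the spec field names (memberClusters, kubeconfigSecret, syncMode) and the group name are hypothetical, and vivo's actual plugin may be structured quite differently.

```python
import json


def build_inventory(spec):
    """Turn a KarmadaDeployment-like spec into Ansible inventory JSON data.

    Each member cluster becomes one inventory host so that Ansible can
    process the clusters concurrently.
    """
    hostvars = {}
    for member in spec.get("memberClusters", []):
        hostvars[member["name"]] = {
            "kubeconfig_secret": member["kubeconfigSecret"],
            # Karmada supports Push and Pull sync modes; Push is assumed as default.
            "sync_mode": member.get("syncMode", "Push"),
        }
    return {
        "member_clusters": {"hosts": sorted(hostvars)},
        "_meta": {"hostvars": hostvars},
    }


if __name__ == "__main__":
    # Hypothetical spec with two member clusters.
    example_spec = {
        "memberClusters": [
            {"name": "member2", "kubeconfigSecret": "member2-kubeconfig", "syncMode": "Pull"},
            {"name": "member1", "kubeconfigSecret": "member1-kubeconfig"},
        ]
    }
    print(json.dumps(build_inventory(example_spec), indent=2))
```

Because the inventory is recomputed from the spec on every reconcile, removing a cluster from the spec is enough to make it disappear from the host list that a role such as add-member iterates over.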
CI Pipeline
The CI workflow runs on a self-hosted GitHub Actions runner with KubeVirt. The pipeline has four stages:
Syntax checks (ansible-lint, shellcheck, yamllint, etc.).
Cluster deployment tests (various Karmada install methods, join/unjoin, upgrades, etcd backup/restore).
Functional tests (Karmada e2e, Bookinfo demo).
Performance tests (simulating 2000-node member clusters, measuring failover time for 40k pods).
Summary
The Karmada‑Operator built by vivo demonstrates high extensibility, reliability, and ease of writing operational logic. It provides declarative, self‑healing management for multi‑cluster environments, though it currently lacks webhook support and a sophisticated CRD scaffolding tool. The project is open‑source and invites community contributions.
vivo Internet Technology