Design and Practice of Multi‑Cluster Management in SOFAStack CAFE Using an Extended KubeFed Framework
This article details the architectural background, challenges, and practical solutions implemented in SOFAStack CAFE for cloud‑native multi‑cluster deployment, including a custom multi‑topology CRD, an independent federation API server, enhanced KubeFed controller features, and network proxy integration to support hybrid‑cloud scenarios.
Background
SOFAStack is Ant Group's commercial, financial‑grade cloud‑native architecture product. It enables rapid building of cloud‑native microservice systems with high reliability, scalability, and maintainability, and offers data‑center‑level disaster recovery and multi‑data‑center capacity expansion.
At the application lifecycle level, SOFAStack provides the CAFE (Cloud Application Fabric Engine) PaaS platform for full‑lifecycle management, facilitating smooth migration from traditional to cloud‑native architectures in financial scenarios.
For cloud‑native operations, CAFE relies on the LHC (LDC Hybrid Cloud) product to achieve multi‑region, multi‑data‑center, hybrid‑cloud deployments. This article explores the Kubernetes multi‑cluster practices underneath it.
Challenges
KubeFed was the initial choice of Kubernetes multi‑cluster framework, but its basic capabilities lack the “unit” concept required by SOFAStack’s deployment‑unit model. The mismatch caused problems in four areas: topology, tenant isolation, annotation propagation, and network connectivity.
Practices
Multi‑Topology Federation CRD
The original KubeFed CRD was extended with a topology‑type field (e.g., cell), allowing resources to be distributed by deployment unit rather than by cluster.
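A minimal sketch of what distributing by deployment unit can mean in practice: a placement expressed in cells is resolved to the clusters hosting those cells. The `Cluster` type, the `cells` attribute, and the `resolve_placement` function are hypothetical names for illustration, and the assumption that one cluster may host several cells is not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical view of a member cluster and the cells it hosts."""
    name: str
    cells: frozenset


def resolve_placement(topology_type, targets, clusters):
    """Return the sorted names of clusters a resource should be sent to.

    topology_type "cluster": targets are cluster names (plain KubeFed style).
    topology_type "cell":    targets are deployment-unit names (assumed model).
    """
    if topology_type == "cluster":
        known = {c.name for c in clusters}
        return sorted(t for t in targets if t in known)
    if topology_type == "cell":
        wanted = set(targets)
        # A cluster qualifies if it hosts at least one requested cell.
        return sorted(c.name for c in clusters if c.cells & wanted)
    raise ValueError(f"unsupported topology type: {topology_type!r}")


clusters = [
    Cluster("cluster-a", frozenset({"cell-1", "cell-2"})),
    Cluster("cluster-b", frozenset({"cell-3"})),
]
selected = resolve_placement("cell", ["cell-1", "cell-3"], clusters)
```

With the sample data, a placement on `cell-1` and `cell-3` resolves to both clusters, while a placement on `cell-2` alone resolves only to `cluster-a`.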
Independent Federation‑Layer API Server
An isolated API server was introduced to convert the custom multi‑topology CRD to the native KubeFed CRD via a Conversion Webhook. This keeps the federation data separate and avoids heavy modifications to the upstream controller.
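The conversion step can be sketched as a handler for the Kubernetes `ConversionReview` payload that a conversion webhook receives. The request and response envelope (`uid`, `desiredAPIVersion`, `objects`, `convertedObjects`, `result`) follows the Kubernetes API, and `spec.placement.clusters` is KubeFed's placement format. The `spec.placement.cells` field and the `cell_to_clusters` lookup are assumptions made for illustration and are not the article's actual schema.

```python
import copy


def convert_object(obj, desired_api_version, cell_to_clusters):
    """Rewrite a hypothetical cell-based placement as a cluster-based one."""
    converted = copy.deepcopy(obj)  # never mutate the request payload
    converted["apiVersion"] = desired_api_version
    spec = converted.setdefault("spec", {})
    placement = spec.get("placement", {})
    cells = placement.pop("cells", None)  # hypothetical multi-topology field
    if cells is not None:
        names = sorted(
            {name for cell in cells for name in cell_to_clusters.get(cell["name"], [])}
        )
        placement["clusters"] = [{"name": n} for n in names]
        spec["placement"] = placement
    return converted


def handle_conversion_review(review, cell_to_clusters):
    """Build the ConversionReview response for one webhook call."""
    request = review["request"]
    converted = [
        convert_object(o, request["desiredAPIVersion"], cell_to_clusters)
        for o in request["objects"]
    ]
    return {
        "apiVersion": review["apiVersion"],
        "kind": "ConversionReview",
        "response": {
            "uid": request["uid"],  # must echo the request uid
            "convertedObjects": converted,
            "result": {"status": "Success"},
        },
    }


review = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "ConversionReview",
    "request": {
        "uid": "1234",
        "desiredAPIVersion": "example.io/v1beta1",  # hypothetical group/version
        "objects": [{
            "apiVersion": "example.io/v1alpha1",
            "kind": "FederatedDeployment",
            "spec": {"placement": {"cells": [{"name": "cell-1"}]}},
        }],
    },
}
response = handle_conversion_review(review, {"cell-1": ["cluster-b", "cluster-a"]})
```

A real webhook would serve this handler over HTTPS and report failures through `result.status`, both omitted here for brevity.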
MySQL/OB as etcd Backend
Kine, a shim that exposes the etcd API on top of SQL databases, was integrated so that the federation API server can store its data in MySQL or OceanBase (OB) instead of etcd. This reduces operational cost and reuses the databases' existing HA capabilities.
KubeFed Controller Enhancements
Tenant‑level isolation by injecting well‑known labels into KubeFedCluster objects.
Gray‑release support via a placementMask that limits updates to selected deployment units.
Custom annotation propagation configuration allowing selective spec‑type annotation distribution while preserving status‑type annotations.
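The three enhancements above can be sketched as small pure functions. The label key `example.io/tenant`, the function names, and the rule that an empty mask means "no restriction" are all assumptions for illustration. The article does not specify the actual label keys or mask semantics.

```python
TENANT_LABEL = "example.io/tenant"  # hypothetical well-known label key


def clusters_for_tenant(kubefed_clusters, tenant):
    """Tenant isolation: keep only KubeFedCluster objects labeled for the tenant."""
    return [
        c for c in kubefed_clusters
        if c["metadata"].get("labels", {}).get(TENANT_LABEL) == tenant
    ]


def apply_placement_mask(placed_units, placement_mask):
    """Gray release: update only the deployment units named in the mask.

    Assumption: an empty or missing mask means every placed unit is updated.
    """
    if not placement_mask:
        return list(placed_units)
    allowed = set(placement_mask)
    return [u for u in placed_units if u in allowed]


def merge_annotations(federated, member, propagated_keys):
    """Propagate selected spec-type annotations, keep everything else as-is.

    Annotations already on the member object (for example status-type ones
    written inside the member cluster) are preserved unless explicitly listed.
    """
    merged = dict(member)
    for key in propagated_keys:
        if key in federated:
            merged[key] = federated[key]
    return merged


clusters = [
    {"metadata": {"name": "c1", "labels": {TENANT_LABEL: "bank-a"}}},
    {"metadata": {"name": "c2", "labels": {TENANT_LABEL: "bank-b"}}},
    {"metadata": {"name": "c3"}},
]
tenant_clusters = [c["metadata"]["name"] for c in clusters_for_tenant(clusters, "bank-a")]
gray_units = apply_placement_mask(["cell-1", "cell-2", "cell-3"], ["cell-2"])
annotations = merge_annotations(
    federated={"app/version": "v2", "app/internal": "x"},
    member={"app/version": "v1", "status/ready": "true"},
    propagated_keys=["app/version"],
)
```

In this sample, only `c1` is visible to tenant `bank-a`, only `cell-2` receives the update, and the member object ends up with the new `app/version` while its `status/ready` annotation is left untouched.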
API Server Network Proxy Integration
A multi‑cluster‑aware extension of the apiserver‑network‑proxy (ANP) was added. It establishes reverse long‑lived connections, which removes the need for direct network connectivity between the federation controller and each member cluster.
Summary
The extended federation layer provides a multi‑topology model, tenant isolation, fine‑grained gray‑release, custom annotation handling, KMS‑encrypted credentials, and a MySQL/OB backend, all while operating independently of any Kubernetes cluster and supporting hybrid‑cloud network constraints.
SOFAStack CAFE is already deployed in over 50 financial institutions, and future plans include dynamic multi‑cluster scheduling, HPA, API proxy capabilities, and lightweight CRDs based on native resources.
AntTech
Technology is the core driver of Ant's future creation.