Why K8ssandra Is Switching from Helm to Its Own Operator
The article explains how K8ssandra, an Apache Cassandra distribution for Kubernetes, evolved from using Helm charts to developing a dedicated Operator to overcome Helm's limitations, improve multi‑cluster support, and align more closely with Kubernetes best practices.
K8ssandra is an Apache Cassandra® distribution for Kubernetes built from multiple open-source components. Through version 1.3 it was installed and managed with Helm charts. Some of its components, such as cass-operator and medusa-operator, were themselves Kubernetes Operators, but no single Operator managed the whole stack.
The K8ssandra team recently decided to create a dedicated Operator for the project. This article shares their Helm experience, the reasons for moving to an Operator, and the expected benefits.
1 Background
The core of K8ssandra is cass‑operator, used to deploy Cassandra nodes. Around it, a set of components forms an ecosystem for running Cassandra effectively on Kubernetes, including tools for managing anti‑entropy repair (Reaper) and backups (Medusa), a Prometheus/Grafana stack for metrics, and Stargate as a data gateway offering REST, GraphQL, and Document APIs.
Initially Helm was used to manage installation and configuration, which let the project get off the ground quickly and start building a community. Early adopters were mostly Cassandra developers with limited Kubernetes expertise, and they found Helm easier to learn than Operators and CRDs.
2 Progress: The Pros and Cons of Helm
As the project grew, limitations of Helm emerged, especially around upgrades and cluster management.
Writing complex logic
Helm supports control flow with loops and if statements, but deep nesting makes templates hard to read and review.
Reuse and extensibility
Helm variables are scoped to the template where they are declared, preventing reuse across templates (e.g., a variable defined in the Cassandra data‑center template cannot be reused in the Stargate template), violating the DRY principle. The helper function library is extensive but does not cover all use cases and lacks a way to define custom functions.
Project structure and inheritance
The umbrella chart pattern is a Helm best practice, but implementing it caused variable‑scope issues when trying to share authentication settings across sub‑charts for Cassandra, Prometheus, Reaper, and Stargate.
Custom Resource Definition (CRD) management
Helm can install CRDs, but it does not upgrade or delete them afterwards. To update CRDs across multiple Helm releases, the team wrote custom Kubernetes jobs registered as pre-upgrade hooks. These jobs were effectively mini-controllers, and writing them felt much like building an Operator anyway.
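To show why such hooks start to resemble controllers, here is a minimal sketch of the decision a pre-upgrade job has to make: compare the CRD versions already installed in the cluster with those bundled in the chart, and apply only the ones that are newer or missing. It is written in Python for brevity. The CRD names, the version strings, and the idea of tracking a per-CRD version are illustrative assumptions, not K8ssandra's actual hook.

```python
from __future__ import annotations


def parse_version(v: str) -> tuple[int, ...]:
    """Turn '1.7.0' into (1, 7, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in v.split("."))


def plan_crd_upgrades(installed: dict[str, str], bundled: dict[str, str]) -> list[str]:
    """Return the CRDs the hook should apply.

    A CRD is applied when it is missing from the cluster or when the chart
    bundles a newer version than the one currently installed.
    """
    to_apply = []
    for name, new_version in sorted(bundled.items()):
        current = installed.get(name)
        if current is None or parse_version(new_version) > parse_version(current):
            to_apply.append(name)
    return to_apply
```

With `installed = {"cassandradatacenters": "1.6.0", "reapers": "0.2.0"}` and a chart bundling `cassandradatacenters` 1.7.0, `reapers` 0.2.0 and a new `medusabackups` 0.1.0, the plan contains `cassandradatacenters` and `medusabackups`. Real hooks also need to handle RBAC, retries and failures. That extra work is what pushed the team toward a proper controller.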
Tipping point: multi-cluster deployments
Although version 1.3 solved many Helm issues, the next major feature—multi‑cluster K8ssandra deployments across several Kubernetes clusters—cannot be effectively achieved with Helm alone.
3 Setting a New Direction
The team realized they were stretching Helm beyond what it does well and needed a better-suited tool for complex operations. To decide which tool fits which job, they turned to the Operator Framework's capability model.
Helm is best suited to the first two Operator capability levels, basic install and seamless upgrades. More complex tasks, such as failure handling, autoscaling, and advanced installation scenarios, are better implemented with tools like Ansible or a general-purpose language like Go than with Helm templates.
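The capability model can be expressed as a small lookup. This Python sketch encodes the five Operator Framework capability levels and, as the Operator SDK documentation presents them, the highest level reachable with Helm (level 2) versus Ansible or Go (level 5). The mapping from tasks to levels is an illustrative assumption.

```python
from enum import IntEnum


class CapabilityLevel(IntEnum):
    BASIC_INSTALL = 1
    SEAMLESS_UPGRADES = 2
    FULL_LIFECYCLE = 3
    DEEP_INSIGHTS = 4
    AUTO_PILOT = 5


# Highest capability level each approach can reach
MAX_LEVEL = {
    "helm": CapabilityLevel.SEAMLESS_UPGRADES,
    "ansible": CapabilityLevel.AUTO_PILOT,
    "go": CapabilityLevel.AUTO_PILOT,
}

# Illustrative (assumed) mapping of operational tasks to the level they need
TASK_LEVEL = {
    "install": CapabilityLevel.BASIC_INSTALL,
    "upgrade": CapabilityLevel.SEAMLESS_UPGRADES,
    "failure recovery": CapabilityLevel.FULL_LIFECYCLE,
    "autoscaling": CapabilityLevel.AUTO_PILOT,
}


def suitable_tools(tasks: list) -> list:
    """Return the tools whose ceiling covers the most demanding task requested."""
    required = max(TASK_LEVEL[t] for t in tasks)
    return sorted(tool for tool, level in MAX_LEVEL.items() if level >= required)
```

For `["install", "upgrade"]` every tool qualifies. Adding `"autoscaling"` leaves only `ansible` and `go`, which matches the conclusion the team reached.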
4 Operator Design and Implementation Choices
Modular design
Reaper Operator, Medusa Operator, and Stargate Operator will be merged into a single K8ssandra Operator running in one pod but containing multiple controllers, each corresponding to a CRD. The cass‑operator remains a separate dependency.
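A rough picture of "one pod, many controllers": a single manager process routes each resource event to the controller that owns that CRD kind. Events for kinds it does not own, such as cass-operator's CassandraDatacenter, are not handled here. The Python sketch below assumes a simple synchronous dispatcher. In the Go Operator SDK, a controller-runtime manager plays this role. The kind names other than K8ssandraCluster are illustrative.

```python
from typing import Callable, Dict

# A reconciler takes a resource name and returns a short description of what it did
Reconciler = Callable[[str], str]


class Manager:
    """One process hosting several controllers, each owning exactly one CRD kind."""

    def __init__(self) -> None:
        self._controllers: Dict[str, Reconciler] = {}

    def register(self, kind: str, reconcile: Reconciler) -> None:
        if kind in self._controllers:
            raise ValueError(f"a controller for {kind} is already registered")
        self._controllers[kind] = reconcile

    def dispatch(self, kind: str, name: str) -> str:
        # Route the event to the controller for this kind; kinds owned by a
        # separate operator (e.g. cass-operator) are not watched here.
        try:
            reconcile = self._controllers[kind]
        except KeyError:
            raise LookupError(f"no controller in this manager watches {kind}") from None
        return reconcile(name)
```

Registering `K8ssandraCluster`, `Stargate`, and `Reaper` handlers on one `Manager` mirrors the single-deployment layout. Calling `dispatch("CassandraDatacenter", ...)` raises `LookupError`, because that kind stays with cass-operator.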
Developing in Go with the Operator SDK
The team chose Go and the Operator SDK because they already had experience with them from developing cass-operator. Go offers the full capabilities of a general-purpose programming language and makes reusable helper functions easy to write.
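The shared-authentication problem from the Helm umbrella chart becomes a single function in a general-purpose language. This Python sketch assumes a hypothetical `AuthSettings` type and hypothetical environment-variable prefixes. The function is written once and called for every component that connects to Cassandra. The `valueFrom.secretKeyRef` shape follows the Kubernetes EnvVar schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthSettings:
    enabled: bool
    secret_name: str  # hypothetical Secret holding the Cassandra credentials


def auth_env(auth: AuthSettings, prefix: str) -> list:
    """Build credential env vars for any component that connects to Cassandra.

    Defined once and reused for Reaper, Stargate, and others, instead of
    duplicating template logic in each sub-chart.
    """
    if not auth.enabled:
        return []

    def from_secret(key: str) -> dict:
        return {"secretKeyRef": {"name": auth.secret_name, "key": key}}

    return [
        {"name": f"{prefix}_USERNAME", "valueFrom": from_secret("username")},
        {"name": f"{prefix}_PASSWORD", "valueFrom": from_secret("password")},
    ]
```

Calling `auth_env(settings, "REAPER_CASS")` and `auth_env(settings, "STARGATE")` with the same `settings` keeps both components consistent by construction.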
K8ssandra cluster-level status
The new K8ssandraCluster CRD includes a status field that aggregates health of all constituent objects (Cassandra, Stargate, Reaper, etc.), something Helm cannot provide.
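One way to picture the aggregated status is sketched below in Python. Each component reports its own readiness, and the cluster counts as ready only when all of them are. The field and component names are illustrative assumptions, not the actual K8ssandraCluster status schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ComponentStatus:
    ready: bool
    message: str = ""


@dataclass
class ClusterStatus:
    # Keyed by component name, e.g. "cassandra", "stargate", "reaper"
    components: dict[str, ComponentStatus] = field(default_factory=dict)

    @property
    def ready(self) -> bool:
        # An empty status means nothing has reported yet, so not ready
        return bool(self.components) and all(c.ready for c in self.components.values())

    def not_ready(self) -> list[str]:
        """Names of components still blocking overall readiness."""
        return sorted(name for name, c in self.components.items() if not c.ready)
```

If Stargate reports `ready=False` while Cassandra and Reaper are ready, `ready` is `False` and `not_ready()` returns `["stargate"]`. That kind of summary cannot be derived from a set of rendered Helm templates.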
Closer alignment with Kubernetes conventions
Controllers follow standard Kubernetes resource management patterns, enabling precise startup ordering (e.g., Stargate starts only after Cassandra is ready) via init containers and reconciliation logic.
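The ordering rule can be expressed as one reconcile step. If Cassandra is not ready, do nothing and ask to be called again later. Otherwise create Stargate if it does not exist yet. This Python sketch is only illustrative. The 15-second delay and the `ReconcileResult` type are assumptions, loosely modelled on controller-runtime's `reconcile.Result` with its `RequeueAfter` field.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReconcileResult:
    requeue_after_s: Optional[float] = None  # None: nothing left to do for now
    created: tuple[str, ...] = ()            # resources created in this pass


def reconcile_stargate(cassandra_ready: bool, stargate_exists: bool) -> ReconcileResult:
    """One idempotent reconcile pass enforcing 'Stargate only after Cassandra'."""
    if not cassandra_ready:
        # Don't touch Stargate yet; check again after an (assumed) 15 s delay.
        return ReconcileResult(requeue_after_s=15.0)
    if not stargate_exists:
        return ReconcileResult(created=("stargate",))
    return ReconcileResult()  # desired state already reached
```

Because each pass only compares observed state with desired state, the controller can be re-run any number of times. It converges once Cassandra becomes ready, without relying on Helm's install order.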
Test coverage
Writing the Operator in Go gives access to better testing and code-quality tooling (e.g., SonarCloud) than is available for Helm templates, although measuring coverage remains a challenge.
5 What We're Still Working On
Speeding up iterative development
Operator development involves more steps (rebuilding images, redeploying, etc.) than Helm, so the team seeks ways to automate and speed up the iteration cycle.
Multi-cluster integration testing
Testing multi‑cluster deployments is difficult; the team is evaluating tools like Kuttl, which describe test cases and expected results in YAML, lowering the barrier for contributors.
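Declarative test tools like Kuttl work by comparing the objects in the cluster against an expected YAML document. As we understand Kuttl's semantics, only the fields written in the expected file are checked. The Python sketch below shows that partial-match idea on plain dictionaries. It is an illustration of the concept, not Kuttl's actual implementation, and the sample status field is used only as an example.

```python
from typing import Any


def matches(expected: Any, actual: Any) -> bool:
    """True if every field in `expected` appears with the same value in `actual`.

    Dicts are compared recursively as subsets, so extra fields in `actual`
    are ignored; lists must have equal length and match element by element;
    scalars must be equal.
    """
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and matches(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        return (
            isinstance(actual, list)
            and len(expected) == len(actual)
            and all(matches(e, a) for e, a in zip(expected, actual))
        )
    return expected == actual
```

An expected document such as `{"kind": "CassandraDatacenter", "status": {"cassandraOperatorProgress": "Ready"}}` matches a live object that also carries metadata and other status fields. A contributor therefore only needs to write the few fields the test cares about.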
6 Should You Use an Operator? Should You Build One?
If you run databases or infrastructure on Kubernetes, automating operations with an Operator often makes sense. For data‑infrastructure vendors or open‑source contributors, consider building an Operator when existing tools become a hindrance rather than a help.
7 Building the Community
The K8ssandra community is growing, with more contributions and issue reports. The team aims to strengthen the contributor base and welcomes developers interested in running Cassandra on Kubernetes or building Operators.
Original link: https://thenewstack.io/
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
MaGe Linux Operations
