Why Cilium Beats Flannel: Real‑World Kubernetes Networking Insights
The article analyzes how Cilium’s eBPF‑based architecture, advanced network policies, cluster‑wide traffic control, and observability tools like Hubble solved performance, security, and scalability challenges that Flannel and kube‑proxy could not meet in production Kubernetes environments.
Background
We build and maintain infrastructure for companies of various sizes, industries, and tech stacks, deploying applications on private clouds, public clouds, and bare‑metal servers. Customers demand fault tolerance, scalability, cost efficiency, and strong security, forcing us to continuously evolve our platform.
When we first built a Kubernetes‑based platform, we chose Flannel (with kube‑proxy) as the CNI because it was mature, low‑dependency, and performed well in our early benchmarks.
Growing Requirements
As the number of customers and clusters grew, we faced increasing demands for better security, higher performance, and richer observability. Specific pain points included:
1. A financial institution enforcing a strict "default deny all" rule.
2. A large portal whose many services overwhelmed kube-proxy.
3. PCI DSS compliance requiring flexible, powerful network-policy management with strong observability.
4. Performance degradation in Flannel's iptables/netfilter stack under heavy inbound traffic.
These constraints pushed us to look for a more capable CNI.
Why Choose Cilium
Among the many CNI options, we wanted an eBPF-based dataplane because of its proven benefits for observability and security. The two leading eBPF projects are Cilium and Calico. Both are excellent, but Cilium enjoys broader community adoption, higher GitHub activity, and CNCF backing, which gave us the confidence to try it.
After extensive testing on several clusters, Cilium consistently delivered a superior experience, revealing many features we hadn’t anticipated.
Key Features We Like
1. Performance
Replacing iptables with eBPF programs that filter and route traffic directly in the kernel yields a noticeable performance boost, because packets no longer traverse long sequential iptables chains. Our own benchmarks confirmed a significant increase in traffic-processing speed compared with Flannel + kube-proxy.
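As an illustration (not our exact configuration), here is a minimal Helm values sketch for turning on Cilium's eBPF kube-proxy replacement; the key names follow recent versions of the official cilium Helm chart and may differ across releases (older charts use the string "strict" instead of true):

# values.yaml for the cilium Helm chart (sketch; verify against your chart version)
kubeProxyReplacement: true        # handle Services in eBPF instead of kube-proxy
k8sServiceHost: API_SERVER_HOST   # placeholder: API server endpoint, needed once kube-proxy is gone
k8sServicePort: 6443
bpf:
  masquerade: true                # masquerade in eBPF rather than iptables

With kube-proxy removed, per-Service iptables chains disappear entirely, which is where clusters with very many services (like case #2) see the biggest win.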
2. Better Network Policies
The CiliumNetworkPolicy CRD extends the native Kubernetes NetworkPolicy API with L7 rules (HTTP, Kafka, DNS), richer ingress/egress selectors, and port-range specifications. The long-term goal is to merge these capabilities into the standard API.
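For instance, here is a minimal sketch of an L7 rule (the app labels, port, and path are hypothetical, not from our clusters) that lets a frontend call only GET endpoints of a backend:

# Hypothetical example: L7 filtering on top of an L3/L4 allow rule
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-allow-get-only
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/.*"

Anything from the frontend other than GET /api/... is rejected at L7, something the native NetworkPolicy API cannot express.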
3. Cluster‑wide Traffic Control
CiliumClusterwideNetworkPolicy lets you define policies that apply to the whole cluster (outside any namespace), making it easy to control traffic between groups of nodes; two of the examples later in this article use it.
4. Policy Enforcement Modes
The user-friendly enforcement modes simplify policy management. In the default mode, an endpoint accepts all traffic until a policy selects it; from that point on, any traffic the policy does not explicitly allow is denied. The always mode enforces policy on every endpoint even when no rule selects it, which is useful for high-security environments.
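If you deploy with Helm, the mode can be set chart-wide; a sketch assuming the official chart's policyEnforcementMode value, which maps to the agent's enable-policy setting:

# values.yaml fragment (sketch)
policyEnforcementMode: always   # one of: default, always, never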
5. Hubble and UI
Hubble provides powerful network and service observability with a visual UI that shows real‑time traffic flows, service interaction graphs, and policy enforcement details.
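Hubble ships with Cilium but is disabled by default; a minimal sketch of the Helm values that enable the relay and the UI in recent chart versions:

# values.yaml fragment (sketch; key names per the official cilium chart)
hubble:
  enabled: true
  relay:
    enabled: true   # aggregates flows from all nodes for the CLI and UI
  ui:
    enabled: true   # serves the service-map web interface

With these set, hubble observe streams live flows from the terminal, and the UI draws the service-interaction graph per namespace.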
6. Visual Policy Editor
The online editor offers a mouse‑friendly UI to create rules and generate the corresponding YAML, though it currently lacks a reverse‑visualization feature for existing configurations.
What Cilium Did for Us
We revisited the concrete problems that drove us to evaluate Cilium. The "default deny all" requirement (case #1) was implemented using the default enforcement mode: once a policy selects an endpoint, everything not explicitly allowed is denied. Below are a few simple policy examples that many teams find useful.
# 1. Allow every pod in the cluster to reach the Istio control plane in infra-istio on 8443.
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: all-pods-to-istio-internal-access
spec:
  endpointSelector: {}
  egress:
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: infra-istio
      toPorts:
        - ports:
            - port: "8443"
              protocol: TCP
---
# 2. Allow unrestricted ingress and egress between pods of the same namespace.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-ingress-egress-within-namespace
spec:
  endpointSelector: {}
  ingress:
    - fromEndpoints: [{}]
  egress:
    - toEndpoints: [{}]
---
# 3. Allow the VictoriaMetrics agent's service account to scrape a desired namespace.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: vmagent-allow-desired-namespace
spec:
  endpointSelector:
    matchLabels:
      k8s:io.cilium.k8s.policy.serviceaccount: victoria-metrics-agent-usr
      k8s:io.kubernetes.pod.namespace: vmagent-system
  egress:
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: desired-namespace
---
# 4. Host firewall: allow metrics-server to reach the kubelet port on every node.
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: host-firewall-allow-metrics-server-to-kubelet
spec:
  nodeSelector:
    matchLabels: {}
  ingress:
    - fromEndpoints:
        - matchLabels:
            k8s:io.cilium.k8s.policy.serviceaccount: metrics-server
            k8s:io.kubernetes.pod.namespace: my-metrics-namespace
      toPorts:
        - ports:
            - port: "10250"
              protocol: TCP

As for our other initial pain points: cases #2 and #4 stemmed from the poor performance of the iptables/netfilter stack, and moving to Cilium's eBPF datapath removed that bottleneck, as both public benchmarks and our own tests confirmed. Hubble provided the level of observability required for case #3.
Next Steps
We have resolved every Kubernetes networking pain point we faced. Cilium, now a CNCF incubating project, is slated to graduate soon; it passed two security audits in February 2023. We are watching the roadmap for upcoming features such as stable EndpointSlice support and native local-redirect policies.
Conclusion
After validating Cilium in production, we adopted it despite its learning curve because its benefits—performance, security, and observability—are evident. For teams willing to invest time and knowledge, Cilium is a 100 % worthwhile experiment that can deliver multi‑dimensional returns.