
Edge Autonomy in TKE@EDGE: Ensuring High Availability of Edge Containers in Weak Network Environments

This article explains how TKE@EDGE implements edge autonomy mechanisms—including lite‑apiserver, network snapshot, and distributed health checks—to keep edge containers and services highly available even when network connectivity to the cloud is unreliable or intermittent.

Cloud Native Technology Community

In edge computing scenarios, the network between the cloud control center and edge devices is often complex and unreliable, prompting users to demand high‑availability services even under weak network conditions. The TKE edge container team introduced an edge‑autonomy feature to address this challenge.

Problem Background

Edge devices are numerous, spread nationwide, and connected over a mix of network types (Internet, Ethernet, 5G, Wi‑Fi), so network quality varies widely. Standard Kubernetes relies on stable communication between each node and the kube‑apiserver, which edge environments cannot guarantee.

To keep workloads and services highly available on edge clusters, the team designed two solutions. This article covers the first: edge autonomy.

Requirements Example

A typical factory deployment illustrates the problems standard Kubernetes runs into on weak networks and what TKE@EDGE has to solve. The requirements are:

Nodes must continue running workloads even if disconnected from the master.

Kubelet must restart containers that exit or crash.

Workloads must be relaunched after node reboot.

Microservices within the same factory must remain reachable after a node restart.

Standard Kubernetes struggles to meet these requirements because it assumes that nodes and the control plane share a reliable LAN.

Standard Kubernetes Handling

Disconnected nodes become NotReady or Unknown.

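Pods on disconnected nodes are removed from the Endpoint lists of their Services, so traffic is no longer routed to them and the services become unavailable.

If a disconnected node reboots, its Pods are not automatically recreated, because kubelet cannot retrieve its Pod list from the kube‑apiserver.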

Thus, a node’s loss of network connectivity is traditionally equated with service unavailability, which is unacceptable for edge workloads.

Edge Autonomy Features (TKE@EDGE)

Nodes may be marked NotReady/Unknown, but their services stay available.

Even when multiple nodes are disconnected, their Pods keep running and their microservices keep working.

If a disconnected node reboots, its Pods are relaunched automatically and resume serving.

Microservices stay reachable across nodes within the same factory.

The team combines this with a distributed node health‑check plugin to keep Pod IPs in the Endpoint list even when nodes are NotReady.
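The article does not describe how the health‑check plugin reaches its verdict, so the following Go sketch is only one plausible reading: nodes in the same factory probe each other, and a node's Pods stay in the Endpoint list if either the cloud still sees the node as Ready or a majority of its peers can reach it. The majority rule and the names `PeerReports`, `PeerHealthy`, and `KeepInEndpoints` are assumptions made for illustration, not the plugin's actual API.

```go
package main

import "fmt"

// PeerReports maps observer node -> target node -> whether the probe succeeded.
// Hypothetical data shape; the real plugin's format is not given in the article.
type PeerReports map[string]map[string]bool

// PeerHealthy reports whether a strict majority of the peers that probed
// target were able to reach it. A node with no peer reports is not healthy.
func PeerHealthy(target string, reports PeerReports) bool {
	reached, total := 0, 0
	for observer, seen := range reports {
		if observer == target {
			continue // a node does not vote on itself
		}
		ok, probed := seen[target]
		if !probed {
			continue
		}
		total++
		if ok {
			reached++
		}
	}
	return total > 0 && reached*2 > total
}

// KeepInEndpoints keeps a node's Pod IPs in the Endpoint list when the cloud
// sees the node as Ready, or when its peers still vouch for it (assumed rule).
func KeepInEndpoints(cloudReady bool, target string, reports PeerReports) bool {
	return cloudReady || PeerHealthy(target, reports)
}

func main() {
	// All three nodes have lost the cloud; edge-3 is also unreachable locally.
	reports := PeerReports{
		"edge-1": {"edge-2": true, "edge-3": false},
		"edge-2": {"edge-1": true, "edge-3": false},
		"edge-3": {"edge-1": true, "edge-2": true},
	}
	for _, node := range []string{"edge-1", "edge-2", "edge-3"} {
		fmt.Printf("%s keep=%v\n", node, KeepInEndpoints(false, node, reports))
	}
}
```

In this sketch a node that is cut off from the cloud but still reachable by its neighbours keeps serving traffic, while a node that has genuinely failed is still removed.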

Design Principles

lite‑apiserver Mechanism

A lightweight apiserver runs on each edge node, mirroring kube‑apiserver functionality locally. It forwards requests to the real apiserver when the network is healthy and serves cached data when the network is down, ensuring node components remain functional.
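The forward‑or‑serve‑from‑cache behaviour can be sketched in Go as follows. This is a minimal illustration, not the lite‑apiserver implementation: the `Fetcher` type stands in for a read request to the cloud kube‑apiserver, and `LiteCache`, `NewLiteCache`, and `ErrOffline` are hypothetical names. It handles reads only and keeps the cache in memory, whereas a cache that must survive a node reboot would need to be persisted to disk.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrOffline simulates a failed connection to the cloud (hypothetical).
var ErrOffline = errors.New("upstream unreachable")

// Fetcher stands in for a GET against the cloud kube-apiserver.
type Fetcher func(path string) ([]byte, error)

// LiteCache forwards reads upstream and falls back to the last good response.
type LiteCache struct {
	mu       sync.Mutex
	upstream Fetcher
	cache    map[string][]byte
}

func NewLiteCache(upstream Fetcher) *LiteCache {
	return &LiteCache{upstream: upstream, cache: make(map[string][]byte)}
}

// Get returns the response body and whether it was served from the cache.
func (c *LiteCache) Get(path string) (body []byte, fromCache bool, err error) {
	body, err = c.upstream(path)
	if err == nil {
		// Network is healthy: refresh the cached copy and pass the response through.
		c.mu.Lock()
		c.cache[path] = body
		c.mu.Unlock()
		return body, false, nil
	}
	// Network is down: serve the last response seen for this path, if any.
	c.mu.Lock()
	cached, ok := c.cache[path]
	c.mu.Unlock()
	if !ok {
		return nil, false, fmt.Errorf("no cached copy of %s: %w", path, err)
	}
	return cached, true, nil
}

func main() {
	online := true
	c := NewLiteCache(func(path string) ([]byte, error) {
		if !online {
			return nil, ErrOffline
		}
		return []byte(`{"kind":"PodList"}`), nil
	})
	path := "/api/v1/pods?fieldSelector=spec.nodeName=edge-1"

	body, cached, _ := c.Get(path)
	fmt.Println(string(body), "fromCache:", cached)

	online = false // simulate losing the link to the cloud
	body, cached, _ = c.Get(path)
	fmt.Println(string(body), "fromCache:", cached)
}
```

Because kubelet and the other node components talk only to the local endpoint, they see the same answers whether or not the cloud is reachable.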

Network Snapshot Mechanism

The solution uses Flannel in VXLAN mode and periodically snapshots the network configuration of flannel and the Pods, so that a node rebooting while disconnected can restore its network connectivity from the latest snapshot.
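The article does not say what a snapshot contains or where it is stored, so the sketch below only illustrates the save‑and‑restore idea in Go. The `NetSnapshot` fields, the JSON encoding, and the file path are all assumptions. The snapshot is written to a temporary file and then renamed, so a power loss in the middle of a write does not leave a truncated snapshot behind.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// NetSnapshot is a hypothetical snapshot shape; the real contents are not
// described in the article.
type NetSnapshot struct {
	FlannelSubnet string            `json:"flannelSubnet"`
	VTEPMAC       string            `json:"vtepMAC"`
	PodIPs        map[string]string `json:"podIPs"` // "namespace/pod" -> IP
}

// Save writes the snapshot to a temp file, then renames it over the target.
func Save(path string, s NetSnapshot) error {
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// Load reads the most recent snapshot, for example right after a reboot.
func Load(path string) (NetSnapshot, error) {
	var s NetSnapshot
	data, err := os.ReadFile(path)
	if err != nil {
		return s, err
	}
	err = json.Unmarshal(data, &s)
	return s, err
}

func main() {
	dir, err := os.MkdirTemp("", "edge-snapshot")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "network.json")

	// Periodic snapshot while the node is running (example values).
	snap := NetSnapshot{
		FlannelSubnet: "10.244.1.0/24",
		VTEPMAC:       "0a:58:0a:f4:01:00",
		PodIPs:        map[string]string{"default/web-0": "10.244.1.12"},
	}
	if err := Save(path, snap); err != nil {
		panic(err)
	}

	// After a reboot without cloud connectivity, restore from the file.
	restored, err := Load(path)
	if err != nil {
		panic(err)
	}
	fmt.Println(restored.FlannelSubnet, restored.PodIPs["default/web-0"])
}
```

A real restore step would go on to reapply this state to the node's network devices and routes, which is outside the scope of this sketch.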

DNS Solution

Instead of a single kube‑dns Deployment, a DaemonSet deploys a DNS instance on every node, and kubelet is configured to use the local DNS IP, guaranteeing name resolution even without cloud connectivity.
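To show what resolving against a node‑local DNS instance looks like from a client's point of view, here is a small Go sketch that pins every lookup to one local address. The address `169.254.20.10:53` and the service name are placeholders chosen for illustration, not values from TKE@EDGE. In a cluster, Pods would normally receive the local address through the resolver configuration that kubelet injects, not through application code.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// NewLocalResolver returns a resolver that sends every DNS query to dnsAddr,
// ignoring the servers listed in the system resolver configuration.
func NewLocalResolver(dnsAddr string) *net.Resolver {
	dialer := &net.Dialer{Timeout: 2 * time.Second}
	return &net.Resolver{
		PreferGo: true, // use Go's built-in resolver so that Dial is honoured
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			// Discard the address Go picked and dial the node-local instance.
			return dialer.DialContext(ctx, network, dnsAddr)
		},
	}
}

func main() {
	// Placeholder address for the DNS Pod running on this node (assumption).
	resolver := NewLocalResolver("169.254.20.10:53")

	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	// Placeholder service name; resolution never leaves the node.
	addrs, err := resolver.LookupHost(ctx, "web.default.svc.cluster.local")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("resolved:", addrs)
}
```

Since the query never has to leave the node, name resolution depends only on the local DNS instance having the records it needs, not on the link to the cloud.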

Overall, the combination of lite‑apiserver, flannel, kube‑dns, and network snapshots ensures reliable networking for edge containers under weak network conditions.

Applicable Scenarios

TKE@EDGE lets users manage edge nodes from the cloud through Kubernetes, separates the control‑plane network from the workload network, provides disaster recovery under weak network conditions, and supports custom traffic routing, all while staying compatible with upstream Kubernetes (v1.18).

Future Plans

Modularize the edge‑autonomy solution as a plugin for broader Kubernetes use.

Support fully decentralized lite‑apiserver to further aid management in weak‑network environments.
