Tagged articles
4047 articles
Page 13 of 41
Bilibili Tech
Mar 12, 2024 · Cloud Native

Containerizing Elasticsearch and ClickHouse on Kubernetes: Architecture, Implementation, and Benefits

Bilibili migrated its Elasticsearch and ClickHouse clusters to Kubernetes using custom operators, CRDs, LVM‑based local storage, MacVLAN networking, and pod anti‑affinity. This brought higher resource utilization, stronger isolation, and greater automation, and it cut the server count, reduced latency spikes, and dramatically lowered operational costs.

Elasticsearch · Kubernetes · LVM
0 likes · 38 min read
ITPUB
Mar 11, 2024 · Cloud Computing

What 4 Years of Startup Infrastructure Taught Me: AWS, Terraform, GitOps & More

After four years running infrastructure at a fast‑growing startup, the author reviews almost every major decision—from choosing AWS over GCP and adopting EKS, RDS, and Redis, to automating post‑mortems with Slack bots, standardising IaC with Terraform and GitOps, and evaluating SaaS tools like DataDog, PagerDuty, and Notion—highlighting the benefits, regrets, and practical lessons learned.

AWS · DevOps · Infrastructure
0 likes · 22 min read
Cloud Native Technology Community
Mar 11, 2024 · Cloud Native

Harnessing Nvidia GPUs in Kubernetes: Virtualization, Scheduling & Best Practices

This article explains how to combine Nvidia GPUs with Kubernetes, covering CUDA toolkits, device plugins, GPU virtualization techniques such as Time‑Slicing, MPS and MIG, and advanced scheduling options like Volcano, while also outlining practical deployment steps and performance considerations.

Cloud Native · Device Plugin · GPU virtualization
0 likes · 22 min read
Open Source Linux
Mar 11, 2024 · Big Data

Step‑by‑Step Guide to Deploying Flink on Standalone, YARN, and Kubernetes

This tutorial explains how to install and configure Apache Flink in three deployment modes—Standalone, Hadoop YARN, and Kubernetes—covering node preparation, configuration files, package distribution, job submission, and monitoring through the Flink Web UI, with full command‑line examples and code snippets.

Big Data · Flink · Kubernetes
0 likes · 12 min read
dbaplus Community
Mar 10, 2024 · Cloud Native

How We Built Elastic Scaling and Hybrid‑Cloud Auto‑Scaling on Kubernetes

After fully containerizing their platform, the team tackled the scaling challenges faced by front‑line development. They designed a custom elastic‑scaling solution that combines dual‑threshold and timed scaling, integrates hybrid‑cloud ClusterAutoScale, consolidates middleware resources, and implements a comprehensive K8s observability stack. The result was over 30% additional compute capacity and near‑perfect scaling reliability.

Auto Scaling · Cloud Native · Kubernetes
0 likes · 12 min read
Ops Development Stories
Mar 8, 2024 · Cloud Native

How to Deploy and Test a Multi‑Cluster Istio Service Mesh with Kind and MetalLB

This guide explains why multi‑cluster deployments are needed for high availability and describes Istio's flat and non‑flat network models with single or multiple control planes. It then provides step‑by‑step scripts to create Kind clusters, install MetalLB, configure root CAs, deploy Istio, set up gateways, and verify regional load balancing and failover.

Istio · Kubernetes · MetalLB
0 likes · 29 min read
360 Quality & Efficiency
Mar 8, 2024 · Cloud Native

Understanding maxUnavailable and maxSurge in Kubernetes Rolling Updates

This article explains the roles of maxUnavailable and maxSurge in Kubernetes rolling updates, demonstrates their impact on update speed and service stability through multiple practical cases, and provides best‑practice guidelines for configuring these parameters to achieve smooth, reliable deployments.

Deployment · Kubernetes · Rolling Update
0 likes · 12 min read
Architect
Mar 7, 2024 · Cloud Native

Graceful Shutdown in Kubernetes: Concepts, Case Studies, and Optimizations

This article explains the concept of graceful shutdown, outlines the standard steps, and presents detailed Kubernetes, Spring Boot, and Nacos case studies, followed by optimization techniques, code examples, and practical recommendations for handling MQ, scheduled tasks, and traffic control during service termination.

Cloud Native · Graceful Shutdown · Kubernetes
0 likes · 12 min read
Ops Development & AI Practice
Mar 5, 2024 · Cloud Computing

Build a Cost‑Effective Kubernetes Dev Environment with GitHub Codespaces, DinD & KinD

This guide shows how to combine GitHub Codespaces, Docker‑in‑Docker, and KinD to create a fully functional, cloud‑based Kubernetes development environment that eliminates local setup, improves accessibility, and reduces costs, while providing step‑by‑step instructions and essential commands for deployment and testing.

DevOps · Docker-in-Docker · GitHub Codespaces
0 likes · 6 min read
MaGe Linux Operations
Mar 5, 2024 · Cloud Native

Master Kubernetes Scheduling: 15 Real-World Scenarios & Configurations

This guide explores fifteen practical Kubernetes scheduling scenarios—from basic node selectors to custom schedulers and pod priority—providing detailed YAML configurations, code snippets, and best‑practice recommendations to help you optimize resource utilization, high availability, and workload placement across your cluster.

Kubernetes · NodeSelector · PodAffinity
0 likes · 12 min read
MaGe Linux Operations
Mar 5, 2024 · Cloud Native

How to Run GPU‑Accelerated AI Workloads on Kubernetes

This article explains how Kubernetes supports GPU workloads for AI and machine learning, covering device plugins, pod GPU requests, oversubscription, security isolation, cloud‑provider node setup, and protecting GPU nodes from non‑GPU pods.

AI workloads · Cloud Native · Device Plugin
0 likes · 8 min read
MaGe Linux Operations
Mar 2, 2024 · Operations

How to Diagnose and Fix Constant Kubernetes Pod Restarts (OOM)

When a Kubernetes pod restarts repeatedly, you can pinpoint the cause by inspecting events, describing the pod, and checking the previous container's logs. In this case, the restarts were OOM kills caused by the memory limit set in deployment.yaml, and they were resolved by raising that limit and redeploying the pod.

Kubernetes · Memory Limits · OOM
0 likes · 3 min read
MaGe Linux Operations
Feb 29, 2024 · Operations

Quickly Set Up OpenTelemetry on Kubernetes: Installation, Modes & Config

This guide walks you through deploying OpenTelemetry in Kubernetes, covering the purpose of otel‑collector, installation via manifests or Helm, the three deployment patterns (No‑Collector, Agent, Gateway), running the otel‑demo, and detailed configuration of receivers, processors, exporters, connectors, extensions, and service pipelines.

Collector · Kubernetes · OpenTelemetry
0 likes · 11 min read
Architect
Feb 28, 2024 · Cloud Native

Lightweight Kubernetes Log Collection with Loki: Deployment and Configuration Guide

This article provides a comprehensive, lightweight solution for collecting Kubernetes logs using Grafana Loki, covering its advantages, component comparison, deployment modes (All‑In‑One, microservices, bare‑metal), required configuration files, ConfigMap and PersistentVolume setup, Promtail installation, Helm deployment, and common troubleshooting steps.

Kubernetes · Loki · Promtail
0 likes · 20 min read
vivo Internet Technology
Feb 28, 2024 · Cloud Native

vivo's Online-Offline Co-location Technology Practice: Data Center Resource Optimization

vivo’s online‑offline co‑location platform consolidates latency‑sensitive online services and offline batch workloads on shared Kubernetes nodes. It uses differentiated resource views, priority‑based QoS, and safety watermarks to raise CPU utilization from 13% to 25%, making an additional 20,000 cores and 50 TB of memory available to offline tasks during peak hours.

DevOps · Kubernetes · Resource Isolation
0 likes · 17 min read
Alibaba Cloud Native
Feb 28, 2024 · Cloud Native

Building a Unified Cloud‑Native Serverless Platform Across Public Cloud and IDC with ACK One & Knative

This guide explains how to design and implement a unified cloud‑native serverless platform that runs seamlessly on public clouds and on‑premise IDC clusters using Alibaba Cloud ACK One, Kubernetes, and Knative, covering architecture, key components, deployment steps, and best‑practice recommendations.

ACK One · Knative · Kubernetes
0 likes · 11 min read
Aikesheng Open Source Community
Feb 27, 2024 · Cloud Native

Detailed Overview of LiteIO Architecture, Components, and Volume Lifecycle

This article provides a comprehensive technical overview of LiteIO, describing its core and CSI components, the complete volume lifecycle within Kubernetes, Disk‑Agent responsibilities, common implementation pitfalls, storage‑pool construction methods, and the design of the node‑disk‑controller, scheduler, and CSI modules.

CSI · Cloud Native · Distributed Systems
0 likes · 13 min read
dbaplus Community
Feb 26, 2024 · Cloud Native

10 Hard‑Earned Lessons from 3 Years Managing Kubernetes Clusters

After three years of hands‑on Kubernetes administration, the author shares ten practical lessons covering cloud‑hosted clusters, infrastructure‑as‑code, Helm chart usage, service mesh decisions, tool selection, resource limits, stateless design, HPA configuration, and upgrade strategies to help both newcomers and seasoned engineers manage clusters effectively.

Cloud Native · Cluster Management · Kubernetes
0 likes · 7 min read
AntData
Feb 22, 2024 · Cloud Native

Detailed Overview of LiteIO Architecture, Components, and Volume Lifecycle

This article provides a comprehensive technical overview of LiteIO, describing its core and CSI components, their interactions, the complete volume lifecycle within Kubernetes, common implementation pitfalls, and configuration examples for storage pools and agents.

CSI · Cloud Native · Kubernetes
0 likes · 14 min read
DevOps Cloud Academy
Feb 22, 2024 · Cloud Native

Blue‑Green Deployment with Host and Path‑Based Routing in Kubernetes

This guide explains how to implement a blue‑green deployment on Kubernetes with host‑ and path‑based routing, covering prerequisites, namespace creation, deployment manifests, service and ingress configuration, traffic switching, updates, verification, and rollback procedures.

Blue‑Green deployment · Cloud Native · DevOps
0 likes · 8 min read
DataFunTalk
Feb 22, 2024 · Big Data

Flink on Kubernetes: Kuaishou’s Practice, Migration, and Future Refactoring

This article details Kuaishou’s five‑year evolution of Flink, covering its background, production refactoring to Kubernetes, migration practices, and future improvements, highlighting architecture layers, resource management, observability, and testing strategies for large‑scale stream processing.

Big Data · Cloud Native · Flink
0 likes · 12 min read
Liangxu Linux
Feb 21, 2024 · Cloud Native

Why a Default Kubernetes Setting Can Spike CPU Usage and How to Fix It

After migrating to containers, a Node.js service began experiencing intermittent timeouts and high CPU usage because the default enableServiceLinks setting injected thousands of environment variables. The analysis shows how to identify, reproduce, and resolve the issue through Kubernetes configuration and code changes.

Container · Kubernetes · Node.js
0 likes · 14 min read
Alibaba Cloud Native
Feb 21, 2024 · Cloud Native

How Fluid & JindoCache Accelerate Large‑Scale AI Training in a Cloud‑Native Environment

This article examines the challenges of data‑intensive AI training on heterogeneous cloud‑native infrastructure and explains how the Fluid framework combined with JindoCache and KubeDL provides distributed caching, metadata acceleration, and seamless POSIX access to dramatically improve I/O performance, GPU utilization, and cost efficiency.

AI training · Data Caching · Fluid
0 likes · 18 min read
JD Retail Technology
Feb 20, 2024 · Operations

Measuring Operations Automation Rate and Building a Self‑Coding Automation Platform

This article explains the challenges of manual operations, defines an automation‑rate metric, introduces the Tai‑Shan Kirin platform for self‑coded operational automation, provides step‑by‑step implementation guidance with code examples, and shares a case study demonstrating significant efficiency and stability gains.

Automation Metrics · CRD · Kubernetes
0 likes · 19 min read
Liangxu Linux
Feb 19, 2024 · Cloud Native

How CoreDNS and kubelet Configure /etc/resolv.conf in Kubernetes Pods

This article explains how CoreDNS, a DNS server built on the Caddy server framework, runs in Kubernetes; how kubelet injects the cluster DNS IP into each container’s /etc/resolv.conf; and how the different dnsPolicy settings (Default, ClusterFirst, ClusterFirstWithHostNet, None) shape the resulting resolv.conf, with key options and examples.

CoreDNS · Kubernetes · dnsPolicy
0 likes · 6 min read
Full-Stack DevOps & Kubernetes
Feb 19, 2024 · Cloud Native

Unveiling Kubernetes: Architecture, Core Components, and Source Code Deep Dive

This article provides a comprehensive overview of Kubernetes, detailing its fundamental concepts, master‑worker architecture, networking model, security mechanisms, extensibility via custom resources, and an in‑depth examination of key source‑code modules such as kube‑apiserver, etcd, controller‑manager, scheduler, kubelet, and kube‑proxy, with links to the official repository.

Cloud Native · Containers · Kubernetes
0 likes · 10 min read
Alibaba Cloud Native
Feb 18, 2024 · Cloud Native

How to Build a Hybrid Cloud Disaster‑Recovery System with Alibaba ACK One

This guide explains how to use Alibaba Cloud's ACK One platform to connect on‑premises and public‑cloud Kubernetes clusters, configure network interconnectivity, create multi‑cluster fleets, optionally deploy applications via GitOps, and manage traffic with a multi‑cluster gateway for seamless same‑city disaster recovery.

ACK One · GitOps · Kubernetes
0 likes · 13 min read
Top Architect
Feb 18, 2024 · Backend Development

Why Token Pass‑Through Is Discouraged and Alternative Unified Authorization Designs for Microservices

The article explains why passing tokens between microservices is a poor design, proposes exposing explicit userId parameters, describes unified authentication via an API gateway with Feign, Dubbo or Spring Boot Web implementations, compares their pros and cons, and shows how to integrate these patterns with Kubernetes and internal API path rules.

Authentication · Backend · Dubbo
0 likes · 9 min read
MaGe Linux Operations
Feb 17, 2024 · Cloud Native

From chroot to Kubernetes: The Evolution of Containerization

Tracing the history of containerization, this article explores how early file isolation with chroot evolved through namespaces and cgroups, leading to LXC, Docker’s lightweight application packaging, Kubernetes orchestration, and finally cloud-native services like Huawei CCE, highlighting each stage’s impact on modern software deployment.

Docker · Kubernetes · Linux
0 likes · 11 min read
DevOps Cloud Academy
Feb 16, 2024 · Cloud Native

Configuring a Kubernetes Pod as a Jenkins Agent

This guide explains how to set up a Kubernetes pod to act as a Jenkins agent, covering prerequisites, deployment YAML, commands to launch and verify the pod and service, and the Jenkins UI configuration needed to connect the pod as a scalable CI/CD worker.

Agent · Cloud Native · Jenkins
0 likes · 5 min read
DevOps Cloud Academy
Feb 14, 2024 · Cloud Native

Weaveworks Shuts Down: What It Means for GitOps and the Future of Flux

Weaveworks, the company that coined the term GitOps, announced its closure because of unstable sales despite $10 million in revenue. The news prompted industry analysis of GitOps’s commercial viability, Flux’s competition with Argo CD, and the future stewardship of the open‑source Flux project under the CNCF.

Cloud Native · Continuous Delivery · Flux
0 likes · 5 min read
MaGe Linux Operations
Feb 10, 2024 · Backend Development

Mastering the Sidecar Pattern: Log Collection, Request Forwarding, and Interception in Kubernetes

This article explains the sidecar concept, compares it with SDK approaches, and provides detailed Kubernetes examples—including a log‑collection sidecar, a request‑forwarding sidecar, and an HTTP‑intercepting sidecar—complete with YAML manifests and Rust and Scala code to demonstrate implementation and deployment.

Kubernetes · Rust · Scala
0 likes · 9 min read
Architect
Feb 10, 2024 · Backend Development

Why Token Pass‑Through Is Bad and How to Build Unified Auth in Microservices

The article critiques token pass‑through for microservice authentication, explains why internal APIs should stay stateless, and presents unified authorization patterns using Spring Cloud Gateway with Feign, Dubbo, or a gateway‑less design, plus Kubernetes integration and trade‑offs.

Authentication · Dubbo · Kubernetes
0 likes · 9 min read
MaGe Linux Operations
Feb 9, 2024 · Cloud Native

Mastering Kubernetes Ingress: Controllers, Architecture, and Lua Extensions

This article explains Kubernetes Ingress fundamentals, compares major Ingress controllers such as Nginx, Kong, Traefik, HAProxy and APISIX, and details the internal architecture and Lua‑based extension points of the ingress‑nginx controller, providing a comprehensive guide for managing external traffic in cloud‑native environments.

Cloud Native · Controllers · Ingress
0 likes · 8 min read
37 Interactive Technology Team
Feb 8, 2024 · Operations

What Are Kubernetes Events and How to Collect Them

Kubernetes events record state changes such as pod scheduling, image pulls, and failures. They can be inspected with kubectl but are retained for only one hour by default, so tools like kube-eventer or kubernetes-event-exporter collect them for long‑term analysis. This enables monitoring of Warning-type events and failure reasons, with visualization through Grafana dashboards.

Cloud-native · Events · Grafana
0 likes · 9 min read
MaGe Linux Operations
Feb 7, 2024 · Backend Development

How to Deploy a High‑Availability RabbitMQ Cluster on Kubernetes with NFS Storage

This guide introduces RabbitMQ's features and typical use cases. It then walks step by step through deploying a mirrored‑mode RabbitMQ cluster on Kubernetes using StatefulSets, NFS‑backed persistent storage, and RBAC, and finishes by verifying cluster health and management operations.

Cluster Deployment · Kubernetes · Message Queue
0 likes · 23 min read
MaGe Linux Operations
Feb 6, 2024 · Cloud Native

How to Build a Cilium Dual‑Stack IPv4/IPv6 Kubernetes Cluster with Kind

This guide explains the concepts of IPv4/IPv6 dual‑stack networking, outlines two dual‑stack implementation methods, and provides step‑by‑step instructions to set up a Cilium‑enabled Kubernetes cluster using Kind, configure dual‑stack settings, deploy a demo app, and analyze routing behavior for both IP families.

CNI · Cilium · Dual-Stack
0 likes · 9 min read
dbaplus Community
Feb 4, 2024 · Operations

How Ant Group Leverages SLO and AIOps for Fine‑Grained Operations

This article details Ant Group's practical implementation of Service Level Objectives (SLO) and AIOps to achieve fine‑grained operations, covering SLO fundamentals, health‑score architecture, GitOps‑based data pipelines, error‑budget alerting, AI‑driven anomaly detection, fault localization techniques, and real‑world case studies on dashboards, Kubernetes SLOs, and emergency response workflows.

Error Budget · Fault Localization · Kubernetes
0 likes · 38 min read
Alibaba Cloud Native
Feb 4, 2024 · Cloud Native

Build Dynamic Fan‑Out/Fan‑In DAG Workflows with Argo on ACK One

This guide explains how to use Argo Workflow on Alibaba Cloud ACK One to implement dynamic fan‑out/fan‑in DAGs, splitting large log files, running parallel map tasks, and aggregating results with a reduce step, including full YAML definitions and execution steps.

Argo Workflow · Dynamic DAG · Fan-out Fan-in
0 likes · 10 min read
MaGe Linux Operations
Feb 2, 2024 · Cloud Native

Connect Java Maven Apps to Kubernetes with kubeconfig & ServiceAccount

This guide demonstrates how to set up a Maven project with the Fabric8 Kubernetes Java client, configure minimal kubeconfig or ServiceAccount credentials, and use sample code to list namespaces, covering the essential steps for connecting a Java application to a Kubernetes cluster.

Fabric8 · Kubernetes · ServiceAccount
0 likes · 6 min read
Ops Development Stories
Feb 2, 2024 · Cloud Native

Essential kubectl Commands for Efficient Kubernetes Management

This guide compiles a comprehensive set of kubectl and Docker commands for retrieving logs, sorting pods, managing secrets, cleaning resources, debugging, port forwarding, and performing cluster maintenance tasks, helping administrators streamline Kubernetes operations and troubleshoot issues effectively.

CLI · Cloud Native · Cluster Management
0 likes · 15 min read
Cloud Native Technology Community
Feb 2, 2024 · Cloud Native

Achieving Sub‑2‑Hour RTO: A Cloud‑Native Disaster Recovery Blueprint for Enterprises

This article examines how a leading global industrial group leveraged a cloud‑native platform to design a disaster‑recovery solution that meets a sub‑2‑hour RTO and a 1‑minute RPO, detailing architecture, data‑layer strategies, middleware replication, application and access‑layer handling, and operational best practices.

ACP · Cloud Native · GitOps
0 likes · 17 min read
AntData
Feb 1, 2024 · Cloud Native

Deploying LiteIO Cloud‑Native Block Storage Service on Kubernetes

This guide explains how to set up the high‑performance, cloud‑native LiteIO block storage service on a Kubernetes cluster, covering prerequisite VM preparation, kernel upgrade, Docker and Kubernetes installation, CRI configuration, LiteIO component deployment for both LVM and SPDK engines, and verification of Pods and PVCs.

CSI · Cloud Native Storage · Kubernetes
0 likes · 14 min read
System Architect Go
Jan 31, 2024 · Cloud Native

My CKA Certification Experience and Rapid‑Study Guide

The author shares how they passed the Certified Kubernetes Administrator (CKA) exam with a score of 95. They explain the certification’s scope and offer practical preparation tips, environment requirements, key study resources, and a concise strategy for quickly mastering the 17 recurring exam questions.

CKA · Exam Guide · Kubernetes
0 likes · 5 min read
Zhengcaiyun Technology
Jan 30, 2024 · Cloud Native

Understanding the Core Workflow of Kubernetes Informer in client-go

This article explains the internal workflow of the Kubernetes informer package in client-go, covering its architecture, key components such as Reflector, DeltaFIFO, and Indexer, and provides a step‑by‑step code example that demonstrates how informers are created, registered, started, and used to handle watch events efficiently.

Controller · DeltaFIFO · Go
0 likes · 19 min read
MaGe Linux Operations
Jan 30, 2024 · Cloud Native

How to Auto‑Recover Lost s3fs Mounts in a Huawei OBS CSI Plugin

This article explains why a Huawei OBS CSI plugin loses its s3fs process after a restart, causing "Transport endpoint is not connected" errors, and provides a step‑by‑step solution using client‑go to rebuild the mount and trigger kubelet remount via a liveness probe.

CSI · CloudNative · Kubernetes
0 likes · 7 min read
Beike Product & Technology
Jan 29, 2024 · Information Security

Kubernetes Security Risks and Hardening Recommendations

This article analyzes Kubernetes security threats from cloud, cluster, and container perspectives, enumerates high‑risk permissions, default privileged accounts, and insecure configurations, and provides concrete hardening steps such as least‑privilege RAM policies, etcd encryption, RBAC tightening, and workload isolation measures.

CloudNative · Kubernetes · PodSecurity
0 likes · 31 min read
Liangxu Linux
Jan 28, 2024 · Cloud Native

Master Kubernetes Troubleshooting: 100 Essential kubectl Commands

This guide compiles 100 practical kubectl commands that help you diagnose cluster information, pods, services, deployments, networking, storage, security, autoscaling, and many other Kubernetes components, providing a handy reference for effective cluster troubleshooting.

Cluster · Kubernetes · cloud-native
0 likes · 19 min read
Liangxu Linux
Jan 28, 2024 · Cloud Native

Debugging Running Pods in Kubernetes Without Root Access

This guide explains why kubectl exec often fails under security best practices, introduces kubectl debug with ephemeral containers for root‑level troubleshooting, shows how to create and use debug containers, and outlines alternative non‑native methods and tools for inspecting live pods.

EphemeralContainers · Kubernetes · kubectl
0 likes · 10 min read
DevOps Operations Practice
Jan 28, 2024 · Cloud Native

Five Open‑Source Storage Projects for Kubernetes

This article introduces five major open‑source storage solutions (OpenEBS, Rook, GlusterFS, Ceph, and Longhorn), explaining how each simplifies persistent data management for Kubernetes workloads while offering features such as replication, self‑healing, and multi‑node scalability.

Ceph · Cloud Native · Kubernetes
0 likes · 6 min read
AntTech
Jan 25, 2024 · Cloud Native

LiteIO: Open‑Source High‑Performance Cloud‑Native Block Device Service

LiteIO is an open‑source, high‑performance, cloud‑native block device service that uses NVMe‑oF and SPDK to provide point‑to‑point storage pooling, enabling efficient FinOps, serverless scaling, hot upgrades, zero‑copy I/O, snapshots, and thin provisioning for databases and applications in Kubernetes.

Cloud Native · Kubernetes · LiteIO
0 likes · 11 min read
Aikesheng Open Source Community
Jan 25, 2024 · Cloud Native

Introducing LiteIO: Open‑Source High‑Performance Cloud‑Native Block Device Service

LiteIO is an open‑source, cloud‑native block device service that leverages NVMe‑oF and SPDK to provide high‑performance, scalable storage for Kubernetes‑based workloads, improving storage utilization and enabling FinOps‑driven cost efficiency across large‑scale production environments.

Cloud Native Storage · FinOps · Kubernetes
0 likes · 12 min read
Open Source Linux
Jan 25, 2024 · Cloud Native

Top 200 Kubernetes Interview Questions and Answers for Mastery

This comprehensive guide presents 200 essential Kubernetes interview questions covering fundamentals, architecture, real‑world scenarios, and advanced topics, complete with concise answers, diagrams, and practical insights to help candidates ace container orchestration interviews.

Cloud Native · DevOps · Kubernetes
0 likes · 21 min read
360 Smart Cloud
Jan 24, 2024 · Cloud Native

Idle Compute Sharing in Dedicated Kubernetes Clusters Using Karmada

The article describes how a company implemented idle compute sharing for dedicated Kubernetes clusters, using Karmada to allocate spare CPU and memory to offline workloads in order to improve resource utilization and reduce costs. It also outlines usage scenarios, configuration steps, the technical architecture, and future plans.

Cloud Native · Idle Compute Sharing · Karmada
0 likes · 9 min read
Volcano Engine Developer Services
Jan 24, 2024 · Cloud Native

How ByteDance’s Gödel Scheduler Unifies Online and Offline Workloads at Massive Scale

The article details ByteDance’s Gödel Scheduler, a distributed, cloud‑native Kubernetes scheduler that handles both online and offline workloads. It covers the architecture, including the multi‑instance design and optimistic concurrency; enhanced features such as rescheduling; the resulting gains in throughput and scheduling quality; and the roadmap and open‑source plans.

Kubernetes · Scheduler · performance optimization
0 likes · 15 min read
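Optimistic concurrency lets several scheduler instances decide in parallel and resolve conflicts only at commit time. A single-process sketch of that compare-and-swap idea, assuming each node carries a version number that every successful placement bumps; `SharedClusterState` and its methods are hypothetical and do not describe Gödel's internals. A real deployment would need an atomic commit path (Kubernetes objects, for example, carry a `resourceVersion` for this purpose).

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class NodeState:
    free_cpu_m: int   # unallocated CPU, millicores
    version: int = 0  # bumped on every successful commit


class SharedClusterState:
    """Hypothetical shared node state read and written by several scheduler instances."""

    def __init__(self, nodes: Dict[str, NodeState]):
        self.nodes = nodes

    def snapshot(self, name: str) -> Tuple[int, int]:
        """Read free CPU and version; a scheduler decides based on this copy."""
        node = self.nodes[name]
        return node.free_cpu_m, node.version

    def commit(self, name: str, cpu_m: int, seen_version: int) -> bool:
        """Compare-and-swap: apply the placement only if the node is unchanged."""
        node = self.nodes[name]
        if node.version != seen_version or node.free_cpu_m < cpu_m:
            return False  # stale decision or no room; caller re-reads and retries
        node.free_cpu_m -= cpu_m
        node.version += 1
        return True


if __name__ == "__main__":
    state = SharedClusterState({"node-1": NodeState(free_cpu_m=4000)})
    _, v = state.snapshot("node-1")         # instances A and B both read version 0
    print(state.commit("node-1", 3000, v))  # A commits first: True
    print(state.commit("node-1", 3000, v))  # B's decision is stale: False
    print(state.snapshot("node-1"))         # (1000, 1)
```

The losing instance does no damage; it simply re-reads the node and either retries or picks another node.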
Linux Code Review Hub
Jan 23, 2024 · Industry Insights

2024 eBPF and Networking Trends Forecast

The article forecasts rapid eBPF adoption across cloud‑native networking, mobile devices, and observability, highlights emerging eBPF marketplaces, discusses performance gains with NetKit and BIG TCP, predicts IPv6‑first Kubernetes clusters, AI‑assisted network troubleshooting, and the growing convergence of platform engineering and networking in 2024.

Cilium · Kubernetes · Networking
0 likes · 16 min read
MaGe Linux Operations
Jan 22, 2024 · Cloud Native

Deploy and Secure Nacos Config Center on Huawei CCE & Kubernetes

This guide explains how to use Nacos as a centralized configuration center in Spring Boot micro‑services, covering common pitfalls of static configs, best‑practice namespace/group/DataId design, dependency setup, YAML examples, security annotations, role‑based access, Dockerfile tweaks, CCE deployment, database schema, and Kubernetes manifests for test and production environments.

Configuration Management · DevOps · Huawei CCE
0 likes · 19 min read
Alibaba Cloud Native
Jan 22, 2024 · Cloud Native

How Kube Queue Optimizes Batch Job Scheduling in Kubernetes

Batch jobs demand efficient resource use, but Kubernetes’ default scheduler struggles with large queues; Kube Queue, a cloud‑native AI suite component, introduces dedicated queues, flexible strategies, and quota management to automate scheduling, support multi‑tenant workloads, and improve cluster utilization.

AI · Job Scheduling · Kubernetes
0 likes · 14 min read
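Queue-based admission with per-tenant quotas can be illustrated with a small sketch. It assumes jobs are ordered by priority and then submission order, and a job is admitted only if its tenant's CPU quota still has room. `Job`, `admit`, and the quota dictionaries are hypothetical and do not reflect Kube Queue's actual CRDs or policies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(order=True)
class Job:
    priority: int                    # lower value is admitted first (illustrative)
    seq: int                         # submission order, breaks priority ties (FIFO)
    name: str = field(compare=False)
    tenant: str = field(compare=False)
    cpu: int = field(compare=False)  # requested CPU cores


def admit(jobs: List[Job], quotas: Dict[str, int],
          used: Dict[str, int]) -> Tuple[List[str], List[Job]]:
    """Admit queued jobs in (priority, submission) order while each tenant's quota allows.

    Jobs that would exceed their tenant's quota stay pending; `used` is updated in place.
    """
    admitted: List[str] = []
    pending: List[Job] = []
    for job in sorted(jobs):
        if used.get(job.tenant, 0) + job.cpu <= quotas.get(job.tenant, 0):
            used[job.tenant] = used.get(job.tenant, 0) + job.cpu
            admitted.append(job.name)
        else:
            pending.append(job)
    return admitted, pending


if __name__ == "__main__":
    queue = [
        Job(1, 0, "train-1", "team-a", 6),
        Job(1, 1, "train-2", "team-a", 4),
        Job(0, 2, "etl-1", "team-b", 4),
        Job(2, 3, "etl-2", "team-b", 1),
    ]
    ok, waiting = admit(queue, {"team-a": 8, "team-b": 4}, {})
    print(ok, [j.name for j in waiting])  # ['etl-1', 'train-1'] ['train-2', 'etl-2']
```

Skipping a job that does not fit lets smaller jobs behind it run, which raises utilization but can starve large jobs; strict head-of-line FIFO is the usual alternative.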
AsiaInfo Technology: New Tech Exploration
Jan 22, 2024 · Industry Insights

How Trustworthy Computing Power Measurement Can Transform Cloud‑Native Services

This article examines the urgent need for standardized, trustworthy computing power measurement, outlines narrow and broad measurement frameworks, and details a technical solution that integrates WASM virtual machines and blockchain with Kubernetes to achieve precise, tamper‑proof resource accounting for modern cloud‑native environments.

Kubernetes · Wasm · cloud-native
0 likes · 14 min read
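The tamper-proof property attributed to the blockchain layer rests on hash chaining: each usage record commits to the hash of the record before it, so editing any earlier entry breaks verification. A minimal sketch of that property alone, assuming usage records are JSON-serializable dicts; it has no consensus or distribution and is not the article's WASM/blockchain design. `append_record` and `verify` are hypothetical names.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: Dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True).encode() + prev_hash.encode()
    return hashlib.sha256(payload).hexdigest()


def append_record(ledger: List[Dict], record: Dict) -> None:
    """Append a usage record that is chained to the previous entry's hash."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"record": record, "prev": prev, "hash": _digest(prev, record)})


def verify(ledger: List[Dict]) -> bool:
    """Recompute every link; any edited record or reordered entry fails."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    ledger: List[Dict] = []
    append_record(ledger, {"pod": "job-1", "cpu_seconds": 120})
    append_record(ledger, {"pod": "job-2", "cpu_seconds": 45})
    print(verify(ledger))                      # True
    ledger[0]["record"]["cpu_seconds"] = 1     # tamper with an earlier record
    print(verify(ledger))                      # False
```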
Yum! Tech Team
Jan 19, 2024 · Cloud Native

Lossless Scaling Strategies for High‑Concurrency Microservices

This article examines lossless scaling techniques for high‑concurrency microservice architectures: the challenges of scaling out and in, early scaling approaches, and advanced optimizations such as delayed registration, readiness probes, Ribbon eager loading, cache preloading, health‑check strategies, and asynchronous consumer handling, all aimed at high availability, performance, and cost efficiency.

Cloud Native · Kubernetes · lossless scaling
0 likes · 16 min read
JavaEdge
Jan 18, 2024 · Databases

Master RedisInsight: Visualize, Manage, and Optimize Redis Seamlessly

This guide introduces RedisInsight, a powerful GUI for Redis, outlines its key features such as cluster support, visual data browsing, built‑in CLI, stream and log analysis, and provides step‑by‑step installation instructions for Linux, Kubernetes, and macOS, plus basic usage tips.

Database Management · GUI · Kubernetes
0 likes · 10 min read
Linux Code Review Hub
Jan 18, 2024 · Cloud Native

How to Build Unified Observability for Apache APISIX with DeepFlow

This article walks through deploying Apache APISIX and DeepFlow in a Kubernetes cluster, configuring eBPF‑based AutoTracing and OpenTelemetry integration, enabling Prometheus metrics, accessing logs and continuous profiling, and visualizing unified observability data via Grafana dashboards.

APISIX · DeepFlow · Kubernetes
0 likes · 16 min read
Java Backend Technology
Jan 18, 2024 · Databases

Master RedisInsight: Install, Configure, and Use the Ultimate Redis GUI

This guide introduces RedisInsight, outlines its key features, provides step‑by‑step installation on a physical server and via Kubernetes, explains environment configuration and startup, and demonstrates basic usage for monitoring and managing Redis instances through its graphical interface.

Database Management · GUI · Installation
0 likes · 7 min read
dbaplus Community
Jan 16, 2024 · Cloud Native

How to Achieve Zero‑Downtime Service Deployment with Spring Cloud and Kubernetes

This article examines why most incidents occur during application rollout, analyzes the Kubernetes pod lifecycle at startup and shutdown, identifies common zero‑downtime challenges, and presents concrete strategies, including active notification, adaptive waiting, delayed registration, and readiness probes, that make service upgrades and rollbacks lossless.

Kubernetes · Spring Cloud · Zero Downtime
0 likes · 11 min read
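The ordering behind delayed registration and lossless shutdown can be sketched as a small state machine: warm up before registering, and on shutdown fail readiness, deregister actively, wait for clients to refresh, then drain in-flight work. This is a minimal sketch under stated assumptions: the registry is a plain set standing in for a service registry, and `GracefulInstance`, `propagation_delay_s`, and `max_wait_s` are hypothetical names, not Spring Cloud or Kubernetes APIs.

```python
import time
from typing import Callable, List, Set


class GracefulInstance:
    """Hypothetical service instance illustrating a lossless startup/shutdown order."""

    def __init__(self, registry: Set[str], name: str,
                 propagation_delay_s: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep):
        self.registry = registry          # stand-in for a service registry
        self.name = name
        self.ready = False                # what a readiness probe endpoint would report
        self.in_flight = 0                # request handlers would increment/decrement this
        self.propagation_delay_s = propagation_delay_s
        self.sleep = sleep                # injectable so the sketch can run without waiting
        self.events: List[str] = []

    def start(self, warm_up: Callable[[], None]) -> None:
        warm_up()                         # e.g. preload caches and clients first
        self.ready = True                 # readiness probe begins to pass
        self.registry.add(self.name)      # delayed registration: only after warm-up
        self.events.append("registered")

    def shutdown(self, max_wait_s: float = 30.0, poll_s: float = 0.1) -> None:
        self.ready = False                # stop advertising readiness to new traffic
        self.registry.discard(self.name)  # actively deregister instead of waiting for expiry
        self.events.append("deregistered")
        self.sleep(self.propagation_delay_s)  # let clients refresh cached instance lists
        waited = 0.0
        while self.in_flight > 0 and waited < max_wait_s:
            self.sleep(poll_s)            # wait for in-flight requests to finish
            waited += poll_s
        self.events.append("drained" if self.in_flight == 0 else "timed-out")


if __name__ == "__main__":
    registry: Set[str] = set()
    inst = GracefulInstance(registry, "svc-1", sleep=lambda s: None)
    inst.start(warm_up=lambda: None)
    inst.shutdown()
    print(inst.events)  # ['registered', 'deregistered', 'drained']
```

In Kubernetes the shutdown sequence would typically be triggered by SIGTERM or a preStop hook, and the total wait must fit within the pod's termination grace period.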
Alibaba Cloud Native
Jan 16, 2024 · Cloud Native

What’s New in Koordinator v1.4.0? A Deep Dive into Mixed‑Workload Scheduling and Resource Optimizations

Koordinator v1.4.0 introduces support for mixed K8s/YARN workloads, NUMA‑aware scheduling, CPU normalization, an enhanced ElasticQuota with tree structures and non‑preemptible pods, cold‑memory reporting, and QoS for non‑containerized applications, along with a set of bug fixes and performance improvements for enterprise Kubernetes clusters.

CPU normalization · ElasticQuota · Koordinator
0 likes · 24 min read
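CPU normalization addresses the fact that one core on a newer CPU model does more work than one core on an older model. A minimal sketch of the arithmetic, assuming each CPU model has a per-core performance score relative to a baseline and that requests are expressed in baseline millicores. The model names and percentages are hypothetical and are not Koordinator's configuration format or ratios.

```python
from typing import Dict

# Hypothetical relative per-core performance, in percent of a baseline CPU model.
PERF_PERCENT: Dict[str, int] = {"baseline": 100, "fast-model": 125, "slow-model": 80}


def normalized_allocatable_m(physical_cores: int, cpu_model: str) -> int:
    """Express a node's CPU capacity in 'baseline millicores' (integer math)."""
    return physical_cores * 1000 * PERF_PERCENT[cpu_model] // 100


def fits(request_m: int, physical_cores: int, cpu_model: str,
         already_requested_m: int = 0) -> bool:
    """Check a request (in baseline millicores) against the normalized capacity."""
    return already_requested_m + request_m <= normalized_allocatable_m(physical_cores, cpu_model)


if __name__ == "__main__":
    print(normalized_allocatable_m(32, "fast-model"))  # 40000
    print(normalized_allocatable_m(32, "slow-model"))  # 25600
    print(fits(36000, 32, "fast-model"), fits(36000, 32, "slow-model"))  # True False
```

The same 32-core request budget thus admits more baseline work on the faster node, which is the point of normalizing before scheduling.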
Open Source Linux
Jan 15, 2024 · Cloud Native

Automate Kubernetes Local Storage and Backup with Carina and Velero

This guide explains why local storage remains essential in the cloud‑native era, outlines a step‑by‑step plan to set up a Kubernetes cluster, deploy Carina for automated local disk management, configure a test Nginx workload, install MinIO and Velero for backup, and finally perform backup and restore operations to verify data integrity.

Backup · Carina · Cloud Native Storage
0 likes · 30 min read
MaGe Linux Operations
Jan 14, 2024 · Operations

Mastering DevOps Architecture: From CI/CD to Real-World Success Stories

This comprehensive guide delves into DevOps architecture, explaining core concepts like continuous integration, delivery, and deployment, showcasing essential tools such as Jenkins, Docker, Kubernetes, and GitLab CI, and illustrating best practices and real‑world case studies from Netflix and Etsy to help teams accelerate, automate, and improve software delivery.

CI/CD · DevOps · Kubernetes
0 likes · 20 min read
Alibaba Cloud Native
Jan 12, 2024 · Cloud Native

Unlock Second-Scale Elastic Scheduling with ACK Virtual Nodes

This article explains how to use Alibaba Cloud Container Service (ACK) virtual nodes and Elastic Container Instances (ECI) to achieve second‑scale elasticity, covering installation, ResourcePolicy configuration, zone‑aware scheduling, high‑availability setups, and performance results with concrete YAML examples.

ECI · Kubernetes · ResourcePolicy
0 likes · 12 min read
Beike Product & Technology
Jan 12, 2024 · Information Security

Understanding High‑Risk Kubernetes RBAC Permissions and a Graph‑Based Risk Identification System

This article examines how misconfigured Kubernetes RBAC permissions can lead to privilege escalation across clusters, presents a graph‑based model to represent users, roles, and authorities, and provides code examples and Cypher queries for detecting and visualizing high‑risk permission paths.

Kubernetes · RBAC · graph
0 likes · 16 min read
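The article detects risky permission paths with Cypher queries over a graph database. The same reachability question can be sketched in plain Python with breadth-first search, assuming subjects, bindings, roles, and permissions are nodes and edges mean "grants" or "leads to". The node names are hypothetical. One modeling choice is labeled in the data: a subject who can create pods in a namespace can run them as any service account there, so `pods/create` links to that namespace's service account.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical permission graph: node -> nodes it grants or leads to.
EDGES: Dict[str, List[str]] = {
    "user:alice": ["rolebinding:dev/edit"],
    "rolebinding:dev/edit": ["role:dev/editor"],
    "role:dev/editor": ["perm:pods/create", "perm:configmaps/get"],
    # Modeling choice: creating pods allows running as any service account in the namespace.
    "perm:pods/create": ["sa:dev/deployer"],
    "sa:dev/deployer": ["clusterrolebinding:deployer"],
    "clusterrolebinding:deployer": ["clusterrole:secret-reader"],
    "clusterrole:secret-reader": ["perm:secrets/get"],
}
RISKY: Set[str] = {"perm:pods/create", "perm:secrets/get"}


def risky_paths(edges: Dict[str, List[str]], start: str, risky: Set[str]) -> Dict[str, List[str]]:
    """BFS from `start`; return one shortest path to each reachable risky node."""
    parent = {start: None}
    queue = deque([start])
    found: Dict[str, List[str]] = {}
    while queue:
        node = queue.popleft()
        if node in risky:
            path, cur = [], node
            while cur is not None:  # walk parent links back to the start
                path.append(cur)
                cur = parent[cur]
            found[node] = path[::-1]
        for nxt in edges.get(node, []):
            if nxt not in parent:   # first visit is the shortest in an unweighted graph
                parent[nxt] = node
                queue.append(nxt)
    return found


if __name__ == "__main__":
    for perm, path in risky_paths(EDGES, "user:alice", RISKY).items():
        print(perm, "<-", " -> ".join(path))
```

The second result shows an indirect escalation: alice never holds `secrets/get` directly but reaches it through a service account she can run pods as.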