Tagged articles
4047 articles
AntTech
Nov 8, 2023 · Artificial Intelligence

Kapacity V0.2 Release: AI‑Driven Traffic‑Based Replica Prediction for Cloud‑Native Autoscaling

Kapacity V0.2 introduces an AI‑powered, traffic‑driven replica prediction algorithm for cloud‑native autoscaling, featuring a Linear‑Residual model, a lightweight Swish Net time‑series forecaster, custom metric support, and open‑source tools, aiming to improve resource efficiency and reduce operational risk.

AI · Kubernetes · Predictive Autoscaling
9 min read
Architecture Digest
Nov 8, 2023 · Backend Development

Comparing Spring Boot and Quarkus: Performance, Native Images, and Migration Guide

This article compares the Spring Boot and Quarkus Java frameworks by examining their architectures, running reactive test applications, measuring startup time, memory usage, CPU consumption and response latency for both JVM and native builds, and provides practical guidance for migrating Spring developers to Quarkus.

Kubernetes · Native Images · Performance Testing
16 min read
Open Source Linux
Nov 7, 2023 · Cloud Native

How to Deploy and Test Multus CNI for Multi‑Network Pods in Kubernetes

This guide explains the background, architecture, and step‑by‑step deployment of Multus CNI in a Kubernetes cluster, including configuring Calico and Flannel as primary and secondary networks, creating network attachment definitions, and testing pod connectivity across multiple interfaces.

Calico · Flannel · Kubernetes
21 min read
Alibaba Cloud Native
Nov 6, 2023 · Cloud Native

Mastering Loose‑Mode Traffic Swimlanes in Alibaba Cloud Service Mesh (ASM)

This guide walks you through configuring loose‑mode traffic swimlanes in Alibaba Cloud Service Mesh (ASM), covering prerequisites, sample service deployment, swimlane group and lane creation, automatic generation of DestinationRule and VirtualService resources, routing rule setup, and step‑by‑step verification of full‑link gray release.

ASM · Kubernetes · Loose Mode
20 min read
MaGe Linux Operations
Nov 5, 2023 · Cloud Native

How to Deploy and Test Multus CNI for Multi‑Network Pods in Kubernetes

This guide explains why Multus CNI is needed for multi‑network pods in Kubernetes, describes its architecture, walks through installing Multus alongside Calico and Flannel, shows how to configure NetworkAttachmentDefinitions, adjust Calico’s NIC selection, and demonstrates testing pod connectivity and routing limitations.

Calico · Flannel · Kubernetes
22 min read
DataFunTalk
Nov 5, 2023 · Cloud Native

Cloud‑Native Storage Acceleration: Experience and Practices with CloudFS on Volcano Engine

This article presents the cloud‑native storage acceleration demands, evaluates what constitutes a good acceleration solution, and details the design, implementation, and real‑world practice of CloudFS—including metadata acceleration, data‑plane caching, FUSE enhancements, AI training and multi‑cloud data‑lake use cases—while outlining future roadmap plans.

AI · CloudFS · Kubernetes
15 min read

How Cloud‑Native Transforms Big Data Platforms: Challenges, Solutions, and Future Trends

This article analyzes the rise of cloud‑native technologies in big data ecosystems, identifies key pain points such as resource scheduling, service capabilities, performance, and operations, and presents detailed technical explorations—including Volcano batch scheduling, Kyuubi serverless, vectorized computing, remote shuffle services, and storage‑compute separation—while outlining future development directions.

Kubernetes · Serverless · cloud-native
23 min read
Tencent Music Tech Team
Oct 31, 2023 · Cloud Native

Advanced Istio Best Practices – Locality Routing and Service Mesh Optimization

The article by delphisfang offers a concise, step‑by‑step guide to mastering Istio’s locality‑aware routing, explaining the three‑evidence learning method, the priority algorithm, required DestinationRule and outlier detection settings, how Envoy discovers locality, and tips for simplifying the Pilot‑Envoy mesh architecture.

Envoy · Istio · Kubernetes
17 min read
MaGe Linux Operations
Oct 27, 2023 · Cloud Native

Deploy Grafana and Prometheus on Kubernetes in Minutes

This guide walks you through preparing a Kubernetes cluster, creating deployment manifests, configuring Grafana and Prometheus, and verifying the monitoring setup, including code snippets and step‑by‑step commands for a seamless installation on a lightweight cloud server.

Cloud Native · DevOps · Grafana
7 min read
Ops Development Stories
Oct 27, 2023 · Cloud Native

Collect Kubernetes Logs with OpenTelemetry and Loki Using Helm

This guide walks through deploying Loki via Helm, configuring the OpenTelemetry Collector to use a filelog receiver and Loki exporter, and enabling Kubernetes event collection, providing step‑by‑step commands and YAML snippets for a complete logging pipeline in a Kubernetes cluster.

Collector · Kubernetes · Loki
17 min read
Cloud Native Technology Community
Oct 26, 2023 · Cloud Native

Understanding Kubernetes Validating Admission Policies with Practical Examples

This article explains Kubernetes Admission Controllers, distinguishes Mutating and Validating types, introduces the native Validating Admission Policies feature using CEL expressions, and provides a step‑by‑step demonstration with YAML manifests and kubectl commands to enforce replica limits on deployments.

Admission Controllers · CEL · Kubernetes
11 min read
Sohu Tech Products
Oct 25, 2023 · Cloud Native

Strategies for Rolling Restart of Pods During Istio Service Mesh Upgrade

To upgrade an Istio service mesh without overloading the cluster or causing downtime, the author recommends using Kubernetes’s built‑in kubectl rollout restart for each deployment—scaling replicas up then deleting old pods or simply invoking the command in a scripted loop—to safely perform a rolling restart of all sidecar‑proxied pods.

DevOps · Istio · Kubernetes
8 min read
MaGe Linux Operations
Oct 25, 2023 · Cloud Native

Deploy a Typecho Blog on Kubernetes: Step‑by‑Step Guide with MySQL

This tutorial walks you through preparing a Kubernetes cluster, deploying MySQL and Typecho containers with detailed YAML configurations, creating the necessary services and ingress, testing the setup, and highlighting Kubernetes' high‑availability and auto‑scaling benefits for a reliable blog platform.

Cloud Native · Kubernetes · Typecho
8 min read
Efficient Ops
Oct 24, 2023 · Operations

How to Monitor Business Metrics with Prometheus in Kubernetes

This article explains how to use Prometheus to monitor business‑level metrics in a Kubernetes environment, covering observability fundamentals, metric definitions, metric types, exposing metrics via a /metrics endpoint, and practical Go code examples for defining, recording, and scraping custom metrics.

Go · Kubernetes · Metrics
11 min read
Alibaba Cloud Native
Oct 24, 2023 · Cloud Native

Boost Cluster Efficiency with Koordinator’s K8s‑YARN Co‑Location Solution

Koordinator extends its open‑source container scheduler to enable seamless co‑location of Kubernetes Pods and Hadoop YARN tasks, allowing over‑provisioned batch resources to be shared without modifying YARN, and has delivered up to 10 % CPU utilization gains and sub‑1 % eviction rates in Xiaohongshu’s production clusters.

Cluster Scheduling · Kubernetes · Resource Management
9 min read
Huolala Tech
Oct 23, 2023 · Information Security

How Huolala Secures Kubernetes: Real-World Container Security Practices

This article details Huolala's end‑to‑end container security strategy—from Kubernetes component basics and a real unauthorized‑access incident to lifecycle‑based safeguards, threat‑matrix guidance, image/ecosystem/baseline/runtime protections, and a custom HIDS architecture—offering practical insights for cloud‑native environments.

Cloud Native · Container Security · DevSecOps
14 min read
Efficient Ops
Oct 22, 2023 · Operations

Master Loki: Deploy, Configure, and Query Logs Efficiently

This guide explains Loki's core concepts, deployment steps for Promtail and Loki, Grafana integration, label‑based indexing, handling dynamic and high‑cardinality tags, and query optimization techniques, providing a complete roadmap for building a cost‑effective, scalable log aggregation system.

Grafana · Kubernetes · Loki
15 min read
MaGe Linux Operations
Oct 22, 2023 · Cloud Native

Kubernetes Ingress vs OpenShift Route: Key Differences and How to Use Them

This article compares Kubernetes Ingress and OpenShift Route, outlining their similar functions for exposing services, detailing their architectures, configuration steps, and highlighting essential differences such as ecosystem integration, syntax, and security features, while providing practical examples and code snippets for implementation.

Cloud Native · Kubernetes · OpenShift
9 min read
MaGe Linux Operations
Oct 22, 2023 · Cloud Native

Automate Kubernetes Deployments: Step‑by‑Step Jenkins Pipeline Guide

Learn how to connect Jenkins Pipeline with Kubernetes to automate building, testing, and deploying containerized applications, covering prerequisite setup, detailed pipeline stages—including code checkout, Docker image creation, testing, registry push, and Kubernetes deployment—complete with code snippets and configuration tips.

CI/CD · Jenkins · Kubernetes
4 min read
Liangxu Linux
Oct 22, 2023 · Databases

How Huge Linux Pages Can Boost Database Throughput on Kubernetes by Up to 8×

This article explains how Linux page size—from the default 4 KB to 2 MB or 1 GB huge pages—affects database performance, details the role of TLB cache hits and misses, presents benchmark results showing up to an eight‑fold throughput increase, and offers practical guidance for configuring huge pages on Kubernetes nodes.

Database Performance · Kubernetes · TLB
14 min read
Alibaba Cloud Native
Oct 20, 2023 · Cloud Native

How Knative Cuts AI Service Costs by 60% and Halves Deployment Time

This article explains how Shuhe Tech combined Knative with AI workloads to achieve 60% resource cost savings and reduce model deployment cycles from one day to half a day, detailing Knative's architecture, request‑based autoscaling, multi‑version releases, and advanced scaling features.

AI · Cloud Native · KPA
19 min read
Didi Tech
Oct 19, 2023 · Cloud Native

Design and Implementation of a New Tiered Resource Guarantee System for Elastic Cloud Containers

The new tiered resource‑guarantee system for Didi’s elastic cloud containers defines S, A, and B priority levels with explicit over‑commit rules, upgrades OS, Kubernetes, kube‑odin, service‑tree, and CMP components, and thereby cuts CPU contention by up to 80%, reduces latency, improves scaling reliability, and lowers operational costs.

Container Management · Kubernetes · Overcommit
16 min read
Efficient Ops
Oct 18, 2023 · Cloud Native

Why Does Containerd’s PLEG Relisting Stall at Node Startup and How to Fix It

When replacing dockershim with containerd, we observed that pods take over a minute to start because the GenericPLEG Relisting operation stalls for more than 30 seconds during node boot, caused by containerd’s UpdateContainerResources holding a bbolt lock and intensive image pulls; the article explains the root cause and provides a fix using the overlay volatile mount option.

Kubernetes · PLEG · container-runtime
16 min read
MaGe Linux Operations
Oct 17, 2023 · Databases

How Large Linux Pages Can Boost Database Throughput on Kubernetes by Up to 8×

This article explains how Linux page size affects database throughput on Kubernetes nodes: switching from the default 4 KB pages to 2 MB or 1 GB huge pages reduces TLB misses, optimizes memory access, and yields up to an eight‑fold throughput increase. It also provides practical guidance for configuring huge pages in various environments.

Kubernetes · Linux · database
12 min read
DevOps Cloud Academy
Oct 14, 2023 · Cloud Native

Introducing Kargo: A Multi‑Stage Application Orchestration Platform for CI/CD on Kubernetes

The article explains how Kargo, an open‑source, GitOps‑based platform built on Argo CD experience, addresses the complexities of multi‑stage CI/CD pipelines in Kubernetes by providing declarative stage definitions, promotion workflows, and advanced delivery features such as canary releases and A/B testing.

Argo CD · Continuous Delivery · DevOps
12 min read
MaGe Linux Operations
Oct 13, 2023 · Cloud Native

How Kubernetes Transforms Cloud‑Native Application Deployment and Management

This article explains what Kubernetes (K8s) is, its core features such as portability, scalability and automation, explores enterprise use cases, resource estimation, service migration, deployment evolution, cloud‑native concepts, and details the master‑node architecture and components that enable efficient container orchestration.

Cloud Native · DevOps · Infrastructure
9 min read
DataFunSummit
Oct 13, 2023 · Big Data

Practical Experience of Flink on Kubernetes at Kuaishou

This article presents Kuaishou's comprehensive journey of adopting Flink on Kubernetes, covering its background, evolution, architecture, production migration, observability, testing, and future plans, and demonstrates how large‑scale streaming workloads are transformed to a cloud‑native environment.

Big Data · Flink · Kubernetes
14 min read
Volcano Engine Developer Services
Oct 12, 2023 · Cloud Native

How ByteDance’s Katalyst Memory Advisor Boosts Kubernetes Memory Efficiency

This article explains the challenges of memory management in mixed workloads, outlines the limitations of native Linux and Kubernetes mechanisms, and details how ByteDance’s open‑source Katalyst Memory Advisor improves memory utilization, QoS, and eviction handling through user‑space policies, multi‑dimensional interference detection, and adaptive mitigation actions.

Katalyst · Kubernetes · Memory Management
17 min read
Ops Development Stories
Oct 12, 2023 · Cloud Native

How to Monitor Kubernetes with OpenTelemetry Collector: Step‑by‑Step Helm Deployment

This guide walks through installing OpenTelemetry Collector on a Kubernetes cluster using Helm, configuring DaemonSet and Deployment collectors, integrating Prometheus for metrics, and customizing receivers, processors, and exporters to achieve comprehensive observability of nodes, pods, containers, and cluster resources.

Kubernetes · OpenTelemetry · Prometheus
26 min read
ByteDance Cloud Native
Oct 11, 2023 · Cloud Native

How Katalyst Memory Advisor Optimizes Kubernetes Memory Management in Mixed Workloads

This article explains the challenges of memory management in mixed Kubernetes workloads, introduces ByteDance's open‑source Katalyst Memory Advisor, details native allocation and reclamation mechanisms, outlines its architecture and plugins, and describes interference detection and multi‑level mitigation strategies to improve memory utilization and service quality.

Cloud Native · Katalyst · Kubernetes
19 min read
DevOps Cloud Academy
Oct 11, 2023 · Cloud Native

A/B Testing with Argo Rollouts Experiments for Progressive Delivery

This article explains how to perform data‑driven A/B testing in progressive delivery using Argo Rollouts Experiments, covering the concepts of progressive delivery, A/B testing fundamentals, the Argo Rollouts architecture, required Kubernetes resources, and step‑by‑step commands and YAML manifests for a weather‑app example.

A/B testing · Argo Rollouts · Kubernetes
19 min read
DevOps
Oct 10, 2023 · Operations

Common Kubernetes Pod Issues and Troubleshooting Guide

This article outlines typical Kubernetes pod failure states such as ContainerCreating, ErrImagePull, Pending, CrashLoopBackOff, and UnexpectedAdmissionError, explains their common causes—including Docker service problems, storage mount errors, ConfigMap misconfigurations, and image issues—and provides practical troubleshooting steps and example manifests.

ConfigMap · Docker · Kubernetes
10 min read
Efficient Ops
Oct 9, 2023 · Cloud Native

Why Do Kubernetes Pods Get Stuck? Decoding Common Pod Status Errors

Learn how to diagnose and resolve frequent Kubernetes pod status issues such as ContainerCreating, ErrImagePull, Pending, CrashLoopBackOff, and UnexpectedAdmissionError by examining Docker services, storage mounts, ConfigMaps, image repositories, and node resources, with practical examples and command‑line solutions.

ConfigMap · ContainerCreating · ErrImagePull
9 min read
Ximalaya Technology Team
Oct 9, 2023 · Artificial Intelligence

DeepRec-Based High-Dimensional Sparse Feature Support and Real-Time Model Training in Ximalaya AI Cloud

Ximalaya AI Cloud leverages DeepRec’s Embedding Variable to elastically manage high‑dimensional sparse features with low collision, supporting admission/eviction, multi‑level storage and minute‑level incremental model updates, which together boost GPU utilization, halve training time and improve recommendation CTR by 2‑3 % while maintaining latency.

AI cloud · DeepRec · Kubernetes
13 min read
Java Backend Technology
Oct 8, 2023 · Operations

How I Traced a Sudden CPU Spike to JVM GC Issues in a Container

After receiving an alarm that a production container’s CPU usage surged past 90%, I investigated the JVM metrics, discovered excessive young and full GCs in a single pod, and walked through the detailed troubleshooting steps—including top, thread analysis, jstack, and code fixes—that resolved the issue.

CPU Spike · JVM · Kubernetes
7 min read
Full-Stack DevOps & Kubernetes
Oct 7, 2023 · Cloud Native

How the US Air Force U‑2 Spy Plane Uses Jenkins & Kubernetes for Automated CI/CD

The US Air Force’s U‑2 reconnaissance aircraft has adopted a Jenkins‑driven CI/CD pipeline orchestrated by Kubernetes, enabling automated builds, repeatable deployments, rapid threat response, enhanced security, and resource savings, with a detailed step‑by‑step case study illustrating code management, pipeline configuration, container deployment, testing, and RBAC controls.

DevOps · Jenkins · Kubernetes
7 min read
DevOps Cloud Academy
Oct 5, 2023 · Cloud Native

Balancing Kubernetes Workloads with the Descheduler and Related Tools

This article explains why Kubernetes does not automatically rebalance pods, demonstrates how to use the Descheduler, Node Problem Detector, and Cluster Autoscaler together to detect node pressure, evict overloaded pods, and scale down underutilized nodes for improved cluster efficiency.

Cluster Autoscaler · Descheduler · Kubernetes
7 min read
21CTO
Oct 4, 2023 · Artificial Intelligence

How LangStream Merges Data Streams with Generative AI for Real‑Time LLM Apps

LangStream, the new open‑source framework from DataStax, combines event‑driven data streaming with generative AI, offering seamless integration with vector databases like Astra DB, Milvus, and Pinecone, and providing a Kubernetes‑based runtime that enables real‑time LLM applications without extensive coding.

Data Streaming · Kubernetes · LLM
7 min read
Tencent Cloud Developer
Sep 28, 2023 · Cloud Computing

Cloud Studio: Building Tencent Cloud's Cloud-Based Development Environment - A Self-Hosting Case Study

Tencent Cloud’s Cloud Studio team migrated its fragmented monorepo workflow to a self‑hosted, Kubernetes‑based cloud development environment, unifying code in a trunk‑based repository, delivering pre‑warmed IDE images, fast Git and startup performance, robust security, and laying groundwork for containerized debugging, vGPU AI training, and seamless cloud‑native development.

Cloud IDE · Cloud Studio · DevOps
19 min read
37 Interactive Technology Team
Sep 25, 2023 · Cloud Native

Investigation of Kubernetes Container Isolation Mechanism and Its Impact

The article investigates a cloud‑vendor Kubernetes isolation feature that inserts iptables DROP rules into a pod’s network namespace. It demonstrates that the rules fully block traffic and trigger liveness‑probe restarts, that the impact on services depends on replica count and probe configuration, and that container state is preserved only when no probes are configured.

Container Security · Isolation · Kubernetes
7 min read
Alibaba Cloud Native
Sep 24, 2023 · Cloud Computing

Designing Highly Available Cloud‑Native Applications on Alibaba Cloud ACK

This article explains how to build robust, highly available cloud‑native applications on Alibaba Cloud Container Service for Kubernetes (ACK) by covering architecture principles, multi‑zone cluster design, Kubernetes HA features such as topology spread constraints and pod anti‑affinity, storage strategies, load‑balancing, virtual nodes, health probes, monitoring, and multi‑cluster deployment patterns.

ACK · Cloud Native · Kubernetes
35 min read
Alibaba Cloud Native
Sep 21, 2023 · Cloud Native

How Alibaba Cloud’s SAE Achieves High Stability with Diagnostic Engines and Probes

This article explains how Alibaba Cloud's Serverless Application Engine (SAE) builds end‑to‑end stability by dividing fault handling into prevention, detection, localization and recovery, using a Kubernetes‑based diagnostic engine, runtime availability probes, a unified alert center, and a plug‑in architecture for root‑cause analysis.

Cloud Native · Kubernetes · Serverless
28 min read
TAL Education Technology
Sep 21, 2023 · Cloud Native

Kubernetes Development Practice: Code Compilation and Image Building

This guide walks through preparing the hardware and software environment, cloning the Kubernetes source, checking out a specific tag, compiling the code, building release images with appropriate build parameters, extracting and loading the images, and updating static manifests for custom Kubernetes deployments.

Cloud Native · Docker · Image Build
8 min read
Didi Tech
Sep 19, 2023 · Cloud Native

OrangeFS: A Cloud‑Native Multi‑Protocol Distributed Data Lake Storage System

OrangeFS is Didi’s cloud‑native, multi‑protocol distributed data‑lake storage system that unifies POSIX, S3 and HDFS access on a single logical hierarchy, integrates with Kubernetes via a CSI plugin, supports on‑premise and public‑cloud backends, provides multi‑tenant isolation, and dramatically improves elasticity, utilization and latency for petabyte‑scale workloads such as ride‑hailing logs, machine‑learning training, finance and analytics.

CSI · Cloud Native Storage · Data Lake
17 min read
Efficient Ops
Sep 17, 2023 · Cloud Native

Top 9 Essential Kubernetes Tools to Streamline Your Cloud‑Native Workflows

Explore nine indispensable Kubernetes tools—including Kubie, Kubespray, Helm, Minikube, K3s, Kustomize, KOps, Prometheus, and krew—that simplify cluster management, accelerate deployments, and enhance efficiency, helping you choose the right solution for smoother, more productive cloud‑native operations.

Cluster Management · Kubernetes · Prometheus
6 min read
MaGe Linux Operations
Sep 16, 2023 · Cloud Native

How to Diagnose and Fix Pod Network Issues in Kubernetes Clusters

This article introduces a systematic approach for troubleshooting Kubernetes pod network anomalies, classifies common failure types, presents essential tools such as tcpdump, mtr, nsenter and paping, and walks through real‑world case studies to pinpoint and resolve connectivity problems.

CNI · Kubernetes · mtr
26 min read
Alibaba Cloud Native
Sep 16, 2023 · Cloud Native

Decoding Istio Ambient Mesh: Full Pod‑to‑Pod Traffic Path Explained

This article provides a step‑by‑step technical walkthrough of Istio Ambient Mesh traffic flow, detailing how a curl request from a sleep pod on Node‑A reaches an httpbin pod on Node‑B via iptables, policy routing, ztunnel and waypoint components, complete with code snippets and diagrams.

Ambient Mesh · Istio · Kubernetes
27 min read
dbaplus Community
Sep 14, 2023 · Cloud Native

Mastering Kubernetes: 30+ Essential Pod, Node, and Cluster Troubleshooting Techniques

This guide compiles over thirty practical Kubernetes troubleshooting steps, covering pod startup failures, networking issues, resource bottlenecks, node abnormalities, cluster‑wide service problems, and detailed explanations of common container exit codes to help operators quickly diagnose and resolve issues.

Container exit codes · Kubernetes · Node diagnostics
22 min read
Efficient Ops
Sep 11, 2023 · Cloud Native

Why Multi-Cluster Kubernetes Matters and How Vivo Tackles It

This article examines the motivations, benefits, and existing solutions for Kubernetes multi‑cluster management, then details Vivo's non‑federated and federated approaches, application‑centric continuous delivery, elastic scaling, unified scheduling, gray‑release strategies, and summarizes the current state and challenges.

DevOps · Karmada · Kubernetes
22 min read
Architect
Sep 11, 2023 · Databases

How eBay Scaled ClickHouse with Read/Write Separation and Keeper

This article details eBay's event monitoring platform architecture, explains the challenges of high‑load OLAP workloads on ClickHouse clusters, describes the design and implementation of read/write separation and multi‑shard Keeper coordination, and shares concrete configuration snippets, performance observations, and production lessons learned.

Distributed Systems · Keeper · Kubernetes
20 min read
DataFunSummit
Sep 10, 2023 · Cloud Native

An Overview of Curve: High‑Performance Cloud‑Native Distributed Storage System

Curve is a high‑performance, easy‑to‑operate, cloud‑native open‑source distributed storage system (CNCF Sandbox) that provides block and file storage for OpenStack, Kubernetes, and PolarFS, featuring Raft‑based consistency, hybrid storage, high availability, and an ongoing roadmap for AI and other workloads.

Cloud Native · Curve · Kubernetes
16 min read
MaGe Linux Operations
Sep 8, 2023 · Cloud Native

Master Real-Time Kubernetes Log Viewing with Kubetail and Stern

Learn how to efficiently monitor multiple Kubernetes pods by installing and using two lightweight, real‑time log aggregation tools—Kubetail and Stern—including installation steps for Homebrew, Linux, and Zsh, command‑line options, color output, and practical usage examples.

Cloud Native · Kubernetes · Log Monitoring
12 min read
Architect
Sep 7, 2023 · Cloud Native

How Vivo Scaled Container Monitoring with Prometheus, Kafka, and VictoriaMetrics

This article details how Vivo's container platform faced exploding metric volumes, component overload, data gaps, and storage spikes, and explains the step‑by‑step architectural redesign, metric governance, performance tuning, cAdvisor redeployment, and VictoriaMetrics upgrade that restored high‑availability, low‑latency monitoring across a large Kubernetes fleet.

Cloud Native · Kubernetes · Prometheus
18 min read
Alibaba Cloud Native
Sep 7, 2023 · Cloud Native

Unlock Real‑Time Container Network Monitoring with KubeSkoop’s eBPF Probes

This article explains how KubeSkoop leverages eBPF to provide low‑overhead, pod‑level network monitoring and real‑time diagnostics for Kubernetes clusters, covering packet flow fundamentals, traditional troubleshooting tool limitations, the exporter’s probe architecture, daily monitoring practices, and future development plans.

Grafana, KubeSkoop, Kubernetes
0 likes · 22 min read
Alibaba Cloud Native
Sep 7, 2023 · Cloud Native

Access On-Premises Data from Alibaba Cloud ECI with ACK Fluid & MinIO

This guide walks through using ACK Fluid to connect Alibaba Cloud Elastic Container Instance (ECI) pods with on‑premises MinIO storage, covering prerequisites, deployment of MinIO, building a custom ThinRuntime image, creating Fluid profiles and datasets, and accessing data via a PVC‑mounted pod.

ACK Fluid, ECI, Kubernetes
0 likes · 17 min read
37 Interactive Technology Team
Sep 7, 2023 · Cloud Native

Design and Implementation of the kjob Asynchronous Task Scheduling Platform on Kubernetes

The 37Game team built the cloud‑native kjob platform to replace VM‑based schedulers, providing a unified, highly available Kubernetes solution that manages both CronJob‑style scheduled tasks and long‑running Deployments through a backend‑agent architecture, offering CRUD operations, rich configuration, real‑time monitoring, alerting, and seamless migration.

Asynchronous Jobs, Cloud-native, Go
0 likes · 15 min read
Huolala Safety Emergency Response Center
Sep 7, 2023 · Information Security

How Huolala Secured Its Kubernetes Workloads: A Deep Dive into Container Security Practices

This article details Huolala's container‑security program, covering Kubernetes component basics, a real‑world unauthorized‑access incident, a lifecycle‑based security framework, Microsoft's threat matrix for Kubernetes, and the design of a home‑grown host‑based intrusion detection system (HIDS) to protect cloud‑native workloads.

Cloud Native, Container Security, DevSecOps
0 likes · 12 min read
Cloud Native Technology Community
Sep 7, 2023 · Information Security

Kubernetes Security Testing: Importance, Methods, and Best Practices

This article explains why security testing is critical for Kubernetes clusters, outlines key testing approaches such as SAST, DAST, container image scanning, configuration audits, and network policy testing, and provides practical steps for integrating these methods into CI/CD pipelines to ensure robust cloud‑native security.

Configuration Audit, Container Scanning, DAST
0 likes · 9 min read
vivo Internet Technology
Sep 6, 2023 · Cloud Native

Multi-Cluster Management in Kubernetes: Concepts, Practices, and Karmada Exploration

The article explains why enterprises adopt multi‑cluster Kubernetes architectures, reviews community solutions such as Karmada, Clusternet and OCM, and details vivo’s hybrid strategy that combines a unified UI for independent clusters with Karmada‑based federation for resource distribution, elastic scaling, cross‑cluster scheduling, and gray‑release migration.

Karmada, Kubernetes, Multi-Cluster
0 likes · 20 min read
DevOps Cloud Academy
Sep 6, 2023 · Cloud Native

The Evolving Role of Developers in Infrastructure as Code and Cloud‑Native Platforms

This article examines how infrastructure management has shifted toward treating infrastructure as code, the growing responsibilities of developers in deploying and maintaining cloud‑native platforms such as Kubernetes, the challenges they face, and the supporting role of platform/DevOps teams and tools like Terraform and ArgoCD.

ArgoCD, Cloud Native, DevOps
0 likes · 7 min read
Cloud Native Technology Community
Sep 5, 2023 · Cloud Native

Why Kubernetes 1.28 Finally Makes Sidecars First‑Class Citizens

This guide explains what sidecars are in Kubernetes, why they matter, the challenges they introduce, and how the sidecar KEP in Kubernetes 1.28, which lets an init container set restartPolicy to Always, gives sidecars built‑in support in the Pod API and reshapes service‑mesh implementations and operational practices.

Cloud Native, Init Container, Kubernetes
0 likes · 18 min read
Efficient Ops
Sep 4, 2023 · Cloud Native

Journey Through the Kubernetes Zoo: Learn Pods, Deployments, Ingress & More

A playful narrative follows Phippy and her niece Zee as they explore a Kubernetes‑themed zoo, turning whimsical animal scenes into clear explanations of Pods, ReplicaSets, Deployments, DaemonSets, Ingress, CronJobs and CustomResourceDefinitions for cloud‑native practitioners.

Deployments, Ingress, Kubernetes
0 likes · 8 min read
Alibaba Cloud Native
Sep 3, 2023 · Cloud Native

Master Knative’s Request‑Based Autoscaling: KPA, Scale‑to‑Zero, and Advanced Strategies

This article explains how Knative implements request‑based autoscaling with KPA, details the scale‑to‑zero mechanism, shows how to handle burst traffic using stable and panic windows, and demonstrates advanced extensions such as resource pools, precise MPA scaling, and predictive AHPA configurations with concrete YAML examples.

Cloud Native, KPA, Knative
0 likes · 18 min read
DevOps Cloud Academy
Sep 1, 2023 · Cloud Native

Understanding Kubernetes Termination Signals and Graceful Shutdown

This article explains how Kubernetes termination signals work, the graceful shutdown workflow, handling of application termination, customization of grace periods, impact on high availability, best practices, and tools such as preStop hooks to ensure reliable container lifecycle management.

Grace Period, Kubernetes, Pod Lifecycle
0 likes · 9 min read
MaGe Linux Operations
Aug 31, 2023 · Cloud Native

How to Achieve Zero‑Downtime Deployments with Kubernetes

Learn how to configure Kubernetes for zero‑downtime applications by syncing container images, ensuring multiple pod replicas, using PodDisruptionBudgets, selecting appropriate deployment strategies, setting up liveness/readiness probes, handling graceful termination, applying pod anti‑affinity, and enabling autoscaling and proper resource limits.

Kubernetes, Probes, Zero Downtime
0 likes · 12 min read
Liangxu Linux
Aug 29, 2023 · Cloud Native

Master Real-Time Multi-Pod Log Viewing in Kubernetes with Kubetail & Stern

This guide introduces two lightweight Kubernetes log‑tailing tools, Kubetail and Stern, explains their installation on various platforms, demonstrates common usage patterns and command‑line options, and provides practical examples for aggregating and filtering logs across multiple pods and containers.

Cloud Native, DevOps, Kubernetes
0 likes · 10 min read
DevOps Cloud Academy
Aug 29, 2023 · Cloud Native

Achieving Zero‑Downtime Applications with Kubernetes

This article explains why and how to use Kubernetes features such as multiple pod replicas, PodDisruptionBudgets, deployment strategies, health probes, graceful termination, anti‑affinity, resource limits, and autoscaling to build zero‑downtime, highly available applications.

Deployment Strategies, Health Probes, Kubernetes
0 likes · 12 min read
Alibaba Cloud Native
Aug 29, 2023 · Cloud Native

How Kruise Rollout Uses Lua Scripts for Extensible Gateway Traffic Scheduling

Kruise Rollout introduces a Lua‑script based, extensible traffic routing solution that enables progressive delivery across diverse gateway resources—such as Istio, Kong, and APISIX—by dynamically modifying VirtualService and DestinationRule objects, simplifying configuration, reducing custom code, and supporting automated canary, blue‑green, and A/B testing deployments.

Gateway API, Istio, Kruise Rollout
0 likes · 14 min read