Tagged articles
4047 articles
Alibaba Cloud Infrastructure
Feb 17, 2025 · Cloud Native

Multi‑Cluster Delivery with ACK One GitOps: A Case Study at Wondershare Technology

Wondershare Technology adopted Alibaba Cloud's ACK One GitOps platform to automate and unify the deployment of dozens of Kubernetes clusters across multiple regions, addressing manual deployment inefficiencies, traceability, rollback challenges, and multi‑tenant permission management while achieving a 50% increase in release efficiency.

Argo CD · Cloud Native · GitOps
0 likes · 7 min read
360 Zhihui Cloud Developer
Feb 17, 2025 · Cloud Native

Optimizing Offline Pod Scheduling with Koordinator and Yarn-Operator

To reduce resource contention and improve offline task reliability, this article examines the challenges of using Koordinator with Hadoop Yarn pods on Kubernetes, proposes real‑time resource reporting and task‑level eviction strategies, details community and custom solutions, and outlines future enhancements with Volcano.

Big Data · Cloud Native · Koordinator
0 likes · 9 min read
DataFunSummit
Feb 16, 2025 · Big Data

Bilibili Big Data Task Migration to Cloud‑Native Kubernetes Using Volcano Scheduler

This article shares Bilibili’s experience migrating its offline big‑data workloads to a cloud‑native Kubernetes environment using the Volcano scheduler, covering migration background, scheduler adaptation, hierarchical queue implementation, over‑commit framework (Amiyad), and future work to improve performance and resource utilization.

Cloud Native · Kubernetes · Resource Overcommit
0 likes · 15 min read
Infra Learning Club
Feb 15, 2025 · Cloud Native

Advanced Guide: Real‑Time GPU Process Migration in Kubernetes with CRIU

This article explains how os‑criu provides transparent, OS‑level GPU checkpoint/restore, compares its performance with NVIDIA's cuda‑checkpoint, walks through building and installing the PhOS framework, demonstrates migration of a Llama2‑13b‑chat workload in Docker, and discusses current limitations and future Kubernetes integration plans.

CRIU · Checkpoint · Docker
0 likes · 9 min read
MaGe Linux Operations
Feb 15, 2025 · Cloud Native

Essential Kubernetes Pod Commands and Practical Examples

This article presents a comprehensive list of common Kubernetes pod management commands, detailed case studies for network sharing and shared storage, and an in‑depth explanation of pod YAML fields, helping readers master pod operations and troubleshooting.

Kubernetes · YAML · kubectl
0 likes · 18 min read
Alibaba Cloud Infrastructure
Feb 14, 2025 · Cloud Native

Blue‑Green Deployment with Kruise Rollouts: Concepts, Implementation, and Comparison

This article explains the blue‑green deployment strategy, introduces Kruise Rollouts’ blue‑green capabilities, provides a step‑by‑step Kubernetes example with YAML manifests and kubectl commands, compares it to Argo Rollouts and Flux Flagger, discusses resource considerations and serverless advantages, and concludes with best‑practice recommendations.

Blue‑Green deployment · Cloud Native · DevOps
0 likes · 16 min read
ByteDance Cloud Native
Feb 13, 2025 · Cloud Computing

Deploy the Full‑Size DeepSeek‑R1 Model on Volcengine Cloud with Terraform and Kubernetes

This guide walks you through two practical solutions for deploying the massive DeepSeek‑R1 model on Volcengine Cloud—one using Terraform for a quick two‑node GPU setup and another leveraging cloud‑native multi‑node distributed inference with Kubernetes, covering resource sizing, environment preparation, model download, monitoring, autoscaling, and storage acceleration.

AI · Kubernetes · Model Deployment
0 likes · 22 min read
Alibaba Cloud Infrastructure
Feb 13, 2025 · Cloud Computing

Deploy DeepSeek‑R1 LLM on Alibaba Cloud ACK One with ACS GPU in Minutes

This guide walks you through deploying the DeepSeek‑R1 large‑language‑model inference service on Alibaba Cloud ACK One registered clusters using ACS GPU compute, covering model preparation, OSS storage setup, PersistentVolume configuration, arena‑based service deployment, and verification steps with concrete commands and parameters.

ACK One · ACS GPU · DeepSeek
0 likes · 14 min read
Alibaba Cloud Infrastructure
Feb 13, 2025 · Artificial Intelligence

Deploying DeepSeek‑R1 671B Distributed Inference Service on Alibaba Cloud ACK with vLLM and Dify

This article explains how to quickly deploy the full‑parameter DeepSeek‑R1 671B model in a multi‑node GPU‑enabled Kubernetes cluster on Alibaba Cloud ACK, covering prerequisites, model parallelism, vLLM‑Ray distributed deployment, service verification, and integration with Dify to build a private AI Q&A assistant.

DeepSeek · Dify · Distributed Deployment
0 likes · 12 min read
Ops Development Stories
Feb 13, 2025 · Cloud Native

KubeDoor: AI‑Driven Kubernetes Load‑Aware Scheduling & Capacity Management

KubeDoor is an open‑source platform built with Python and Vue that leverages Kubernetes admission control, AI recommendations, and expert experience to provide load‑aware scheduling, capacity governance, real‑time resource analytics, and automated scaling for microservices, featuring a web UI, Grafana dashboards, and extensible control mechanisms.

AI scheduling · Admission Controller · Cloud Native
0 likes · 11 min read
Practical DevOps Architecture
Feb 11, 2025 · Operations

Kubernetes Operations and Cloud Native Architecture Training Course

This comprehensive training program for intermediate to advanced users covers Kubernetes high‑availability deployment, elastic scaling, Helm package management, Ceph distributed storage integration, microservice container migration, Jenkins‑based CI/CD pipelines, and Istio service‑mesh governance, providing hands‑on labs, detailed chapters, and practical resources for mastering modern cloud‑native operations.

Ceph · Cloud Native · DevOps
0 likes · 7 min read
DeWu Technology
Feb 10, 2025 · Operations

White‑Screen Operations Platform for Multi‑Cloud Kubernetes Middleware Management

The White‑Screen Operations Platform unifies multi‑cloud Kubernetes cluster and middleware management—automating Kafka, Elasticsearch, node, PV, and YAML tasks through a visual UI, eliminating fragmented command‑line scripts, cutting operation times from hours to minutes, standardizing processes, providing auditability, and delivering significant cost savings while scaling for future Kubernetes resources.

Kubernetes · Operator · automation
0 likes · 20 min read
DataFunSummit
Feb 6, 2025 · Big Data

Migrating Big Data Workloads to Cloud‑Native Kubernetes: Challenges, Solutions, and Lessons from OPPO

This article describes how OPPO's big‑data team transitioned from traditional IDC and EMR environments to a cloud‑native Kubernetes architecture, detailing the motivations, design principles, elastic scaling challenges, custom solutions, and future directions for large‑scale data processing on the cloud.

Cloud Native · Kubernetes · elastic scaling
0 likes · 18 min read
Infra Learning Club
Feb 6, 2025 · Artificial Intelligence

Getting Started with Huawei Ascend AI Accelerators

This guide walks through the fundamentals of Huawei Ascend NPU hardware, the CANN software stack, driver and firmware installation, Kubernetes integration via Docker runtime and device plugin, and a complete ResNet‑50 inference demo on Ascend 310P.

AI inference · CANN · Docker Runtime
0 likes · 12 min read
macrozheng
Feb 5, 2025 · Operations

Master Java Application Diagnostics with the Open‑Source Meteor Console

This guide introduces the open‑source Meteor Console, a non‑intrusive Java application diagnostic tool built on Arthas, covering its architecture, installation steps, core features like class querying, method monitoring, thread management, and links to the full microservice project and video tutorials.

Arthas · Docker · Kubernetes
0 likes · 4 min read
dbaplus Community
Feb 2, 2025 · Cloud Native

From Virtualization to Containers: A Complete Journey Through Container Technology

This article provides a comprehensive overview of container technology, covering its definition, key characteristics, historical evolution from early virtualization to modern Docker and Kubernetes ecosystems, core Linux mechanisms such as cgroups and namespaces, runtime implementations, OCI standards, security enhancements, and container orchestration.

Containers · Docker · Kubernetes
0 likes · 21 min read
Test Development Learning Exchange
Jan 28, 2025 · Interview Experience

Essential Interview Q&A: Testing, DevOps, Cloud, Linux, and Management Insights

This comprehensive guide compiles expert answers to common interview questions covering software testing strategies, API automation, performance testing, Linux system administration, Docker and Kubernetes fundamentals, CI/CD pipelines, and effective team management practices, providing valuable insights for candidates and hiring managers alike.

DevOps · Docker · Kubernetes
0 likes · 49 min read
MaGe Linux Operations
Jan 24, 2025 · Databases

Enable and Configure MariaDB Log Auditing with the Server_Audit Plugin

This guide walks through verifying the MariaDB server_audit plugin, installing it via configuration files or SQL, setting audit event variables, enabling logging, and restarting the MySQL service in a Kubernetes environment to achieve comprehensive query and connection auditing.

Database Configuration · Kubernetes · Log Auditing
0 likes · 5 min read
Infra Learning Club
Jan 23, 2025 · Cloud Native

Getting Started with GPU Remote Invocation Using rCUDA

This article introduces GPU remote invocation, explains rCUDA's architecture, walks through installing the server and client, demonstrates running CUDA samples on a GPU‑less node, and shows how to deploy rCUDA on Kubernetes with example DaemonSet and Job manifests.

CUDA · Docker · GPU remote invocation
0 likes · 7 min read
MaGe Linux Operations
Jan 23, 2025 · Cloud Native

Mastering Calico: Complete Guide to Installing and Configuring Kubernetes CNI Networking

This comprehensive guide explains the fundamentals of CNI plugins, details Calico's architecture and network modes, walks through installation via manifest or Tigera Operator, covers configuration of CIDR blocks, IPIP/VXLAN encapsulation, network policies, and provides step‑by‑step instructions for full deployment and clean uninstallation in Kubernetes clusters.

CNI · Calico · CloudNative
0 likes · 26 min read
IT Architects Alliance
Jan 22, 2025 · Cloud Native

Kubernetes in the Cloud‑Native Era: Architecture, Core Components, and Practical Practices

This article introduces Kubernetes as the cornerstone of cloud‑native architecture, explains its control‑plane and node components, demonstrates practical tasks such as namespace isolation, custom scheduling, and persistent storage with code examples, and showcases real‑world success cases across industries.

Cloud Native · DevOps · Kubernetes
0 likes · 12 min read
Alibaba Cloud Infrastructure
Jan 21, 2025 · Cloud Native

OpenYurt v1.6 Release: Node-Level Traffic Multiplexing and Enhanced Edge Autonomy

The OpenYurt v1.6 release introduces node‑level traffic multiplexing that can cut cloud‑edge communication by about 50% and adds enhanced edge autonomy features such as configurable autonomy duration and a webhook to keep services stable during node failures, while also providing various community and product updates.

Cloud Native · Edge Autonomy · Edge Computing
0 likes · 7 min read
IT Architects Alliance
Jan 19, 2025 · Cloud Native

Mastering Cloud‑Native CI/CD: Build, Deploy, and Scale Your Pipelines

This comprehensive guide explains cloud‑native architecture fundamentals, walks through CI/CD pipeline core components, provides step‑by‑step instructions for setting up Git, Jenkins, Docker, and Kubernetes, and demonstrates advanced Tekton pipelines, while discussing benefits, challenges, and future trends.

Cloud Native · Docker · Jenkins
0 likes · 20 min read
Alibaba Cloud Infrastructure
Jan 17, 2025 · Artificial Intelligence

Elastic Scaling of Large Language Model Inference on Alibaba Cloud ACK with Knative, ResourcePolicy, and Fluid

This article explains how to reduce inference cost and improve performance for large language models on Alibaba Cloud ACK by using Knative's request‑based autoscaling, custom ResourcePolicy priority scheduling, and Fluid data‑caching to achieve elastic scaling, resource pre‑emption, and faster model loading.

Fluid · Inference · Knative
0 likes · 22 min read
FunTester
Jan 15, 2025 · Operations

How to Combine Performance Testing and Chaos Engineering for Rock‑Solid Systems

Drawing lessons from the 2021 AWS outage, this article explains how integrating performance testing with fault‑injection (chaos engineering) in microservice and Kubernetes environments can identify bottlenecks, validate resilience, and build a continuous stability strategy that balances speed and reliability.

Kubernetes · Microservices · Operations
0 likes · 13 min read
Alibaba Cloud Native
Jan 14, 2025 · Cloud Native

Unlocking Kubernetes IO Insights with ACK’s New Storage Monitoring Dashboards

This article explains how Alibaba Cloud Container Service for Kubernetes (ACK) has upgraded its storage monitoring dashboards to provide detailed visibility into local, PVC, and cloud‑based volumes, enabling users to detect IO bottlenecks, track real‑time read/write performance, and improve overall container reliability.

ACK · Cloud Native · Dashboard
0 likes · 8 min read
Alibaba Cloud Infrastructure
Jan 14, 2025 · Cloud Native

Managing Distributed ECS Resources with ACK Edge and Kubernetes

This guide explains how to use Alibaba Cloud's ACK Edge to create a secure, high‑availability Kubernetes cluster that unifies management and scheduling of ECS instances across multiple VPCs, regions, and accounts, with detailed scenarios, advantages, step‑by‑step procedures, and sample YAML deployments.

ACK@Edge · DaemonSet · Distributed Resources
0 likes · 8 min read
Alibaba Cloud Observability
Jan 13, 2025 · Cloud Native

Alibaba Cloud’s Guide to Stable Large‑Scale Kubernetes After OpenAI Crash

After the OpenAI outage caused massive Kubernetes API overload, Alibaba Cloud’s Container Service and Observability teams detail how they reinforce large‑scale K8s clusters with high‑availability control‑plane design, optimized Prometheus probing, out‑of‑band monitoring, and best‑practice guidelines for capacity planning, safe releases, and rapid incident response.

Alibaba Cloud · Kubernetes · Large-Scale Clusters
0 likes · 21 min read
Alibaba Cloud Infrastructure
Jan 10, 2025 · Cloud Native

Service-Level Disaster Recovery with Alibaba Cloud Service Mesh (ASM) across Multi-Cluster and Multi-Region Deployments

This guide explains how to handle service‑level failures in Kubernetes by using Alibaba Cloud Service Mesh (ASM) to automatically detect faults, shift traffic based on geographic priority, and implement various multi‑cluster, multi‑region, and multi‑cloud topologies for high availability.

ASM · Kubernetes · Traffic Shifting
0 likes · 31 min read
Java Tech Enthusiast
Jan 9, 2025 · Cloud Native

Configuring NVIDIA Docker Plugin and GPU Access in Kubernetes

This guide walks through installing the NVIDIA container toolkit, configuring Docker to use the NVIDIA runtime, verifying GPU access, deploying the NVIDIA device plugin in Kubernetes, labeling GPU nodes, and running a GPU‑accelerated FFmpeg pod to confirm successful GPU integration.

Container Toolkit · Docker · GPU
0 likes · 12 min read
macrozheng
Jan 9, 2025 · Cloud Native

How Windmill Turns Scripts into Interactive UIs and Automated Workflows

Windmill is an open‑source platform that converts scripts into interactive user interfaces and orchestrated workflows, offering automatic UI generation, multi‑language support, high‑performance Rust backend, and secure sandboxing, and can be self‑hosted via Docker Compose or Kubernetes for rapid internal tool development.

Docker · Kubernetes · Rust
0 likes · 6 min read
Liangxu Linux
Jan 8, 2025 · Cloud Native

Enable NVIDIA GPU Access in Docker and Kubernetes with the NVIDIA Container Toolkit

This guide walks through checking system and software environments, installing and configuring the NVIDIA Docker plugin, verifying GPU access in Docker containers, deploying the NVIDIA device plugin on a Kubernetes cluster, creating GPU‑enabled pods, and troubleshooting common issues, all with concrete commands and configuration examples.

Container Toolkit · GPU · Kubernetes
0 likes · 12 min read
Alibaba Cloud Infrastructure
Jan 8, 2025 · Cloud Native

Designing AZ‑Level Disaster Recovery with Alibaba Cloud ACK and Service Mesh ASM

This guide explains how to achieve zone‑level disaster recovery on Alibaba Cloud by deploying multi‑AZ ACK clusters, configuring Service Mesh ASM for observability and traffic shifting, and using Prometheus‑based metrics and alerts to detect and isolate failures, including step‑by‑step instructions and sample YAML manifests.

Kubernetes · Multi‑AZ · Prometheus
0 likes · 24 min read
Full-Stack DevOps & Kubernetes
Jan 7, 2025 · Cloud Native

Build a Full Kubernetes DevOps Pipeline: From Containerization to Monitoring

This guide walks through a complete Kubernetes DevOps case study, detailing how to containerize micro‑services, create Docker images, write deployment and service manifests, set up a CI/CD pipeline with Jenkins or GitLab CI, integrate monitoring with Prometheus‑Grafana, manage logs via ELK/EFK, optionally add a service mesh, and perform fault‑injection testing for continuous optimization.

Istio · Kubernetes · Prometheus
0 likes · 6 min read
IT Architects Alliance
Jan 6, 2025 · Cloud Native

Mastering Service Discovery and Dynamic Scaling in Cloud‑Native Architectures

This article explains how distributed systems transition from monolithic to micro‑service architectures, detailing the role of registries, service registration methods, discovery mechanisms, and both horizontal and vertical scaling strategies, with practical examples and guidance for technology selection and future trends.

Cloud Native · Dynamic Scaling · Kubernetes
0 likes · 21 min read
Alibaba Cloud Big Data AI Platform
Jan 6, 2025 · Cloud Native

How Fluid Enables Seamless Dynamic Dataset Mounting for Cloud‑Native AI Development

PAI‑DSW leverages the Fluid project to provide a cloud‑native AI development platform where data scientists can dynamically mount and unmount OSS datasets on running Kubernetes pods without restarting, improving workflow efficiency and addressing the challenges of heterogeneous data source management in AI engineering.

AI Development · Cloud Native · Fluid
0 likes · 18 min read
Infra Learning Club
Jan 4, 2025 · Cloud Native

How GPU Devices Are Dynamically Mounted to Kubernetes Pods

This article dissects the GPUMounter project's implementation of dynamic GPU device mounting to a pod, detailing the roles of cgroups (v1 and v2) and Linux namespaces, and provides step‑by‑step command‑line examples and a CLI tool for practical use.

GPU · Kubernetes · Namespace
0 likes · 13 min read
Alibaba Cloud Infrastructure
Jan 3, 2025 · Cloud Native

How to Enable LLM Traffic Observability with Alibaba Cloud Service Mesh (ASM)

This guide explains how to use Alibaba Cloud Service Mesh (ASM) to add infrastructure‑level observability for large language model (LLM) traffic, covering custom access‑log fields, new Prometheus metrics for token usage, and adding model dimensions to native Istio metrics, with step‑by‑step commands and configuration examples.

ASM · Kubernetes · LLM
0 likes · 14 min read
Bilibili Tech
Jan 3, 2025 · Big Data

Evolution and Production Practices of Apache Celeborn Remote Shuffle Service at Bilibili

Bilibili replaced Spark’s unstable External Shuffle Service with a push‑based approach, then deployed Apache Celeborn’s remote shuffle on Kubernetes using HA masters, tiered workers, extensive monitoring, history‑based routing, chaos testing, and seamless Spark, Flink, and MapReduce integration, while planning self‑healing, elastic scaling, and priority‑aware I/O enhancements.

Apache Celeborn · Big Data · Flink
0 likes · 28 min read
StarRocks
Jan 2, 2025 · Big Data

StarRocks Compute‑Storage Separation Cuts Costs 40% and Boosts Efficiency 20% at DMALL

DMALL upgraded its big‑data platform by adopting StarRocks 3.x with compute‑storage separation, lakehouse external tables, and Kubernetes deployment, achieving 20% higher compute utilization, 40% lower storage cost, faster cluster provisioning, and notable improvements in development and operations efficiency.

Big Data · Compute-Storage Separation · Kubernetes
0 likes · 25 min read
Alibaba Cloud Native
Jan 2, 2025 · Cloud Native

Unlocking Serverless Elastic Scaling: ElasticWorkload, WorkloadSpread, UnitedDeployment & ResourcePolicy Explained

This article explains how Alibaba Cloud ACK’s four configurable plugins—ElasticWorkload, WorkloadSpread, UnitedDeployment, and ResourcePolicy—provide flexible, on‑demand resource scaling for serverless workloads, compares their architectures, outlines usage scenarios, shows real‑world examples, and discusses their strengths and limitations.

ACK · Kubernetes · OpenKruise
0 likes · 33 min read
Alibaba Cloud Infrastructure
Jan 1, 2025 · Industry Insights

How Cloud‑Native Is Reshaping China’s Game Industry and What Elastic Strategies Developers Need

The article analyzes the rapid growth of China's game cloud market, explains why cloud‑native adoption has become industry‑wide, and details practical application‑layer and resource‑layer elasticity strategies—including OpenKruiseGame, state‑aware scaling, and Alibaba Cloud node‑scaling options—to improve performance and reduce costs.

Alibaba Cloud · Cloud Native · Game Development
0 likes · 14 min read
MaGe Linux Operations
Dec 31, 2024 · Cloud Native

Step-by-Step Guide to Deploy Flannel CNI with Host‑GW, VXLAN, and iptables Optimization in Kubernetes

This tutorial walks through Kubernetes CNI networking, introduces common plugins, explains Flannel's three network models, details cluster planning, software download, installation, configuration, supervisor setup, service startup, pod‑to‑pod connectivity verification, iptables rule optimization, and DNS troubleshooting for a functional Flannel‑based cluster.

CNI · Cluster · Docker
0 likes · 20 min read
Architect's Guide
Dec 31, 2024 · Backend Development

Apollo Configuration Center: Concepts, Architecture, and Spring Boot Integration Guide

This article provides a comprehensive tutorial on Apollo, covering its basic concepts, architecture, four-dimensional configuration model, client design, high‑availability considerations, and step‑by‑step instructions for creating a Spring Boot project, integrating Apollo dependencies, configuring environments, testing dynamic updates, and deploying the application on Kubernetes.

Apollo · Configuration Management · Kubernetes
0 likes · 22 min read
dbaplus Community
Dec 30, 2024 · Cloud Native

What’s New in Kubernetes v1.32? A Deep Dive into 44 Feature Enhancements

Kubernetes v1.32 introduces 44 enhancements—including 13 stable, 12 beta, and 19 alpha features—spanning dynamic resource allocation, Windows node support, improved kubelet reliability, new API endpoints, and extensive updates to DRA, pod‑level resources, and scheduling, all aimed at strengthening the cloud‑native ecosystem.

Cloud Native · DRA · Feature Enhancements
0 likes · 16 min read
Alibaba Cloud Observability
Dec 30, 2024 · Cloud Native

What Caused OpenAI’s Global Outage? Lessons for Cloud‑Native Observability

The article analyzes the December 11 OpenAI outage, revealing that a newly deployed telemetry service overloaded Kubernetes API servers, breaking DNS resolution and slowing recovery, and compares OpenAI’s approach with LoongCollector/iLogtail’s design to offer stability insights for cloud‑native environments.

API Server · Cloud Native · Kubernetes
0 likes · 15 min read
MaGe Linux Operations
Dec 30, 2024 · Cloud Native

Step-by-Step Guide to Deploy a Kubernetes Cluster on CentOS 7

This tutorial walks through preparing three CentOS 7 hosts, installing Docker, configuring kubeadm, initializing a Kubernetes master, troubleshooting common errors, adding worker nodes, installing a CNI plugin, testing the cluster with an Nginx deployment, and provides essential kubectl commands for ongoing management.

CNI · CentOS · Cluster Setup
0 likes · 22 min read
MaGe Linux Operations
Dec 29, 2024 · Cloud Native

Step-by-Step Guide to Upgrading a Kubernetes Cluster to v1.15.12

This guide walks through downloading the latest Kubernetes packages, preparing master and node services, adjusting nginx proxy settings, safely cordoning and draining nodes, installing the new version, updating certificates and scripts, restarting services, and rebalancing pods to complete a seamless cluster upgrade to v1.15.12.

Cluster Upgrade · Kubernetes · Node Maintenance
0 likes · 15 min read
MaGe Linux Operations
Dec 28, 2024 · Cloud Native

Mastering Three Core Methods to Manage Kubernetes Resources

This tutorial walks through the three fundamental approaches—imperative CLI commands, declarative manifest files, and GUI tools—for managing Kubernetes core resources such as namespaces, deployments, pods, and services, providing practical examples, code snippets, and best‑practice recommendations.

Declarative · Imperative · Kubernetes
0 likes · 17 min read
Alibaba Cloud Infrastructure
Dec 27, 2024 · Cloud Native

ElasticWorkload, WorkloadSpread, UnitedDeployment, and ResourcePolicy: Configurable Plugins for Serverless Elasticity in Alibaba Cloud Container Service

This article explains how Serverless elasticity is achieved in Alibaba Cloud Container Service by introducing four configurable plugins—ElasticWorkload, WorkloadSpread, UnitedDeployment, and ResourcePolicy—detailing their core capabilities, technical principles, advantages, real‑world use cases, and guidance for selecting the appropriate solution.

Cloud Native · ElasticWorkload · Kubernetes
0 likes · 30 min read
IT Architects Alliance
Dec 26, 2024 · Cloud Native

How Cloud‑Native, Microservices, Containers and DevOps Drive Digital Transformation

Cloud‑native architecture, built on microservices, containers, and DevOps, empowers enterprises with agility, scalability, and resilience, enabling rapid development, efficient resource utilization, and seamless continuous delivery, while addressing challenges like distributed transactions and service governance, and outlining future integration with 5G, edge computing, and AI.

5G · Cloud Native · Containers
0 likes · 15 min read
Architect
Dec 25, 2024 · Operations

Comprehensive Guide to Using Apollo Configuration Center with Spring Boot

This article provides a step‑by‑step tutorial on Apollo, an open‑source configuration center, covering its core concepts, dimensions, client design, Maven integration, Spring Boot setup, JVM parameters, testing scenarios, cluster/namespace usage, Docker image creation, and Kubernetes deployment for microservice applications.

Apollo · Configuration Management · Kubernetes
0 likes · 30 min read
Alibaba Cloud Infrastructure
Dec 25, 2024 · Cloud Native

Ensuring Stability of Large‑Scale Kubernetes Clusters: Lessons from the OpenAI Incident and Alibaba Cloud Practices

This article analyses the OpenAI large‑scale Kubernetes outage, explains the inherent risks of massive K8s clusters, and presents Alibaba Cloud's architectural enhancements, observability improvements, and best‑practice guidelines for running Kubernetes environments with thousands of nodes reliably and with high availability.

Cloud Native · Kubernetes · Large-Scale Clusters
0 likes · 21 min read
System Architect Go
Dec 23, 2024 · Cloud Native

Mastering Kubernetes API Server Flow Control: APF Explained

This article explains how Kubernetes' API Priority and Fairness (APF) mechanism enhances kube-apiserver traffic control by introducing FlowSchema and PriorityLevelConfiguration objects, allowing fine‑grained request prioritization, concurrency limits, and queue management beyond the basic inflight throttling flags.

APF · API Server · Cloud Native
0 likes · 7 min read
Raymond Ops
Dec 22, 2024 · Cloud Native

Expose Istio Mesh Services with Nginx Ingress: A Step‑by‑Step Guide

This article explains the relationship between API gateways and service meshes, compares exposure methods, and provides a detailed, step‑by‑step guide for exposing services inside an Istio mesh using an Nginx Ingress Controller, including required annotations and configuration details.

Ingress · Istio · Kubernetes
0 likes · 8 min read
Rare Earth Juejin Tech Community
Dec 21, 2024 · Cloud Native

Understanding Docker and Kubernetes: Principles, Architecture, and Deployment Practices

This article explains the fundamentals of containerization by reviewing virtualization concepts, detailing Docker's architecture and Dockerfile syntax, and then introduces Kubernetes' control‑plane and node components, providing step‑by‑step examples for deploying a simple Nginx service and a Java web application on a K8s cluster, both manually and with automation tools.

Cloud Native · Deployment · DevOps
0 likes · 19 min read
Xiaolei Talks DB
Dec 20, 2024 · Databases

Deploy a Production‑Ready TiDB 8.4 Cluster on Kubernetes with TiDB Operator

This guide walks you through the complete process of setting up a production‑grade TiDB 8.4.0 cluster on a self‑managed Kubernetes environment using the latest TiDB Operator, covering environment prerequisites, PV pool initialization, operator installation, CRD creation, and full cluster deployment with detailed commands and configurations.

Database Deployment · Kubernetes · Operator
0 likes · 15 min read
Cloud Native Technology Community
Dec 20, 2024 · Cloud Native

Key Highlights of the Kubernetes 1.32 “Penelope” Release

The Kubernetes 1.32 "Penelope" release, launched on December 11, introduces major enhancements such as improved Dynamic Resource Allocation, pod‑level resource specifications, asynchronous preemption, CEL‑based mutating admission, tighter security controls, graceful Windows node shutdown, full consistency test coverage, and several performance and networking upgrades.

Asynchronous Preemption · Dynamic Resource Allocation · Kubernetes
0 likes · 6 min read
Raymond Ops
Dec 19, 2024 · Operations

How to Auto‑Scale Non‑CPU Apps with cAdvisor Network Metrics in Kubernetes

This guide explains how to use cAdvisor‑provided container network traffic counters as custom metrics for Kubernetes HPA, covering metric collection, Prometheus‑adapter configuration, verification, and a complete HPA testing workflow for elastic scaling of non‑CPU‑intensive workloads.

HPA · Kubernetes · Prometheus
0 likes · 7 min read
System Architect Go
Dec 19, 2024 · Operations

Why Did OpenAI’s New Telemetry Crash Their Kubernetes Cluster?

On December 11, 2024, OpenAI’s Kubernetes cluster suffered a four‑hour outage after a newly deployed telemetry service generated massive API traffic from every node, overwhelming the kube-apiserver, breaking DNS‑based service discovery, and exposing gaps in control‑plane monitoring and break‑glass mechanisms, prompting critical questions about component behavior and configuration.

API overload · Control Plane · DNS
0 likes · 8 min read
Linux Ops Smart Journey
Dec 18, 2024 · Cloud Native

How to Deploy MinIO Object Storage on Kubernetes for Production

This guide explains why MinIO is a high‑performance, S3‑compatible object storage solution and walks you through the prerequisites, Helm‑based installation steps on a Kubernetes cluster, verification procedures, and best‑practice tips for deploying a secure, scalable production‑grade MinIO service.

Kubernetes · helm · object storage
0 likes · 8 min read
Open Source Linux
Dec 18, 2024 · Cloud Native

What’s New in Kubernetes v1.32? 44 Enhancements Across Stable, Beta, and Alpha

Kubernetes v1.32 introduces 44 enhancements—including DRA improvements, node and sidecar usability upgrades, new stable APIs, beta features like managed Jobs, and alpha innovations such as asynchronous preemption—while celebrating a decade of cloud‑native progress with the Penelope theme and updated Windows support.

Kubernetes · cloud-native · features
0 likes · 15 min read
Alibaba Cloud Infrastructure
Dec 17, 2024 · Cloud Native

Recap of Kubernetes Community Day 2024 Jakarta: Generative AI, eRDMA, Container Security, and Observability

The Kubernetes Community Day held in Jakarta on November 30, 2024 featured Alibaba Cloud experts presenting best‑practice sessions on scaling generative AI workloads, eRDMA network acceleration, container image security, and OpenTelemetry‑based observability within the ACK Kubernetes platform.

Cloud Native · Container Security · Kubernetes
0 likes · 6 min read
macrozheng
Dec 17, 2024 · Cloud Native

Build a Lightweight Docker Registry with Registry & Docker‑Registry‑Browser

This guide walks through setting up a lightweight private Docker image registry using the official Docker registry and the visual docker‑registry‑browser tool, covering installation, configuration, image tagging, pushing, pulling, and running a sample SpringBoot‑Vue e‑commerce application in containers.

Container · DevOps · Docker
0 likes · 8 min read
DevOps Operations Practice
Dec 16, 2024 · Cloud Native

Analysis of OpenAI's December 2024 Outage: Kubernetes Control Plane Overload and Mitigation

The December 11, 2024 OpenAI outage, caused by a misconfigured monitoring service that overloaded the Kubernetes control plane, led to a four‑hour service disruption and was resolved through cluster scaling, API blocking, and resource expansion, highlighting critical infrastructure risks for large‑scale cloud‑native operations.

Control Plane · Kubernetes · OpenAI
0 likes · 7 min read
Spring Full-Stack Practical Cases
Dec 16, 2024 · Backend Development

Master Spring Boot 3 Configuration: Multi‑File Profiles & Cloud Platform Activation

This article explains how to manage environment‑specific configuration in Spring Boot 3 using profile‑based YAML files, the spring.profiles.active property, the newer spring.config.activate.on-profile feature, multi‑document files, and conditional activation for cloud platforms such as Kubernetes.

Configuration · Kubernetes · Multi-Document
0 likes · 7 min read
IT Services Circle
Dec 15, 2024 · Databases

DBOS: A Database‑Oriented Operating System for Cloud Computing

The article explains how Databricks’ scaling challenges with PostgreSQL inspired the creation of DBOS, a database‑oriented operating system that places the OS beneath a distributed transactional database to simplify task scheduling, state management, and cloud‑native workloads.

DBOS · Kubernetes · cloud computing
0 likes · 9 min read
System Architect Go
Dec 11, 2024 · Cloud Native

Kubernetes CPU Configuration and Linux CFS Interaction

This article explains how Kubernetes resource requests and limits map to Linux cgroup settings via the CFS scheduler, illustrates the underlying calculations for cpu.shares and cpu.cfs_quota_us, and discusses the impact on programming languages such as Go and Java within containers.

CFS · Kubernetes · cgroup
0 likes · 5 min read
Linux Ops Smart Journey
Dec 10, 2024 · Cloud Native

Deploy GitLab on Kubernetes in One Click with Helm

This guide shows how to quickly set up a full GitLab instance on a Kubernetes cluster using Helm charts, covering prerequisite checks, configuration file creation, secret management, and verification steps to streamline CI/CD and team collaboration.

DevOps · Kubernetes · ci/cd
0 likes · 8 min read
Alibaba Cloud Infrastructure
Dec 9, 2024 · Cloud Native

Building High‑Availability Architecture with Service Mesh (ASM) Across Availability Zones and Regions

This article explains how to design a highly available business system on Alibaba Cloud by leveraging multi‑availability‑zone deployments, ASM circuit‑breaking and rate‑limiting, and multi‑region multi‑cluster service‑mesh strategies to ensure resilience against both AZ‑level and region‑level failures.

ASM · Kubernetes
0 likes · 11 min read