Tagged articles
4047 articles
Cloud Native Technology Community
Apr 20, 2023 · Cloud Native

Understanding Kubernetes kube‑scheduler Architecture, Workflow, and Plugin Development

This article explains the role of kube‑scheduler in Kubernetes, details its scheduling process, describes the plugin‑based framework with extension points such as PreEnqueue, Filter and Bind, and provides complete code examples and deployment instructions for building custom scheduler plugins.

Kubernetes · Scheduler · Scheduling Framework
0 likes · 33 min read
Selected Java Interview Questions
Apr 19, 2023 · Operations

Zero‑Downtime Deployment with Kubernetes and Spring Boot: Health Checks, Rolling Updates, Graceful Shutdown, Autoscaling, Prometheus Monitoring, and Config Separation

This guide explains how to achieve zero‑downtime releases of a Spring Boot application on Kubernetes by configuring readiness/liveness probes, rolling‑update strategies, graceful shutdown, horizontal pod autoscaling, Prometheus metrics collection, and externalized configuration via ConfigMaps.

ConfigMap · Kubernetes · Prometheus
0 likes · 11 min read
Open Source Linux
Apr 19, 2023 · Cloud Native

What’s New in Kubernetes v1.27? Key Features, Upgrades & Deprecations

Kubernetes v1.27, the first 2023 release, introduces 60 enhancements across Alpha, Beta and Stable stages, freezes the old k8s.gcr.io registry in favor of registry.k8s.io, promotes several security and scheduling features to stable, and removes numerous legacy flags and feature gates.

Cloud Native · Deprecations · Kubernetes
0 likes · 11 min read
Architects Research Society
Apr 18, 2023 · Cloud Native

Apache Camel: An Enterprise Integration Framework Growing in Importance and Expanding to Cloud‑Native Kubernetes Deployments

The article highlights Apache Camel's rising relevance for enterprise integration, its extensive protocol support, and its deployment flexibility, including native Kubernetes options with Camel K and Camel Quarkus, while noting strong community activity and endorsement from European Commission developers.

Apache Camel · Kubernetes · enterprise integration
0 likes · 7 min read
Alibaba Cloud Native
Apr 18, 2023 · Artificial Intelligence

How to Deploy a CPU‑Based Stable Diffusion Service on Alibaba Cloud ACK

This guide walks you through the prerequisites, step‑by‑step console and kubectl procedures, YAML configuration, and post‑deployment verification needed to run a CPU‑only Stable Diffusion model on Alibaba Cloud Container Service (ACK) and optionally switch to a GPU‑enabled version.

ACK · AI Model Deployment · CPU
0 likes · 7 min read
Bilibili Tech
Apr 18, 2023 · Cloud Native

Kubernetes Audit Log Analysis for Container Security

The article explains how to enable Kubernetes audit logging and use its detailed fields—such as userAgent, responseStatus, requestURI, and object references—to detect CDK‑generated attacks and other threats like CVE‑2022‑3172, privilege escalation, and backdoor deployment, offering practical detection examples and security recommendations.

API Server · Audit logging · CDK
0 likes · 15 min read
Efficient Ops
Apr 17, 2023 · Operations

Mastering Container Log Collection in Kubernetes: Strategies and Best Practices

This article explains how container log collection in Kubernetes differs from traditional host logging, outlines common deployment methods such as DaemonSet and Sidecar, compares log storage options, and offers practical guidance on handling stdout and file‑based logs for reliable operations.

DaemonSet · Kubernetes · Sidecar
0 likes · 12 min read
Alibaba Cloud Native
Apr 17, 2023 · Cloud Native

OpenKruise v1.4 Highlights: Sidecar Terminator and CloneSet Enhancements

The OpenKruise v1.4 release introduces the Job Sidecar Terminator for automatic sidecar shutdown, enables several stable capabilities by default, adds CloneSet performance and lifecycle improvements, provides a force‑recreate option for containers, and enhances image pre‑pull metadata handling, all while offering clear usage examples and configuration snippets.

CloneSet · Cloud Native · Container
0 likes · 10 min read
Alibaba Cloud Big Data AI Platform
Apr 17, 2023 · Operations

How to Break Through Scale‑Out Ops Bottlenecks in the Cloud‑Native Era

This article analyzes the three main bottlenecks—stability, cost, and efficiency—encountered in large‑scale operations, presents a six‑stage pipeline and open‑source toolchain, and explains how cloud‑native technologies such as Kubernetes and AIOps can transform and automate massive infrastructure management.

Kubernetes · Scalability · aiops
0 likes · 18 min read
Efficient Ops
Apr 16, 2023 · Cloud Native

Mastering Kubernetes Probes: Liveness, Readiness, and Startup Explained

This article explains why Kubernetes health probes are essential, describes the three probe types (liveness, readiness, startup) along with their checking methods and configuration options, provides complete YAML examples, demonstrates testing scenarios, and outlines additional mechanisms that ensure container availability in a cloud‑native environment.

Container · Kubernetes · Probes
0 likes · 14 min read
System Architect Go
Apr 16, 2023 · Cloud Native

Understanding and Implementing Kubernetes Admission Controllers with a Sidecar Injection Example

This article explains the purpose and phases of Kubernetes Admission Controllers, outlines their security, governance, and configuration management benefits, and provides a step‑by‑step guide—including TLS certificate creation, a Go HTTPS webhook server, and MutatingWebhookConfiguration YAML—to inject a sidecar container into pods.

AdmissionController · Kubernetes · SidecarInjection
0 likes · 11 min read
Wukong Talks Architecture
Apr 16, 2023 · Cloud Native

Bosideng’s Cloud‑Native Transformation: Containerization, Microservices, and Full‑Link Traffic Governance

The article details Bosideng’s multi‑year digital transformation, describing how the apparel company migrated its legacy systems to cloud‑native architectures using Kubernetes, containerization, unified microservices, and Alibaba Cloud MSE to achieve zero‑loss deployment, traffic governance, and accelerated business innovation.

Cloud Native · Containerization · Digital Transformation
0 likes · 27 min read
360 Quality & Efficiency
Apr 14, 2023 · Cloud Native

Ensuring Zero‑Downtime Rolling Updates in Kubernetes: Causes and Solutions

This article analyzes why Kubernetes rolling updates can still cause service interruptions during pod startup and termination, explains the underlying mechanisms of Kubelet and Endpoint Controller, and provides practical steps such as readiness probes and preStop hooks to achieve smoother, near‑zero‑downtime deployments.

Kubernetes · Readiness Probe · Rolling Update
0 likes · 7 min read
Weimob Technology Center
Apr 13, 2023 · Backend Development

How Weimob Boosted API Performance with APISIX: A Deep Dive

This article details Weimob's migration to APISIX, covering background, performance requirements, benchmark results, architectural analysis, Kubernetes deployment, custom plugin extensions for authentication and rate limiting, remaining challenges, and overall conclusions about the gateway's impact.

APISIX · Kubernetes · Lua
0 likes · 14 min read
Efficient Ops
Apr 12, 2023 · Operations

Building Highly Available Prometheus Monitoring with Thanos: A Practical Guide

This article explains why native Prometheus HA solutions fall short for large, multi‑region clusters and shows how to use Thanos components—including sidecar, query, store gateway, and compactor—to achieve long‑term storage, unlimited scaling, a global view, and non‑intrusive integration with existing Prometheus deployments.

Kubernetes · Prometheus · Thanos
0 likes · 22 min read
Cloud Native Technology Community
Apr 12, 2023 · Cloud Native

Kubernetes v1.27 Release Highlights: New Features, Enhancements, and Deprecations

Kubernetes v1.27, the first 2023 release, introduces 60 enhancements—including image registry migration, SeccompDefault stabilization, Job mutable scheduling GA, DownwardAPIHugePages GA, and numerous beta-to-stable upgrades—while also deprecating several legacy features and providing links for full changelog and download.

Kubernetes · Release Notes · v1.27
0 likes · 12 min read
Full-Stack DevOps & Kubernetes
Apr 11, 2023 · Cloud Native

Master Kubernetes Basics: Deploy, Scale, and Update Apps with Simple Commands

This article introduces Kubernetes as an open‑source container orchestration platform, explains its core objects like Pods, Services, ReplicaSets, and Deployments, clarifies its relationship with Docker, and provides a step‑by‑step example covering deployment, exposure, scaling, rolling updates, and rollback using kubectl commands.

Deployment · DevOps · Kubernetes
0 likes · 5 min read
Alibaba Cloud Native
Apr 10, 2023 · Cloud Native

How CNStack Enables Full Lifecycle Management of Cloud Services and Components

This article provides a detailed overview of CNStack 2.0, explaining its cloud‑service and cloud‑component model, the cn‑app‑operator lifecycle controller, the Sealer‑based build/share/run workflow, and the capability center's console‑based ("white‑screen") management, which together simplify multi‑cluster cloud‑native application delivery.

CNStack · Kubernetes · Multi-Cluster
0 likes · 12 min read
New Oriental Technology
Apr 7, 2023 · Cloud Native

Capo Project: Cloud‑Native Network Coordination Service – Deployment, Configuration, Testing, and CI/CD Guide

This article provides a comprehensive guide to the open‑source Capo cloud‑native network coordination service, covering its architecture, three deployment methods (Helm, Kustomize, plain YAML), detailed configuration parameters, observability setup, static code analysis with golangci‑lint, extensive unit and e2e testing using Kind, Helm chart packaging, registry publishing, and a full GitHub Actions CI/CD workflow.

Cloud Native · Go · Kubernetes
0 likes · 26 min read
Bitu Technology
Apr 7, 2023 · Cloud Native

Managing Kubernetes Resource Manifests with Kustomize: Aggregation, Overlays, and Components

This article explains how Tubi’s engineering team uses Kustomize to simplify and scale Kubernetes Resource Manifest management by aggregating resources, applying patches, organizing bases and overlays, and leveraging reusable components to reduce duplication and improve maintainability across clusters and namespaces.

Component · Infrastructure · Kubernetes
0 likes · 15 min read
Alibaba Cloud Native
Apr 6, 2023 · Cloud Native

How KubeVela Implements the Open Application Model for Cloud‑Native Platform Engineering

An overview of the CNCF’s platform engineering whitepaper highlights how KubeVela adopts the Open Application Model (OAM) to bridge developers and infrastructure, detailing components, traits, CUE templating, workflow, and management features, with practical examples and future directions in cloud‑native application delivery.

CUE · Cloud Native · KubeVela
0 likes · 14 min read
Efficient Ops
Apr 3, 2023 · Cloud Native

How to Secure Multi‑Tenant Kubernetes Clusters: Best Practices & Architecture

This article explains the concept of multi‑tenant Kubernetes clusters, outlines common enterprise scenarios, and details native security mechanisms such as RBAC, NetworkPolicy, PodSecurityPolicy, OPA, resource quotas, and dedicated nodes to achieve effective isolation and protect sensitive data.

Cloud Native · Kubernetes · NetworkPolicy
0 likes · 12 min read
System Architect Go
Apr 3, 2023 · Cloud Native

Why Cilium Beats Flannel: Real‑World Kubernetes Networking Insights

The article analyzes how Cilium’s eBPF‑based architecture, advanced network policies, cluster‑wide traffic control, and observability tools like Hubble solved performance, security, and scalability challenges that Flannel and kube‑proxy could not meet in production Kubernetes environments.

CNI · Cilium · Cloud Native
0 likes · 12 min read
System Architect Go
Mar 31, 2023 · Cloud Native

Understanding CPU Requests and Limits in Kubernetes

This article explains how Kubernetes uses CPU requests and limits to schedule pods, allocate CPU proportionally, calculate minimal request units, and provides practical guidelines for setting appropriate request and limit values based on workload characteristics and monitoring data.

Kubernetes · Limits · Resource Management
0 likes · 6 min read
21CTO
Mar 31, 2023 · Backend Development

Boost Go Performance: 6 Proven Techniques for Faster, Leaner Apps

This article presents six practical Go performance optimizations—including GOMAXPROCS tuning for Kubernetes, struct field ordering, garbage‑collection limits, zero‑copy unsafe conversions, jsoniter usage, and sync.Pool pooling—that together can dramatically lower CPU, memory, and latency in production services.

Garbage Collection · Go · Kubernetes
0 likes · 9 min read
DataFunSummit
Mar 30, 2023 · Artificial Intelligence

An Overview of ChatGPT’s Software Architecture and Technology Stack

The article examines ChatGPT’s underlying software architecture, detailing its cloud deployment on AWS and Azure, database choices like PostgreSQL and Redis, front‑end technologies such as TypeScript and React, core AI frameworks including PyTorch and Triton, as well as its container orchestration, monitoring, and programming language ecosystem.

AI Architecture · ChatGPT · Kubernetes
0 likes · 6 min read
Cloud Native Technology Community
Mar 30, 2023 · Cloud Native

Kubernetes List/Watch, Informer Mechanism, and Writing Controllers for Pods and Custom Resources

This article explains how Kubernetes uses the List/Watch API and the Informer client library to monitor resources, compares direct HTTP Watch with Informer, provides Go code examples for pod controllers, shared informers, custom CRD controllers, and introduces higher‑level frameworks such as controller‑runtime and Kubebuilder.

CloudNative · Controller · CustomResource
0 likes · 49 min read
Cloud Native Technology Community
Mar 29, 2023 · Cloud Native

Kubernetes v1.27 Deprecations, API Removals, and Feature Gate Changes

Version 1.27 of Kubernetes introduces numerous deprecations and removals, including the migration of k8s.gcr.io to registry.k8s.io, the elimination of several API versions and feature gates such as CSIStorageCapacity, seccomp annotations, and various volume expansion options, with guidance for maintainers on required updates.

API Removal · Feature Gates · Kubernetes
0 likes · 12 min read
Cloud Native Technology Community
Mar 28, 2023 · Cloud Native

How to Set Up Multi‑Cluster Networking with Kube‑OVN OVN‑IC

This guide explains how to enable cross‑cluster pod communication in Kubernetes using Kube‑OVN's OVN‑IC feature, covering prerequisites, single‑node and high‑availability database deployment, automatic and manual route configuration, and cleanup procedures with concrete Docker/Containerd commands and ConfigMap examples.

Cloud Native · Kube-OVN · Kubernetes
0 likes · 15 min read
System Architect Go
Mar 27, 2023 · Cloud Native

Understanding Kubernetes Endpoint Propagation and Graceful Pod Deletion

Deleting a pod triggers endpoint removal, but various components like kube-proxy, CoreDNS, and ingress controllers may still route traffic until the endpoint fully propagates, so you must wait or use preStop hooks to delay deletion and handle SIGTERM gracefully within the configurable shutdown period.

Endpoint Propagation · Graceful Shutdown · Kubernetes
0 likes · 5 min read
System Architect Go
Mar 23, 2023 · Cloud Native

Directly Accessing the Kubernetes API with curl and Custom Code

This article explains how to bypass kubectl and interact directly with the Kubernetes API using curl or any programming language, covering API discovery, request construction, resource listing, watching, and modifying objects, while illustrating concepts with JavaScript examples and shared informers.

API · Kubernetes · cURL
0 likes · 4 min read
ITPUB
Mar 23, 2023 · Cloud Native

Scaling Zhongtong Cloud: From Single‑Cluster to Multi‑Cluster Governance

Drawing from Yang Xiaofei’s SACC2022 talk, this article details Zhongtong Cloud’s two‑year journey from initial containerization to a multi‑cluster architecture, covering challenges, custom scheduler extensions, fixed‑IP handling, container crash‑site preservation, node rebalancing, application migration, cross‑cluster load balancing, and future plans for unified gateways.

Cloud Native · Containerization · Kubernetes
0 likes · 13 min read
Huolala Tech
Mar 23, 2023 · Cloud Native

How Huolala Built a Cloud‑Native One‑Stop AI Platform on Kubernetes

Huolala’s Big Data Intelligent Platform team describes how they built a cloud‑native, one‑stop AI solution on Kubernetes, integrating Flink‑based feature engineering, a multi‑tenant Zeppelin notebook, GPU‑aware training, and a unified model‑serving platform, while addressing resource isolation, storage persistence, and cross‑cloud deployment.

AI Platform · Cloud Native · GPU scheduling
0 likes · 17 min read
System Architect Go
Mar 22, 2023 · Information Security

Understanding Anonymous Access in Kubernetes API Server and How to Disable It

The article explains how Kubernetes clusters can permit anonymous API access via the --anonymous-auth flag, describes the authentication‑authorization‑admission flow, shows common RBAC bindings that enable this access, discusses its prevalence, and provides practical steps to disable anonymous access in both self‑managed and managed clusters.

Anonymous Access · Kubernetes · RBAC
0 likes · 7 min read
Efficient Ops
Mar 21, 2023 · Operations

How Hupu Scaled to Millions: Inside the Flex Auto‑Scaling Platform

This article details Hupu's massive sports‑traffic environment, the design and implementation of the Flex auto‑scaling platform, its architecture, core functions such as resource statistics, node and pod scaling, scenario scheduling, and the performance optimizations that enable rapid, cost‑effective scaling across multi‑cloud Kubernetes clusters.

Auto Scaling · Kubernetes · Resource Management
0 likes · 15 min read
Alibaba Cloud Native
Mar 21, 2023 · Cloud Native

How OpenYurt Enables Edge Autonomy on Unstable Networks

This article explains how OpenYurt extends Kubernetes to handle edge scenarios with unreliable or disconnected networks by introducing YurtHub caching, a centralized heartbeat proxy, and node‑binding mechanisms that keep workloads running and avoid unwanted pod eviction.

Cloud Native · Edge Computing · Kubernetes
0 likes · 10 min read
System Architect Go
Mar 21, 2023 · Cloud Native

Understanding and Using Kubernetes Volume Snapshots

This article explains the concepts, architecture, configuration, and practical use cases of Kubernetes volume snapshots, including how to define snapshot classes, create snapshots, clone PVCs, and perform consistent backups across different storage providers and clusters.

CSI · CloudNative · Kubernetes
0 likes · 11 min read
System Architect Go
Mar 20, 2023 · Cloud Native

Secure Kubernetes Secrets: Comparing Sealed Secrets, External Secrets Operator, and CSI Driver

This article explains why native Kubernetes Secrets are insufficiently protected, introduces three open‑source solutions—Sealed Secrets, External Secrets Operator, and Secrets Store CSI Driver—covers their architecture, installation steps, usage examples, advantages, drawbacks, and provides practical code snippets for managing secrets safely in Git‑backed clusters.

CSI Driver · Cloud Native · External Secrets Operator
0 likes · 20 min read
Architecture Digest
Mar 20, 2023 · Cloud Native

Kubernetes: What It Is and Why It’s Hard to Get Started

This article provides a concise, question‑and‑answer overview of Kubernetes, explaining its role as a distributed container‑orchestration system, the architecture of master and worker nodes, core components such as etcd, kube‑apiserver, scheduler, controllers, and how services, pods, labels, and scaling operate within a cluster.

Cloud Native · Cluster Management · Controllers
0 likes · 8 min read
Efficient Ops
Mar 19, 2023 · Cloud Native

Master Real-Time Multi-Pod Logging in Kubernetes with Kubetail & Stern

This guide introduces two lightweight Kubernetes log‑tailing utilities, Kubetail and Stern, explaining their installation on various platforms, core command‑line options, and practical usage examples for aggregating and color‑coding logs from multiple pods and containers, offering a simpler alternative to heavyweight logging stacks.

CLI · Kubernetes · kubetail
0 likes · 10 min read
ITPUB
Mar 16, 2023 · Cloud Native

How Kindling Leverages eBPF for Minute‑Level Fault Diagnosis in Cloud‑Native Environments

The interview with Kindling founder Cheng Chan explores how eBPF‑based Kindling tackles the overwhelming metrics, high expertise barrier, and lack of real‑time protocol parsing in cloud‑native observability, detailing its probe architecture, protocol analysis, and roadmap for faster, standardized root‑cause detection.

Cloud Native · Kindling · Kubernetes
0 likes · 13 min read
Alibaba Cloud Native
Mar 16, 2023 · Cloud Native

How Koordinator Supercharges ACK Container Scheduling and Resource Efficiency

Koordinator, an open‑source cloud‑native scheduler from Alibaba, enhances container performance and reduces cluster costs by introducing mixed‑workload placement, resource profiling, load‑aware scheduling, and differentiated SLO mixing, now fully integrated into Alibaba Cloud ACK with a new v1.1.1‑ack.1 release.

ACK · Cloud Native · Koordinator
0 likes · 10 min read
Hulu Beijing
Mar 16, 2023 · Artificial Intelligence

Inside Hulu’s Distributed Training Platform: Architecture, Challenges, and Solutions

This article explores Hulu’s five‑year‑old machine‑learning training platform, detailing its three‑layer architecture, the shift from single‑node to distributed training, and the technical solutions—including parameter servers, Ring AllReduce, Kubernetes, Volcano, and Horovod—that enable scalable AI workloads across GPU, CPU, and storage resources.

AI Infrastructure · Distributed Training · Hulu
0 likes · 13 min read
Cloud Native Technology Community
Mar 16, 2023 · Cloud Native

How Intel and F5 Enabled Dual‑Stack Support in Istio 1.17

This article details the collaborative effort between Intel and F5 to redesign and implement dual‑stack networking support in Istio 1.17, covering the background challenges, new RFC design, key Envoy changes, step‑by‑step experimental setup, listener and endpoint modifications, and ways for the community to contribute.

Cloud Native · Dual-Stack · Envoy
0 likes · 11 min read
JD Cloud Developers
Mar 15, 2023 · Operations

Designing Seamless Offline Delivery for Private Cloud Environments

This article outlines a general, process‑focused approach to offline delivery in private or dedicated cloud environments, covering the need for internal mirrors, plug‑in architecture, dependency awareness, full automation, and best‑practice process design to reduce SRE effort and ensure consistent production.

Kubernetes · Operations · automation
0 likes · 5 min read
Tencent Cloud Middleware
Mar 14, 2023 · Cloud Native

How a Logistics SaaS Company Scaled to Millions Using Cloud‑Native Microservices

This article examines how the Chinese logistics SaaS firm HaiGuanJia leveraged cloud‑native technologies—Kubernetes, service mesh, and microservice frameworks—to overcome rapid user growth, improve development efficiency, enable gray releases, and smoothly migrate legacy systems while maintaining stability and agility.

Kubernetes · Logistics · SaaS
0 likes · 16 min read
DevOps Cloud Academy
Mar 14, 2023 · Cloud Native

Kustomize Tutorial: Managing Kubernetes Manifests Without Helm

This article introduces Kustomize as a Kubernetes‑native tool that can be used in place of Helm, explains its declarative philosophy, and provides step‑by‑step examples for building base resources, creating overlays, applying patches, generating secrets, and updating images using simple command‑line operations.

Cloud Native · DevOps · Kubernetes
0 likes · 13 min read
New Oriental Technology
Mar 10, 2023 · Cloud Native

Middleware PaaS on Kubernetes: Architecture, Benefits, and IP Reservation Challenges

This article explains how the New Oriental architecture team migrated middleware services like Redis, Kafka, and RocketMQ to Kubernetes, detailing the benefits over traditional PaaS, the Capo IP reservation solution for network stability, and the resulting operational, observability, and resource utilization improvements.

Cloud Native · Kubernetes · PaaS
0 likes · 18 min read
Alibaba Cloud Native
Mar 10, 2023 · Cloud Native

Uncovering the Root Causes of ACK Cluster Network Latency: kubelet, softirq, and cgroup Insights

A detailed post‑mortem explains how excessive cgroup files, kubelet's sys‑CPU usage, soft‑interrupt scheduling delays, and a buggy page‑free routine caused intermittent hundreds‑of‑milliseconds network latency in an Alibaba Cloud ACK cluster, and how targeted CPU binding and kernel patches resolved the issue.

Cloud Native · Kubernetes · Network Latency
0 likes · 14 min read
Alibaba Cloud Developer
Mar 10, 2023 · Cloud Native

How KubeVela Enables Full‑Stack Declarative Observability for Cloud‑Native Apps

This article explores KubeVela’s full‑stack declarative observability framework, detailing cloud‑native monitoring challenges, the Prism Aggregated API approach, multi‑cluster configurations, and out‑of‑the‑box addons that let developers and platform engineers seamlessly integrate, customize, and scale metrics, logs, and dashboards across heterogeneous environments.

Cloud Native · Declarative · KubeVela
0 likes · 21 min read
Huolala Tech
Mar 9, 2023 · Cloud Native

How SHANGFU Transforms Prometheus Management for Scalable Cloud‑Native Monitoring

This article explains Prometheus fundamentals, compares long‑term storage options, details Huolala's challenges with multiple Prometheus clusters, and introduces SHANGFU—a three‑module system that streamlines configuration, collection, and query handling to boost observability, performance, and reliability in cloud‑native environments.

Cloud Native · Kubernetes · Prometheus
0 likes · 15 min read
Alibaba Cloud Native
Mar 8, 2023 · Cloud Native

How OpenYurt v1.2 Simplifies Edge Kubernetes Installation in Five Steps

OpenYurt v1.2.0 streamlines edge‑native Kubernetes deployment by removing any modifications to native clusters, cutting the installation process from ten to five steps, and enabling seamless Prometheus monitoring through the new Raven VPN component while outlining future Helm‑based simplifications.

Cloud Native · Edge Computing · Installation
0 likes · 6 min read
Zhengcaiyun Technology (政采云技术)
Mar 7, 2023 · Cloud Native

Automated Deployment from Scratch Using Docker, Jenkins, and GitLab CI

This tutorial walks you through building a complete automated deployment pipeline from scratch, covering project setup on GitHub, Dockerized Tomcat and Jenkins containers, GitLab CI vs Jenkins comparison, Jenkins job configuration, webhook triggers, and shell scripting for continuous integration and delivery.

DevOps · Docker · GitHub
0 likes · 11 min read
Ops Development Stories
Mar 6, 2023 · Databases

How to Deploy and Use Bytebase for Database CI/CD on Kubernetes

This guide explains why traditional DBA tasks are tedious, introduces Bytebase as a reliable database CI/CD platform, and provides step‑by‑step instructions for deploying Bytebase and PostgreSQL on Kubernetes, configuring GitLab integration, managing users, instances, projects, environments, and performing schema changes and data operations.

Bytebase · Database CI/CD · DevOps
0 likes · 11 min read
Ops Development Stories
Mar 3, 2023 · Cloud Native

Integrating Gitee with Zadig for Seamless Microservice CI/CD

This guide walks you through adding a Gitee code source to Zadig, configuring a microservice-demo project with Vue.js frontend and Golang backend, setting up services, builds, environments, workflows, and automatic triggers to achieve end‑to‑end continuous delivery on Kubernetes.

Gitee · Integration · Kubernetes
0 likes · 8 min read
Alibaba Cloud Native
Mar 2, 2023 · Cloud Native

Master Multi‑Cluster GitOps with ACK One and ArgoCD – A Step‑by‑Step Guide

This guide walks you through using ACK One’s GitOps capabilities to manage multi‑cluster Kubernetes deployments with ArgoCD, covering prerequisites, CLI commands, console operations, application version upgrades, rollbacks, user‑permission management, ApplicationSet for multi‑cluster scaling, and Image Updater integration for end‑to‑end CI/CD automation.

ACK One · ArgoCD · Cloud Native
0 likes · 18 min read
dbaplus Community
Feb 28, 2023 · Operations

How Container SRE at DeWu Boosts Reliability: Practices, Metrics, and Incident Playbooks

This article details DeWu's container SRE approach, covering SRE fundamentals, on‑call response, SLO/SLA design, change management, capacity planning, kernel‑parameter monitoring, security safeguards, and a real‑world incident analysis, providing actionable insights for building resilient cloud‑native services.

CapacityPlanning · IncidentResponse · Kubernetes
0 likes · 24 min read
ByteDance SYS Tech
Feb 28, 2023 · Cloud Native

How ByteDance’s ARES Boosts Cloud‑Native Resilience with Chaos Engineering

This article explains ByteDance’s end‑to‑end chaos engineering practice for cloud‑native environments, covering its background, principles, comparison with traditional testing, the evolution of its internal platforms, and a detailed look at the Application Resilience Enhancement Service (ARES) and its core features.

Fault Injection · Kubernetes · Microservices
0 likes · 17 min read
Alibaba Cloud Native
Feb 27, 2023 · Cloud Native

How CNStack 2.0 Enables Multi‑Cloud, Multi‑Cluster Management with OCM

CNStack 2.0 introduces a cloud‑native multi‑cluster service built on Open Cluster Management, offering unified registration, lifecycle management, resource distribution, multi‑tenant authentication, and high‑availability cross‑cluster communication for Kubernetes clusters across clouds.

Authentication · Cloud Native · Cluster Registration
0 likes · 15 min read
Alibaba Cloud Native
Feb 27, 2023 · Cloud Native

What’s Next for Microservices? Highlights from the Beijing Cloud Native Meetup

The Beijing "Microservices x Container Open Source Developer Meetup" gathered over 100 developers and core maintainers of leading cloud‑native projects to discuss next‑generation microservice architectures, static compilation, service governance, multi‑cluster management, observability, and more, providing deep technical insights and real‑world examples.

Cloud Native · Kubernetes · observability
0 likes · 11 min read
Top Architect
Feb 27, 2023 · Cloud Native

Deploying a K8s ChatGPT Bot with Robusta for Intelligent Alert Troubleshooting

This article guides readers through setting up a Kubernetes‑based ChatGPT bot using the open‑source Robusta platform, covering prerequisites, installation, Slack integration, configuration generation, Helm deployment, testing with crash pods, and interactive alert handling to streamline Prometheus alert resolution.

ChatGPT · Kubernetes · Prometheus
0 likes · 12 min read
Architect
Feb 25, 2023 · Cloud Native

Deploying a K8s ChatGPT Bot with Robusta: A Step‑by‑Step Guide

This article walks through installing Robusta, configuring Slack integration, adding Helm repositories, deploying the Robusta platform on a Kubernetes cluster, creating a crash‑loop pod to trigger alerts, and interacting with a ChatGPT bot to automatically troubleshoot Prometheus alerts, providing complete code snippets and screenshots for each step.

AI Ops · ChatGPT · Kubernetes
0 likes · 12 min read
Baidu Geek Talk
Feb 24, 2023 · Cloud Native

Design and Resource Scheduling of Cloud‑Native AI and the PaddleFlow Workflow Engine

The article explains Baidu’s cloud‑native AI resource scheduling across single‑ and multi‑GPU nodes and describes the PaddleFlow Kubernetes‑based workflow engine, including its hierarchical queues, advanced scheduling algorithms, and unified storage. It also shows how these technologies improve GPU utilization, reduce fragmentation, and simplify AI task orchestration.

AI · Kubernetes · PaddleFlow
0 likes · 23 min read
NetEase Cloud Music Tech Team
Feb 24, 2023 · Cloud Native

NetEase Cloud Music Open-Sources Horizon: A Kubernetes-Based GitOps Continuous Deployment Platform

NetEase Cloud Music has open-sourced Horizon, a Kubernetes-based GitOps continuous deployment platform built on Argo CD, Tekton, and other components. It offers standardized Helm‑based templates, RBAC, multi‑cloud support, CI integration, and extensibility, and is already used in large‑scale production across multiple regions.

Argo CD · Cloud Native · GitOps
0 likes · 9 min read
Alibaba Cloud Native
Feb 23, 2023 · Cloud Native

How OpenYurt Enables Large‑Scale Edge Computing for Longyuan Power

This article explains how OpenYurt, an unobtrusive cloud‑native edge platform, integrates with the CNStack technology hub to deliver high‑availability, offline‑autonomous, and programmable edge services for Longyuan Power’s massive multi‑province server fleet.

CNStack · Cloud Native · Distributed Systems
0 likes · 10 min read
Zhuanzhuan Tech
Feb 20, 2023 · Operations

Evolution of Zhuanzhuan's Test Environments: From Monolithic Setups to Docker‑Based Dynamic and Stable Platforms

This article details how Zhuanzhuan transformed its testing infrastructure from a handful of monolithic servers to a Docker‑driven, tag‑routed dynamic and stable environment, addressing resource shortages, waste, and stability issues while achieving significant reductions in deployment time, resource consumption, and user‑reported problems.

DevOps · Docker · Kubernetes
0 likes · 14 min read
转转QA
Feb 17, 2023 · Operations

Evolution of Zhuanzhuan's Test Environments: From Monolithic Setups to Docker‑Based Dynamic and Stable Environments

This article details how Zhuanzhuan’s testing environment progressed from a handful of static machines to a Docker‑driven dynamic‑and‑stable architecture, addressing resource shortages, stability issues, and operational inefficiencies through IP routing, tag routing, and extensive automation, ultimately achieving significant reductions in resource usage, deployment time, and user‑reported problems.

DevOps · Docker · Environment
0 likes · 13 min read
Alibaba Cloud Big Data AI Platform
Feb 16, 2023 · Cloud Native

How AppManager Enables Scalable Multi‑Cloud Application Deployment

This article explains how AppManager, built on OAM and Groovy plug‑ins, provides extensible multi‑cloud management by supporting dynamic component, trait, and workflow integration, automated build‑packaging, resource add‑ons, multi‑environment isolation, and built‑in state monitoring for reliable application delivery.

AppManager · Cloud Native · DevOps
0 likes · 15 min read
StarRing Big Data Open Lab
Feb 15, 2023 · Operations

How YARN and Kubernetes Solve Distributed Resource Management Challenges

This article explains how Apache YARN and Kubernetes address the three core problems of resource utilization, task responsiveness, and flexible scheduling in distributed environments, detailing their architectures, scheduling models, and practical implications for modern big‑data and cloud workloads.

Kubernetes · Resource Management · Scheduling
0 likes · 8 min read
JD Cloud Developers
Feb 14, 2023 · Cloud Native

Why Kubernetes Is the Backbone of Modern Cloud‑Native Architecture

This article explains the evolution from monolithic to microservice architectures, introduces Kubernetes as the core cloud‑native platform, and details its components, design principles, and resource management strategies for compute, networking, and storage within a cluster.

CSI · Ingress · Kubernetes
0 likes · 22 min read