Tagged articles
Bilibili Tech
Feb 14, 2023 · Cloud Native

Bilibili's Vertical Pod Autoscaler (VPA) Practice and Cluster Resource Governance

Bilibili extended Kubernetes with a custom in‑place Vertical Pod Autoscaler framework—including generator, recommender, updater, and webhook controllers plus a management platform for strategy tuning, avoidance, analysis, and anomaly detection—reducing over‑provisioned resources across its ten‑thousand‑node private cloud and achieving up to 60 % CPU and 30 % memory savings.

Kubernetes · SRE · vertical pod autoscaler
0 likes · 19 min read
JD Cloud Developers
Feb 13, 2023 · Cloud Native

Why Docker and Kubernetes Are Revolutionizing Cloud‑Native Development

This article explains Docker’s lightweight container engine, its goals, core concepts such as images, containers, and repositories, compares containers to virtual machines, introduces Dockerfile, cgroups, Docker Compose, Docker Machine, and provides an overview of Kubernetes architecture and components, highlighting their role in cloud‑native environments.

Containers · DevOps · Docker
0 likes · 13 min read
21CTO
Feb 10, 2023 · Cloud Native

Why Kubernetes Is So Hard to Master: A Beginner’s Q&A Walkthrough

This article introduces Kubernetes fundamentals through a series of questions and answers, covering its architecture, node communication, pod scheduling, data storage, external access, scaling mechanisms, and component coordination, all illustrated with clear diagrams.

Cluster Management · Containers · Kubernetes
0 likes · 9 min read
Top Architect
Feb 7, 2023 · Cloud Native

Understanding Kubernetes: Core Concepts and Architecture

This article provides a concise, question‑driven overview of Kubernetes, covering its architecture, node and master communication, pod fundamentals, scheduling, storage via etcd, service exposure, scaling mechanisms, and the roles of core components such as kube‑apiserver, kubelet, kube‑proxy and controllers.

Cloud Native · Cluster Management · Containers
0 likes · 9 min read
Cloud Native Technology Community
Feb 7, 2023 · Cloud Native

Machine Learning‑Based Optimization of Kubernetes Resources

This article explains how machine learning can be applied to automatically optimize CPU and memory settings in Kubernetes clusters, covering both experiment‑driven and observation‑driven approaches, step‑by‑step procedures, best‑practice recommendations, and the benefits of combining both methods for efficient, scalable cloud‑native operations.

Kubernetes · Resource Optimization · autoscaling
0 likes · 11 min read
IT Architects Alliance
Feb 6, 2023 · Cloud Native

What Is Kubernetes and Why Is It Hard to Get Started?

This article introduces Kubernetes as a Google‑originated container‑based distributed cluster management system, explaining its architecture, core components such as Master, Nodes, Pods, Services, etcd, and detailing how communication, scheduling, storage, external access, scaling, and controller coordination work together.

Cloud Native · Distributed Systems · Kubernetes
0 likes · 8 min read
Ops Development Stories
Feb 6, 2023 · Cloud Native

How to Deploy Odigos for Zero‑Code Observability on Kubernetes

This guide walks you through installing and configuring the open‑source Odigos observability control plane on a Kubernetes cluster, showing how to automatically collect traces, metrics, and logs from applications without modifying code and how to visualize the data with Grafana.

Kubernetes · Odigos · OpenTelemetry
0 likes · 11 min read
Efficient Ops
Feb 5, 2023 · Cloud Native

Unlock Hidden kubectl Tricks: Boost Your Kubernetes Workflow

This guide showcases advanced kubectl techniques—including printing API details, filtering and deleting pods by status, counting node‑wise pod distribution, and leveraging kubectl proxy—to help Kubernetes users streamline debugging and routine cluster management tasks.

Kubernetes · command-line · kubectl
0 likes · 7 min read
Selected Java Interview Questions
Feb 1, 2023 · Cloud Native

Introduction to Rancher: Features, Installation, and Application Deployment

This article introduces Rancher as a comprehensive container management platform, explains its API server capabilities, monitoring and alerting features, provides step‑by‑step Docker‑based installation instructions, and demonstrates how to bind a Kubernetes cluster, deploy applications, and view pod logs through the Rancher UI.

Cloud Native · Container Management · Docker
0 likes · 4 min read
Cloud Native Technology Community
Feb 1, 2023 · Cloud Native

Why Is Kubernetes So Hard to Master? A Step‑by‑Step Overview

This article breaks down the core concepts of Kubernetes—including its master‑worker architecture, pod scheduling, etcd storage, service exposure, scaling mechanisms, and controller interactions—through a series of clear questions and illustrated answers to help beginners grasp the platform’s complexity.

Cloud Native · Kubernetes · Pod Scheduling
0 likes · 8 min read
MaGe Linux Operations
Jan 31, 2023 · Cloud Native

Mastering ulimit and cgroup: Limit Files & Threads in Docker/Kubernetes

This article explains how Linux's ulimit and cgroup mechanisms can be used to restrict file descriptors and thread counts in Docker and Kubernetes environments, compares configuration methods, presents experimental results, and offers practical recommendations for setting limits at the container, pod, and host levels.

Container · Kubernetes · cgroup
0 likes · 17 min read
Qunar Tech Salon
Jan 31, 2023 · Operations

Root Cause Analysis and Mitigation of JVM GC‑Induced OOM and Memory Fragmentation in a Containerized Hotel Pricing Service

This article details how long JVM garbage‑collection pauses and glibc ptmalloc memory‑fragmentation caused container OOM kills in a hotel‑pricing system, and explains the step‑by‑step diagnosis, JVM tuning, Kubernetes health‑check adjustments, and the replacement of ptmalloc with jemalloc to eliminate the issue.

JVM · Kubernetes · MemoryFragmentation
0 likes · 9 min read
Open Source Linux
Jan 28, 2023 · Cloud Native

Mastering Kubernetes Probes: Startup, Liveness, and Readiness Explained

This article explains why Kubernetes uses Startup, Liveness, and Readiness probes, describes common Pod states and restart policies, compares the probes, details their configuration fields, and provides practical YAML examples for each probe type to ensure reliable container health monitoring.

Kubernetes · LivenessProbe · Pod
0 likes · 17 min read
DataFunSummit
Jan 20, 2023 · Cloud Native

Design and Architecture of JuiceFS: A Cloud‑Native Distributed File System

This article reviews the evolution of file storage, outlines the challenges of the cloud era, and details JuiceFS's design philosophy, architecture, key capabilities, and real‑world use cases such as Kubernetes, AI, big‑data analytics, and NAS migration to the cloud.

AI · Cloud Native · Distributed File System
0 likes · 22 min read
DataFunTalk
Jan 18, 2023 · Big Data

Five Major Trends Shaping Big Data, AI, and Cloud Industries in 2023

The article forecasts five key trends for 2023—including cloud cost optimization, multi‑cloud freedom, rapid AI model adoption, expanding data‑sharing ecosystems, and the convergence of data warehouses and lakes—highlighting how they will reshape the big data, artificial intelligence, and cloud landscapes.

Alluxio · Kubernetes · data sharing
0 likes · 6 min read
Alibaba Cloud Native
Jan 18, 2023 · Cloud Native

Decoding Terway ENI‑Trunking: Data‑Plane Paths and SOP Scenarios in Alibaba Cloud

This article provides a deep technical walkthrough of Alibaba Cloud's Terway ENI‑Trunking mode, explaining its architecture, pod‑level networking resources, VLAN‑based traffic steering, security‑group handling, and ten concrete SOP scenarios that illustrate how data packets travel between pods, services, and external clients.

Cloud Native Networking · ENI-Trunking · Kubernetes
0 likes · 29 min read
Open Source Linux
Jan 17, 2023 · Backend Development

Why Your Java App Gets OOMKilled in Kubernetes and How to Fix It

This article explains why Java applications running in Kubernetes containers are often terminated with OOMKilled (exit code 137), analyzes the underlying JVM memory‑limit mismatches, and provides practical solutions using cgroup‑aware JVM flags and memory‑tuning techniques.

Docker · JVM · Kubernetes
0 likes · 14 min read
DevOps
Jan 17, 2023 · Operations

Building a DevOps CI/CD Pipeline: A Five‑Step Guide

This article walks beginners through the fundamentals of DevOps by outlining a practical five‑step process for creating a CI/CD pipeline, covering tools for continuous integration, source control, build automation, web server deployment, test coverage, and optional extensions such as containers and middleware automation.

Docker · Git · Jenkins
0 likes · 15 min read
Efficient Ops
Jan 15, 2023 · Cloud Native

Understanding kubectl top: How Kubernetes Monitors Nodes and Pods

This article explains how the kubectl top command retrieves real‑time CPU and memory metrics for Kubernetes nodes and pods, details the underlying data flow, metric‑server and cAdvisor architecture, and addresses common issues and calculation differences compared to traditional system tools.

Kubernetes · cAdvisor · kubectl top
0 likes · 15 min read
Alibaba Cloud Native
Jan 15, 2023 · Cloud Native

What Real‑World Cloud‑Native Metrics Reveal About JDK, Frameworks, and Resource Usage

Analyzing a year‑long EDAS report of tens of thousands of cloud‑native applications, this article uncovers trends in JDK version adoption, microservice framework choices, resource shape shifts, instance specifications, JVM heap settings, startup latency, elastic policy usage, and health indicators, offering actionable insights for architects.

JDK · Kubernetes · Microservices
0 likes · 13 min read
Alibaba Cloud Native
Jan 14, 2023 · Cloud Native

Why Java Apps OOM in Kubernetes Even Below Xmx and How to Fix It

This article explains why Java applications running in Kubernetes can encounter Out‑Of‑Memory errors despite heap usage staying under the Xmx limit, by examining container resource limits, JVM memory models, cgroup behavior, and provides practical configuration recommendations to prevent OOM.

Cloud Native · JVM · Kubernetes
0 likes · 16 min read
Ctrip Technology
Jan 12, 2023 · Big Data

Evolution of Ctrip's Log System: From Elasticsearch to ClickHouse and Log 3.0

This article details the evolution of Ctrip's log infrastructure, describing the shift from fragmented departmental logging to a unified Elasticsearch-based platform, the migration to ClickHouse for cost‑effective, high‑performance storage, and the subsequent Log 3.0 redesign that leverages Kubernetes, sharding, and a unified query governance layer to handle petabyte‑scale data.

Big Data · Cloud Native · ETL
0 likes · 16 min read
Cloud Native Technology Community
Jan 11, 2023 · Cloud Native

Key Kubernetes Trends in 2022: Mainstream Adoption, Edge Growth, Open‑Source Ecosystem, Stateful Deployments, and Ongoing Challenges

The 2022 Kubernetes landscape saw mainstream adoption with widespread managed services, increased edge usage, a thriving open‑source ecosystem, growing interest in stateful workloads, and persistent operational and security challenges, highlighting both the platform's maturity and the work still needed for broader enterprise confidence.

2022 trends · Edge Computing · Kubernetes
0 likes · 7 min read
Open Source Linux
Jan 11, 2023 · Cloud Computing

How Docker’s Rise and Fall Shaped the Cloud Container Landscape

This article chronicles Docker’s rapid ascent, leadership turmoil, competition with Kubernetes, and the eventual sale of its enterprise business to Mirantis, illustrating how a pioneering container platform became both a catalyst for cloud innovation and a cautionary tale for open‑source startups.

Docker · Kubernetes · Tech Business
0 likes · 14 min read
Alibaba Cloud Native
Jan 9, 2023 · Cloud Native

CNStack 2.0: Cloud‑Native Design for Agile, Secure Multi‑Cluster Ops

CNStack 2.0 is a cloud‑native PaaS platform built on Kubernetes that unifies resource and workload management, offering agile, open, and secure multi‑cluster capabilities through modular cloud services, a unified API gateway, and integration with open‑source projects such as Sealer, Emissary‑Ingress, cert‑manager, Velero, and OCM.

Kubernetes · Multi-Cluster · Resource Management
0 likes · 24 min read
Alibaba Cloud Native
Jan 4, 2023 · Cloud Native

Explore Koordinator v1.1: Load‑Aware Scheduling, cgroup v2, and Descheduler Updates

Koordinator v1.1 introduces load‑aware scheduling with workload‑type awareness, percentile‑based resource aggregation, cgroup v2 support, a new LowNodeLoad descheduler plugin for load‑aware rebalancing, expanded performance collectors, ServiceMonitor integration, and detailed configuration examples, aiming to improve latency‑sensitive workloads and overall cluster resource efficiency.

CloudNative · Descheduler · Kubernetes
0 likes · 25 min read
Cloud Native Technology Community
Jan 4, 2023 · Cloud Native

Configuring External Egress Gateways in Kube-OVN

This guide explains how to route outbound container traffic through a centralized external gateway using Kube-OVN by defining a Subnet resource with specific routing and policy settings, and clarifies each required field for proper configuration.

CNI · Cloud Native · External Gateway
0 likes · 4 min read
Alibaba Cloud Native
Jan 3, 2023 · Cloud Native

How KubeVela Workflow Transforms SAE’s Serverless Architecture for Faster Cloud‑Native Upgrades

This article explains how Alibaba Cloud's Serverless Application Engine (SAE) leverages the open‑source KubeVela Workflow to overcome operational, scaling, and integration challenges, detailing the workflow design, step definitions, and three real‑world use cases that illustrate automated ops, release optimization, and rapid feature rollout.

Cloud Native · KubeVela · Kubernetes
0 likes · 17 min read
DataFunSummit
Jan 1, 2023 · Big Data

Shopee Data Infra Presentation: Storage Status, Acceleration, Serviceization, and Future Plans

The Shopee Data Infra talk details the current storage architecture, Presto‑based acceleration with Alluxio caching, service‑oriented storage solutions using Alluxio Fuse and S3 APIs, and outlines future enhancements for Spark/Hive integration and CSI/Fuse optimizations, providing a comprehensive view of large‑scale big data storage engineering.

Alluxio · Cache Manager · Data Infrastructure
0 likes · 16 min read
DataFunTalk
Jan 1, 2023 · Big Data

Zhihu's Real-Time Computing Platform: From Skytree 1.0 to Mipha 2.0

Zhihu’s real‑time computing platform, initially built as Skytree 1.0 on Kubernetes and later re‑engineered as Mipha 2.0 with Flink SQL, unified metadata management, dynamic jar loading, UDF support, Protobuf format, CDC integration, and extensive operational optimizations, now processes petabyte‑scale data with high reliability.

Flink · Kubernetes · Real‑Time Computing
0 likes · 21 min read
Open Source Linux
Dec 30, 2022 · Operations

Top 7 Kubernetes Management Tools to Simplify Cluster Operations

This article introduces seven popular Kubernetes management solutions (K9s, Rancher, the native Dashboard with kubectl and kubeadm, Helm, Kubespray, Kontena Lens, and WKSctl), detailing their key features, usage scenarios, and how they help streamline cluster monitoring, deployment, scaling, and security across cloud‑native environments.

Cluster Management · DevOps · Kubernetes
0 likes · 9 min read
Efficient Ops
Dec 29, 2022 · Operations

How eBay Scales Its Event Platform with ClickHouse and Kubernetes

This article details eBay's event platform architecture, explaining why a dedicated event system is needed, how ClickHouse provides high‑performance storage, the use of Kubernetes CRDs for cross‑region high availability, data routing, read/write separation, and query optimizations with LogQL.

Event Platform · Kubernetes · clickhouse
0 likes · 18 min read
MaGe Linux Operations
Dec 28, 2022 · Cloud Native

Master Essential kubectl Commands: A Practical Guide for Kubernetes Ops

This comprehensive guide covers kubectl autocomplete, context configuration, object creation, resource viewing, updating, patching, editing, scaling, deletion, pod and node interaction, as well as the versatile kubectl set commands, formatted output options, and visual references for effective Kubernetes cluster management.

Kubernetes · Operations · cloud-native
0 likes · 15 min read
Ops Development Stories
Dec 28, 2022 · Operations

When a Massive File Transfer Crashed My K8s Master: A Real‑World Docker Recovery Tale

The author recounts a sudden overload caused by copying hundreds of gigabytes of small files to an Alibaba Cloud NAS, which crashed the master node of a Kubernetes cluster, leading to Docker failures, and describes step‑by‑step troubleshooting, configuration changes, and lessons learned about backups, cautious operations, and calm analysis.

Cloud Native · Docker · Kubernetes
0 likes · 5 min read
dbaplus Community
Dec 26, 2022 · Cloud Native

How Bilibili Boosted Server Utilization with Kubernetes Co‑Location Strategies

This article explains how Bilibili’s large‑scale Kubernetes cloud platform reduces costs and improves machine utilization by applying co‑location (mixed‑tenant) techniques, including resource‑aware scheduling, dynamic isolation, and a dedicated management console across online, offline, and idle‑machine scenarios.

Cloud Native · Co-location · Kubernetes
0 likes · 17 min read
ITPUB
Dec 26, 2022 · Cloud Native

What Really Happens When You Deploy an App on Kubernetes?

This article walks through the complete lifecycle of a Kubernetes deployment, explaining how a manual upgrade request triggers API calls, creates Deployments, ReplicaSets, Pods, and how the scheduler, kubelet, and Docker work together, while also covering concepts like containers, labels, replication controllers, deployments, and autoscaling mechanisms.

Cloud Native · Containers · Deployment
0 likes · 23 min read
Tencent Cloud Developer
Dec 26, 2022 · Cloud Native

Challenges and Optimization Strategies for Containerized Deployment of Online Services on Kubernetes

Tencent’s shift from VMs to Kubernetes for massive online services faces pod‑size rigidity, heterogeneous node balancing, elastic scaling, and massive cluster‑pool mapping, prompting optimizations such as dynamic CPU compression, custom load‑aware scheduling, collaborative HPA/VPA scaling, dynamic quota migration, unified routing‑sync, and an automated decision‑tree‑driven self‑healing workflow for container‑destruction failures.

Containerization · Dynamic Scheduling · Kubernetes
0 likes · 12 min read
Open Source Linux
Dec 26, 2022 · Cloud Native

Why Does My Kubernetes Service Fail? 10 Common Issues and Quick Fixes

This guide walks through ten frequent Kubernetes problems—including service access failures, port mapping errors, certificate issues, pod image pull errors, init‑container hangs, and CrashLoopBackOff—explaining their causes and providing concise, step‑by‑step solutions to restore cluster functionality.

CrashLoopBackOff · InitContainer · Kubernetes
0 likes · 7 min read
HelloTech
Dec 23, 2022 · Cloud Native

Design Principles and Implementation Details of Kubernetes Horizontal Pod Autoscaler and Custom Water Pod Autoscaler

The article explains Kubernetes’ built‑in Horizontal Pod Autoscaler, then details the custom Water Pod Autoscaler (WPA) that extends HPA with dual‑signal (load and SOA registration) detection, dual‑threshold scaling, noise filtering, configurable cooldown, frequency limits, tolerance buffers, and integrated alerting for reliable elastic scaling.

Cloud Native · HPA · Kubernetes
0 likes · 13 min read
Alibaba Cloud Developer
Dec 23, 2022 · Cloud Native

What Happens When You Deploy an App on Kubernetes? A Deep Dive

This article walks through the entire lifecycle of deploying an application on Kubernetes, explaining how Docker containers differ from virtual machines, the role of Pods, ReplicationControllers, Deployments, and how automatic scaling with HPA and VPA keeps services reliable and efficient.

Cloud Native · Deployment · HPA
0 likes · 21 min read
Code Ape Tech Column
Dec 23, 2022 · Cloud Native

Overview of Popular Microservice Technology Stack and Governance Frameworks

This article presents a comprehensive overview of widely adopted microservice technology stacks, including governance frameworks like Apache Dubbo and Spring Cloud Alibaba, CI/CD tools, container orchestration, and various application services, while also offering practical selection guidance for developers and product teams.

Kubernetes · cloud-native · service governance
0 likes · 12 min read
ITPUB
Dec 22, 2022 · Cloud Native

How 58 Tongcheng Built a Cloud‑Native Deep Learning Inference Platform with Istio

This article details the evolution of 58 Tongcheng's deep learning inference platform—from the initial WPAI‑based architecture to a cloud‑native, Istio‑powered design—covering its background, technical challenges, architectural redesign, traffic‑management features, adaptive rate limiting, model warm‑up, and observability improvements.

AI inference · Istio · Kubernetes
0 likes · 24 min read
Ctrip Technology
Dec 22, 2022 · Cloud Native

Evolution and Cloud‑Native Architecture of Ctrip’s Microservice Products

The article outlines Ctrip’s microservice journey from its 2013 inception, detailing the evolution of its frameworks, the complexities of operating multiple stacks, the challenges faced, and the design of a progressive cloud‑native service‑mesh architecture built on Istio, Envoy, and custom operators.

Cloud Native · Dubbo · Istio
0 likes · 10 min read
58 Tech
Dec 22, 2022 · Artificial Intelligence

Implementing a Cloud-Native Istio Gateway for 58.com Deep Learning Inference Platform

This article details the evolution of 58.com’s deep learning inference platform, describing the transition from the original SCF‑based architecture to a cloud‑native Istio gateway (architecture 2.0), and explains design choices, traffic‑management, adaptive rate‑limiting, observability, model pre‑warming, and performance improvements.

AI · Cloud Native · Deep Learning
0 likes · 22 min read
Efficient Ops
Dec 20, 2022 · Cloud Native

Understanding Kubernetes Pods, Services, and Load Balancing Basics

This article explains Kubernetes pod architecture, networking, external exposure, and how Services use virtual IPs and selectors to provide load balancing and dynamic discovery of pod changes, including the role of kube-proxy and the limitations of using Nginx for pod-level balancing.

Cloud Native · Kubernetes · Pods
0 likes · 8 min read
Volcano Engine Developer Services
Dec 15, 2022 · Cloud Native

How ByteDance Scaled Cloud‑Native Infrastructure: Lessons in Multi‑Cluster Scheduling

ByteDance’s cloud‑native transformation details a layered technical system, multi‑year Kubernetes‑based evolution, unified multi‑cluster resource management, and hierarchical scheduling, illustrating how the company achieves high development speed, resource efficiency, and prepares for next‑generation serverless infrastructure.

Cloud Native · DevOps · Kubernetes
0 likes · 21 min read
Open Source Linux
Dec 15, 2022 · Cloud Native

Kubernetes 1.26 ‘Electrifying’: Key New Features, Deprecations, and Upgrades

Kubernetes 1.26, themed “Electrifying,” introduces 37 enhancements, including registry changes, storage upgrades, signed release artifacts, Windows privileged containers, and metric and scheduling improvements, while promoting 11 features to stable, deprecating 12 APIs, and emphasizing sustainability and carbon‑footprint awareness.

Cloud Native · Kubernetes · Metrics
0 likes · 10 min read
Efficient Ops
Dec 14, 2022 · Operations

How to Build a Scalable Container Log Collection System with S6 and Filebeat

This article explains Docker and Kubernetes container logging fundamentals, highlights the limitations of default json‑file logging, and presents a unified log‑collection architecture using S6‑based images, filebeat, logrotate, Kafka, and Elasticsearch, with practical steps for dynamic configuration and log rotation in a k8s cluster.

Docker · Filebeat · Kubernetes
0 likes · 9 min read
vivo Internet Technology
Dec 14, 2022 · Cloud Native

Vivo’s Cloud‑Native Container Practices: High‑Availability, Automation, and Platform Evolution

Vivo’s cloud‑native journey, detailed from its 2018 machine‑learning pilot to a large‑scale container ecosystem, showcases how high‑availability design, automated multi‑cluster operations, CI/CD pipelines, and unified traffic ingress have dramatically improved efficiency, reduced costs, and enabled rapid, scalable AI‑driven services across the business.

Container · Kubernetes · automation
0 likes · 19 min read
Baidu Intelligent Cloud Tech Hub
Dec 14, 2022 · Artificial Intelligence

How Cloud‑Native AI Boosts Resource Efficiency with PaddleFlow

This article explains how cloud‑native AI leverages container‑based architectures and advanced scheduling algorithms (such as resource queues, gang scheduling, bin‑packing, and GPU topology‑aware and ToR‑aware dispatch) to improve resource and engineering efficiency, and introduces Baidu’s AI workflow engine PaddleFlow with its design, features, and deployment options.

AI workflow · Cloud Native AI · GPU virtualization
0 likes · 25 min read
Cloud Native Technology Community
Dec 14, 2022 · Cloud Native

Kubernetes v1.26 Release: New Features, Enhancements, and Deprecations

Kubernetes 1.26 is officially released, introducing 37 enhancements—including 11 stable and 10 beta features—while deprecating 12 APIs, updating the container image registry, removing CRI v1alpha2, advancing storage CSI migrations, enhancing metrics, and adding support for Windows privileged containers and dynamic resource allocation.

CSI · Container Runtime Interface · Kubernetes
0 likes · 15 min read
Architect's Guide
Dec 14, 2022 · Cloud Native

Understanding Underlay and Overlay Network Models in Kubernetes

This article explains Kubernetes networking models, detailing the underlay network infrastructure, overlay techniques, and common CNI implementations such as Flannel, Calico, IPVLAN, and VxLAN, while comparing their architectures, protocols, and configuration considerations.

CNI · Calico · Flannel
0 likes · 12 min read
Efficient Ops
Dec 12, 2022 · Operations

How Bilibili Built a 5‑Year SRE Journey: High‑Availability, Multi‑Active, and Capacity Management

This article chronicles Bilibili's five‑year evolution of Site Reliability Engineering, detailing the introduction of SRE culture, the construction of high‑availability and multi‑active architectures, capacity management with Kubernetes, VPA/HPA, incident case studies, and the ongoing transformation of SRE practices across the organization.

Kubernetes · Operations · SRE
0 likes · 24 min read
Alibaba Cloud Native
Dec 12, 2022 · Cloud Native

How ACK One Enables Multi‑Cluster GitOps and Unified Alert Management

ACK One is a distributed cloud‑native container platform that unifies management of Kubernetes clusters across hybrid‑cloud, edge, and on‑prem environments, offering GitOps‑based multi‑cluster application distribution with ArgoCD integration and a centralized alert‑management system.

Alert Management, ArgoCD, GitOps
0 likes · 9 min read
Huawei Cloud Developer Alliance
Dec 12, 2022 · Cloud Native

How Karmada Powers Multi‑Cloud, Multi‑Cluster Production at Cloud Native Days China 2022

The Karmada community's Cloud Native Days China 2022 session in Nanjing gathered over 30 enterprises and developers to share multi‑cloud, multi‑cluster production practices, large‑scale testing results, and real‑world implementations from Huawei Cloud, vivo, Hurricane Engine, China Mobile, DaoCloud, and Zhejiang University, highlighting Karmada's scalability and ecosystem growth.

Karmada, Kubernetes, Multi-Cluster
0 likes · 9 min read
Top Architect
Dec 12, 2022 · Cloud Native

Building a Container Platform at Ximalaya: Practices, Principles, and Evolution

The article chronicles Ximalaya's journey from early Docker-based Java project templates to a mature Kubernetes-driven container platform, detailing development principles, health‑check strategies, deployment workflows, middleware integration, and lessons learned about scaling, automation, and collaborative engineering.

CloudNative, Containerization, DevOps
0 likes · 13 min read
DevOps Cloud Academy
Dec 11, 2022 · Cloud Native

GitOps: The Missing Link for CI/CD on Kubernetes

GitOps leverages Git as an immutable source of truth to streamline CI/CD pipelines for Kubernetes, enhancing productivity, security, and compliance by providing observable, auditable deployments, centralized control, and easy rollbacks, while requiring dedicated tools such as Flux or Weave GitOps Core for full implementation.

Cloud Native, DevOps, Flux
0 likes · 12 min read
Architect's Guide
Dec 11, 2022 · Cloud Native

The Journey of Containerization at Ximalaya: Practices, Principles, and Lessons Learned

This article recounts Ximalaya's multi‑year containerization effort, detailing the evolution from early Docker templates and Marathon to Kubernetes, the development of internal tools like barge and k8s‑sync, health‑check strategies, deployment patterns, and the practical lessons gained from integrating containers with existing middleware.

Cloud Native, Containerization, DevOps
0 likes · 12 min read
Full-Stack DevOps & Kubernetes
Dec 10, 2022 · Cloud Native

Why Kubernetes Pods Fail with “Too Many Open Files” and How to Fix It

The article explains the “Too many open files” error in Kubernetes, clarifies that it refers to exceeding system file‑handle limits, shows how to inspect current usage with ulimit and lsof, and provides step‑by‑step commands to temporarily or permanently raise the limits and troubleshoot the application code.

DevOps, Kubernetes, Too many open files
0 likes · 5 min read
Tencent Cloud Developer
Dec 7, 2022 · Cloud Native

Kubernetes Architecture Analysis: Design Patterns, Principles and Implementation

The article examines Kubernetes architecture from a software‑design viewpoint, showing how its declarative API and extensible ecosystem outpace Swarm and Mesos, and detailing core concepts, control‑plane components, identified design patterns such as microkernel, event‑driven and CQRS, key architectural decisions, and the resulting strengths and trade‑offs.

Control Plane, Event-driven, K8s Architecture
0 likes · 13 min read
Full-Stack DevOps & Kubernetes
Dec 7, 2022 · Cloud Native

How to Scale Kubernetes to 5,000 Nodes: Master, API Server, and Component Tuning

This guide explains how to push a Kubernetes cluster toward its theoretical limit of 5,000 nodes by detailing official limits, master node sizing for GCE and AWS, kube‑apiserver high‑availability and connection‑count tuning, scheduler and controller‑manager leader election settings, kubelet optimizations, and DNS anti‑affinity configuration.

Cloud Native, Kubernetes, Operations
0 likes · 6 min read
Architect's Guide
Dec 5, 2022 · Cloud Native

Step-by-Step Guide to Deploying a High‑Availability Kubernetes Cluster with NFS, Ingress, Dashboard, and Harbor

This comprehensive tutorial walks through preparing the operating system, installing Docker and containerd, configuring yum repositories, initializing a multi‑master HA Kubernetes cluster with IPVS, deploying the Kubernetes dashboard, setting up NFS storage, installing an Ingress controller, and finally installing Harbor with Helm and a custom NFS provisioner, providing all necessary commands and configuration files.

Docker, HA, Harbor
0 likes · 38 min read
ITPUB
Dec 4, 2022 · Cloud Native

How Qunar Scaled Container Monitoring with VictoriaMetrics: A Cloud‑Native Case Study

This article details Qunar's migration from a Prometheus‑based monitoring stack to VictoriaMetrics, describing the limitations they faced, the architectural redesign using vmagent, vmcluster, and vmalert, and the resulting performance improvements and operational benefits for large‑scale Kubernetes environments.

Cloud Native, Kubernetes, Prometheus
0 likes · 14 min read
Efficient Ops
Dec 1, 2022 · Operations

Why Choose Loki Over ELK? A Hands‑On Guide to Deploying and Using Grafana Loki

This article explains the motivations for selecting Grafana Loki instead of ELK/EFK, introduces its core concepts and features, provides step‑by‑step deployment instructions for Promtail and Loki, and demonstrates how to configure Grafana, query logs, and handle label indexing, dynamic tags, and high‑cardinality challenges.

Grafana, Kubernetes, Loki
0 likes · 15 min read
Cloud Native Technology Community
Dec 1, 2022 · Cloud Native

Integrating OpenStack and Kubernetes Networks with Kube-OVN: Cluster Interconnect and Shared OVN Modes

This guide explains how to use Kube-OVN to connect OpenStack virtual machines and Kubernetes containers by configuring cluster interconnect or shared OVN modes, covering prerequisites, OVN‑IC database deployment, Kubernetes and OpenStack side settings, and example manifests for creating Pods in OpenStack subnets.

Kube-OVN, Kubernetes, OVN
0 likes · 11 min read
Huolala Tech
Dec 1, 2022 · Cloud Computing

How to Master Spot Instances for Cost‑Effective Cloud Scaling

This article explains what Spot (preemptible) instances are, compares them with on‑demand and reserved instances, details AWS Spot pricing and signals, and provides practical strategies—including node‑group design, Kubernetes scheduling, health checks, and rollback plans—to reliably reduce cloud costs while maintaining application availability.

AWS, Cost Optimization, Kubernetes
0 likes · 22 min read
Efficient Ops
Nov 30, 2022 · Cloud Native

How kubectl top Retrieves Real‑Time Metrics in Kubernetes: A Deep Dive

This article explains how the kubectl top command gathers real‑time CPU and memory usage for nodes and pods, details the underlying data flow and the Metrics API implementation in Kubernetes, compares Heapster with metrics‑server, and addresses common troubleshooting scenarios.

Heapster, Kubernetes, cAdvisor
0 likes · 15 min read