Tagged articles
3116 articles
ITPUB
Jan 18, 2025 · Cloud Native

Prometheus 3.0 Unveiled: New UI, Remote‑Write 2.0, and Native Histograms

Prometheus 3.0, the first major release in seven years, introduces a rebuilt UI, Remote‑Write 2.0 with richer metadata, full UTF‑8 support, native OpenTelemetry ingestion, experimental native histograms, performance gains, and a set of breaking changes that require careful migration.

Cloud Native · Native Histograms · Prometheus
0 likes · 8 min read
IT Architects Alliance
Jan 16, 2025 · Cloud Native

How Microservices and Serverless Combine to Transform Modern Applications

Microservices break monoliths into focused services, while serverless offloads infrastructure management to cloud providers; together they boost agility, scalability, cost efficiency, and security, as illustrated by real-world cases from ride‑hailing and e‑commerce, and the article outlines adoption challenges and future opportunities.

Cloud Native · Cost Optimization · Microservices
0 likes · 9 min read
JD Tech Talk
Jan 16, 2025 · Artificial Intelligence

JD Retail Technology 2024 Innovations: AI-Driven Platforms, Data Lake, Cross‑Platform Development, and Intelligent Supply Chain

In 2024 JD Retail Technology showcased a suite of innovations—including a major JD APP redesign, data‑driven inventory and allocation algorithms, an AIGC content platform, a low‑code national‑subsidy system, a large‑scale data lake, AI‑powered merchant assistants, cross‑platform Taro on Harmony, advanced advertising creative generation, immersive XR shopping experiences, and a domestic‑chip AI engine—demonstrating how AI, big data, and modern development frameworks drive faster fulfillment, richer user experiences, and operational efficiency.

Big Data · Cloud Native · product-management
0 likes · 15 min read
Alibaba Cloud Native
Jan 14, 2025 · Cloud Native

Unlocking Kubernetes IO Insights with ACK’s New Storage Monitoring Dashboards

This article explains how Alibaba Cloud Container Service for Kubernetes (ACK) has upgraded its storage monitoring dashboards to provide detailed visibility into local, PVC, and cloud‑based volumes, enabling users to detect IO bottlenecks, track real‑time read/write performance, and improve overall container reliability.

ACK · Cloud Native · Dashboard
0 likes · 8 min read
Baidu Geek Talk
Jan 13, 2025 · Industry Insights

Top 12 Must-Read Baidu Tech Articles of 2024: Insights & Innovations

This roundup highlights twelve standout Baidu Geek articles from 2024, covering breakthroughs in search personalization, high‑performance Go services, transaction reconciliation, login system evolution, AI‑native applications, microservice governance, caching algorithms, RLHF optimization, ClickHouse deployment, and more, each with concise recommendation reasons.

2024 · AI · Backend
0 likes · 8 min read
Alibaba Cloud Infrastructure
Jan 13, 2025 · Cloud Native

Extending Alibaba Cloud Service Mesh (ASM): EnvoyFilter, Lua, Wasm, External Processing, and Custom Authorization Services

This article explains how Alibaba Cloud Service Mesh (ASM) can be extended using EnvoyFilter, Lua scripts, WebAssembly plugins, External Processing filters, and custom authorization services, detailing their capabilities, limitations, and recommended use cases for cloud‑native microservice environments.

ASM · Cloud Native · Envoy
0 likes · 11 min read
IT Architects Alliance
Jan 12, 2025 · Cloud Native

Unlocking Cloud‑Native Success: Microservices, Containers & DevOps Explained

The article explores how cloud‑native architecture—driven by microservices, containerization, and DevOps—empowers enterprises to achieve greater agility, scalability, and operational efficiency, detailing core principles, real‑world examples such as Netflix, common challenges, and practical tools for implementation and security.

Cloud Native · Containers · DevOps
0 likes · 18 min read
Tencent Cloud Developer
Jan 7, 2025 · Operations

Designing High‑Availability Systems: Principles, Architecture, and Operations

This comprehensive guide explains how to design, build, and operate high‑availability systems by covering availability metrics, fault‑tolerance strategies, capacity planning, code and data layer architecture, automated testing, monitoring, and clear role responsibilities to ensure services stay reliable and resilient under load.

Cloud Native · SRE · System Design
0 likes · 32 min read
IT Architects Alliance
Jan 6, 2025 · Cloud Native

Mastering Service Discovery and Dynamic Scaling in Cloud‑Native Architectures

This article explains how distributed systems transition from monolithic to micro‑service architectures, detailing the role of registries, service registration methods, discovery mechanisms, and both horizontal and vertical scaling strategies, with practical examples and guidance for technology selection and future trends.

Cloud Native · Dynamic Scaling · Kubernetes
0 likes · 21 min read
Alibaba Cloud Big Data AI Platform
Jan 6, 2025 · Cloud Native

How Fluid Enables Seamless Dynamic Dataset Mounting for Cloud‑Native AI Development

PAI‑DSW leverages the Fluid project to provide a cloud‑native AI development platform where data scientists can dynamically mount and unmount OSS datasets on running Kubernetes pods without restarting, improving workflow efficiency and addressing the challenges of heterogeneous data source management in AI engineering.

AI Development · Cloud Native · Fluid
0 likes · 18 min read
Alibaba Cloud Native
Jan 3, 2025 · Cloud Native

How We Unified a Complex Multi‑Gateway Architecture with Higress and Istio CRD

Facing thousands of heterogeneous gateway configurations across multiple tech stacks, a Chinese cloud‑native platform consolidated its gateway layer by adopting Higress, Istio CRD, and APISIX, introducing a two‑tier rule model and automated migration tools that cut maintenance effort by 90% while preserving service continuity.

APISIX · Cloud Native · Configuration Management
0 likes · 14 min read
Ctrip Technology
Jan 3, 2025 · Big Data

Design and Implementation of a Kafka Gatekeeper for FinOps Billing Data Quality Governance

This article describes the challenges of data quality in Ctrip’s hybrid‑cloud FinOps billing system and presents the design, implementation, and high‑availability deployment of a custom Kafka Gatekeeper proxy that performs pre‑validation, configurable rules, self‑service dashboards, and automated alerts to improve coverage, timeliness, and responsibility attribution.

Big Data · Cloud Native · Data Quality
0 likes · 17 min read
Alibaba Cloud Infrastructure
Jan 1, 2025 · Industry Insights

How Cloud‑Native Is Reshaping China’s Game Industry and What Elastic Strategies Developers Need

The article analyzes the rapid growth of China's game cloud market, explains why cloud‑native adoption has become industry‑wide, and details practical application‑layer and resource‑layer elasticity strategies—including OpenKruiseGame, state‑aware scaling, and Alibaba Cloud node‑scaling options—to improve performance and reduce costs.

Alibaba Cloud · Cloud Native · Game Development
0 likes · 14 min read
Zhihu Tech Column
Dec 31, 2024 · Cloud Native

Cloud Native Innovation Forum: AutoMQ Table Topic, OceanBase Integrated Database, and Observability Practices

The article recaps Zhihu's Cloud Native Innovation Forum where experts from AutoMQ, OceanBase, and Flashcat shared practical solutions on streaming data ingestion, unified database architectures, and AI‑driven observability, highlighting real‑world deployments, performance optimizations, and cost‑saving strategies.

AI · AutoMQ · Cloud Native
0 likes · 10 min read
dbaplus Community
Dec 30, 2024 · Cloud Native

What’s New in Kubernetes v1.32? A Deep Dive into 44 Feature Enhancements

Kubernetes v1.32 introduces 44 enhancements—including 13 stable, 12 beta, and 19 alpha features—spanning dynamic resource allocation, Windows node support, improved kubelet reliability, new API endpoints, and extensive updates to DRA, pod‑level resources, and scheduling, all aimed at strengthening the cloud‑native ecosystem.

Cloud Native · DRA · Feature Enhancements
0 likes · 16 min read
Alibaba Cloud Observability
Dec 30, 2024 · Cloud Native

What Caused OpenAI’s Global Outage? Lessons for Cloud‑Native Observability

The article analyzes the December 11 OpenAI outage, revealing that a newly deployed telemetry service overloaded Kubernetes API servers, breaking DNS resolution and slowing recovery, and compares OpenAI’s approach with LoongCollector/iLogtail’s design to offer stability insights for cloud‑native environments.

API Server · Cloud Native · Kubernetes
0 likes · 15 min read
Alibaba Cloud Native
Dec 28, 2024 · Cloud Native

How ACK One Multi‑Cluster Gateway Enables Seamless Cross‑AZ and Multi‑Region Disaster Recovery

This article explains how Alibaba Cloud's ACK One multi‑cluster gateway provides active‑active disaster recovery across same‑city AZs, hybrid‑cloud environments, and distant regions, detailing the architecture, setup steps, advantages over DNS‑based solutions, and practical considerations for enterprise workloads.

ACK One · Cloud Native · cross-AZ
0 likes · 13 min read
Alibaba Cloud Infrastructure
Dec 27, 2024 · Cloud Native

ElasticWorkload, WorkloadSpread, UnitedDeployment, and ResourcePolicy: Configurable Plugins for Serverless Elasticity in Alibaba Cloud Container Service

This article explains how Serverless elasticity is achieved in Alibaba Cloud Container Service by introducing four configurable plugins—ElasticWorkload, WorkloadSpread, UnitedDeployment, and ResourcePolicy—detailing their core capabilities, technical principles, advantages, real‑world use cases, and guidance for selecting the appropriate solution.

Cloud Native · ElasticWorkload · Kubernetes
0 likes · 30 min read
IT Architects Alliance
Dec 26, 2024 · Cloud Native

How Cloud‑Native, Microservices, Containers and DevOps Drive Digital Transformation

Cloud‑native architecture, built on microservices, containers, and DevOps, empowers enterprises with agility, scalability, and resilience, enabling rapid development, efficient resource utilization, and seamless continuous delivery, while addressing challenges like distributed transactions and service governance, and outlining future integration with 5G, edge computing, and AI.

5G · Cloud Native · Containers
0 likes · 15 min read
Alibaba Cloud Developer
Dec 26, 2024 · Cloud Native

How a New Telemetry Service Overwhelmed OpenAI’s Kubernetes API Server

An in‑depth post‑mortem reveals how OpenAI’s newly deployed telemetry service generated massive Kubernetes API requests, overloading the API server, breaking DNS resolution, and slowing recovery, while contrasting OpenAI’s approach with LoongCollector/iLogtail’s design to minimize API load and improve cluster stability.

API Server · Cloud Native · Cluster Reliability
0 likes · 15 min read
Alibaba Cloud Infrastructure
Dec 25, 2024 · Cloud Native

Ensuring Stability of Large‑Scale Kubernetes Clusters: Lessons from the OpenAI Incident and Alibaba Cloud Practices

This article analyses the OpenAI large‑scale Kubernetes outage, explains the inherent risks of massive K8s clusters, and presents Alibaba Cloud's architectural enhancements, observability improvements, and best‑practice guidelines to achieve high‑availability and reliable operation of thousands‑node Kubernetes environments.

Cloud Native · Kubernetes · Large-Scale Clusters
0 likes · 21 min read
IT Architects Alliance
Dec 24, 2024 · Cloud Native

Unlock Scalable, Highly Available IT Architecture: Key Strategies Explained

This article examines the modern challenges of IT architecture and presents proven techniques—microservices, container orchestration, distributed caching, redundancy, load balancing, and automated fault recovery—illustrated with Amazon and Google case studies, while forecasting future AI and cloud‑native trends.

Cloud Native · Microservices · Scalability
0 likes · 10 min read
Alibaba Cloud Observability
Dec 24, 2024 · Operations

How to Achieve Full Observability for Go Apps Without Intrusive Agents

This article compares three Go observability solutions—SDK instrumentation, eBPF‑based monitoring, and compile‑time code injection—explaining their mechanisms, open‑source implementations, trade‑offs, and why Alibaba Cloud's Instgo compile‑time approach offers a low‑overhead, non‑intrusive APM alternative.

Cloud Native · Go · Instrumentation
0 likes · 11 min read
System Architect Go
Dec 23, 2024 · Cloud Native

Mastering Kubernetes API Server Flow Control: APF Explained

This article explains how Kubernetes' API Priority and Fairness (APF) mechanism enhances kube‑apiserver traffic control by introducing FlowSchema and PriorityLevelConfiguration objects, allowing fine‑grained request prioritization, concurrency limits, and queue management beyond the basic inflight throttling flags.

APF · API Server · Cloud Native
0 likes · 7 min read
Rare Earth Juejin Tech Community
Dec 21, 2024 · Cloud Native

Understanding Docker and Kubernetes: Principles, Architecture, and Deployment Practices

This article explains the fundamentals of containerization by reviewing virtualization concepts, detailing Docker's architecture and Dockerfile syntax, and then introduces Kubernetes' control‑plane and node components, providing step‑by‑step examples for deploying a simple Nginx service and a Java web application on a K8s cluster, both manually and with automation tools.

Cloud Native · Deployment · DevOps
0 likes · 19 min read
Alibaba Cloud Developer
Dec 19, 2024 · Artificial Intelligence

How to Build a Full-Stack RAG Knowledge QA App with Alibaba Cloud Low-Code Platform

This guide walks you through creating a complete retrieval‑augmented generation (RAG) knowledge‑question‑answer system on Alibaba Cloud, covering AI model integration, cloud‑native low‑code development, database setup, UI customization, session persistence, analytics dashboards, and multi‑channel deployment.

AI · Chatbot · Cloud Native
0 likes · 22 min read
Alibaba Cloud Infrastructure
Dec 17, 2024 · Cloud Native

Recap of Kubernetes Community Day 2024 Jakarta: Generative AI, eRDMA, Container Security, and Observability

The Kubernetes Community Day held in Jakarta on November 30, 2024 featured Alibaba Cloud experts presenting best‑practice sessions on scaling generative AI workloads, eRDMA network acceleration, container image security, and OpenTelemetry‑based observability within the ACK Kubernetes platform.

Cloud Native · Container Security · Kubernetes
0 likes · 6 min read
Alibaba Cloud Developer
Dec 12, 2024 · Cloud Native

How to Use Nacos Custom Tag Gray Release for Precise Configuration Deployment

This article explains Nacos's custom tag gray release feature, covering its advantages over IP‑based gray releases, version requirements, tag configuration methods, publishing steps, multi‑tag and parallel gray versions, and priority rules to achieve flexible, safe configuration rollout in cloud‑native environments.

Cloud Native · Configuration Management · Custom Tags
0 likes · 16 min read
Alibaba Cloud Observability
Dec 9, 2024 · Operations

How to Integrate Alibaba Cloud RUM SDK for HarmonyOS Native Apps

This guide explains the background of HarmonyOS NEXT, introduces Alibaba Cloud ARMS Real User Monitoring SDK for native HarmonyOS apps, details its page, resource, exception, and custom data collection features, and provides step‑by‑step integration instructions with code examples.

Cloud Native · HarmonyOS · Mobile Development
0 likes · 8 min read
Alibaba Cloud Observability
Dec 9, 2024 · Cloud Native

How to Design and Use Cloud Monitoring Event Subscriptions on Alibaba Cloud

This guide explains the purpose, design, and step‑by‑step configuration of Alibaba Cloud's cloud‑monitor event subscription feature, covering typical multi‑team and application‑group scenarios, flexible filtering, aggregation, custom notifications, and integration with external services for robust cloud‑native operations.

Alibaba Cloud · Cloud Native · Event Subscription
0 likes · 10 min read
Efficient Ops
Dec 8, 2024 · Operations

Unlocking BizDevOps: Key Insights from Shanghai’s Enterprise Summit

The article recaps Shanghai’s BizDevOps Enterprise Summit, highlighting five expert sessions on R&D‑operations integration in securities, platform engineering breakthroughs, large‑model agents in financial ops, Ctrip’s 10 PB JuiceFS practice, and core SRE stability strategies for financial firms.

AI agents · BizDevOps · Cloud Native
0 likes · 4 min read
360 Zhihui Cloud Developer
Dec 6, 2024 · Cloud Native

How Terway Implements Elastic ENI and Advanced CNI Designs for Kubernetes

This article explains the fundamentals of the Container Network Interface (CNI), outlines common plugin implementations, and dives deep into Alibaba Cloud's Terway solution, detailing its elastic ENI support, veth pair and IPVLAN modes, resource management, scheduling, network policy integration, and future enhancements such as eBPF.

CNI · Cloud Native · ENI
0 likes · 13 min read
Xiaohongshu Tech REDtech
Dec 5, 2024 · Big Data

Interview with Jianchen: Journey from Open Source Contributor to Data Engineer at Xiaohongshu

In this interview, Xiaohongshu data engineer Jianchen recounts his evolution from a computer‑science student discovering open‑source through MIT 6.824 to contributing to SOFAJRaft and Apache RocketMQ, detailing his OSPP projects, the decision to join Xiaohongshu, and his work on a cloud‑native Kafka engine that cut storage and compute usage by half.

Apache RocketMQ · Big Data · Career Development
0 likes · 11 min read
Code Mala Tang
Dec 4, 2024 · Cloud Native

7 Proven Dockerfile Tricks to Shrink Images and Speed Up Builds

Learn seven practical Dockerfile optimization techniques—from picking lightweight base images and reducing layers to leveraging cache, .dockerignore, environment variables, multi-stage builds, and locking dependency versions—to create smaller, faster, and more reliable container images.

Cloud Native · Container Optimization · DevOps
0 likes · 6 min read
Java Architecture Diary
Dec 4, 2024 · Cloud Native

Spring Cloud 2024.0.0 (Moorgate) Release: New Features, Quick Start & Maven Setup

Spring Cloud 2024.0.0 (Moorgate) has been released, built on Spring Boot 3.4.0, introducing enhancements across Gateway, CircuitBreaker, OpenFeign, Commons, Config, and Kubernetes modules, along with performance improvements, new configuration options, and a quick-start guide showing Maven dependency management for the updated platform.

Cloud Native · Spring Boot 3.4 · Spring Cloud
0 likes · 5 min read
dbaplus Community
Nov 28, 2024 · Cloud Native

Can Redis Thrive on Kubernetes? Insights from Kuaishou’s Cloud‑Native Journey

Drawing on Kuaishou’s experience, this article examines whether stateful services like Redis belong on Kubernetes, outlines the benefits and risks, and details a cloud‑native solution using custom workloads, KubeBlocks, and a federated cluster architecture to achieve scalable, reliable Redis deployments.

Cloud Native · Federated Clusters · KubeBlocks
0 likes · 15 min read
Architecture & Thinking
Nov 28, 2024 · Cloud Native

How to Scale Istio Across Hundreds of Services: Real‑World Strategies & Performance Insights

This article shares practical guidance on rolling out Istio service mesh to over ten business lines, covering selection of pilot projects, benefit analysis using access logs, sidecar injection, performance and resource impact, multi‑region active‑active architecture benefits, and rapid fault‑recovery tactics.

Cloud Native · Istio · Microservices
0 likes · 9 min read
High Availability Architecture
Nov 27, 2024 · Cloud Native

Apache Dubbo Triple X Protocol Adds Full HTTP/3 Support: Design, Configuration, and Performance

The article explains how Apache Dubbo's Triple X protocol now fully supports HTTP/3, detailing its design goals, performance advantages, configuration steps, code examples, and real‑world benchmarks that demonstrate significant latency reduction and reliability improvements in cloud‑native microservice environments.

Cloud Native · Dubbo · HTTP/3
0 likes · 9 min read
Sanyou's Java Diary
Nov 25, 2024 · Cloud Native

Designing Resilient Stateful Distributed Systems: From Theory to Microservice Architecture

This article explores the fundamentals of distributed systems, compares stateful and stateless services, examines monolithic, SOA, and microservice models, and provides practical guidance on access layers, fault tolerance, service discovery, scaling, and data storage for building robust cloud‑native architectures.

Cloud Native · Microservices · Scalability
0 likes · 29 min read
Alibaba Cloud Observability
Nov 22, 2024 · Cloud Native

Mastering Alibaba Cloud Observability: Tagging Strategies for Efficient Resource Management

This article explains how Alibaba Cloud’s observability suite uses tag metadata to organize, monitor, and secure resources across business, endpoints, applications, middleware, and containers, offering best‑practice design principles and real‑world case studies for building scalable, tag‑driven monitoring dashboards.

Alibaba Cloud · Cloud Native · Tag Management
0 likes · 25 min read
Alibaba Cloud Infrastructure
Nov 18, 2024 · Cloud Native

Alibaba Cloud ACK Backup Center: Kubernetes Disaster Recovery and Migration with Resource Adjustment Strategies

This article explains how Alibaba Cloud ACK Backup Center simplifies Kubernetes disaster recovery and cross‑cluster migration by offering automated resource‑adjustment policies, detailed backup and restore workflows, and a step‑by‑step best‑practice example for migrating a stateful application with custom YAML configurations.

ACK · Cloud Native · Kubernetes
0 likes · 10 min read
Rare Earth Juejin Tech Community
Nov 18, 2024 · Cloud Native

Developing a Custom Kubernetes Controller for Flink Task Scheduling

This article provides a step‑by‑step guide to building a custom Kubernetes controller in Go that uses Prometheus metrics to intelligently schedule Flink TaskManager Pods, covering the underlying scheduler concepts, code implementation, Docker image creation, RBAC setup, deployment, testing, and advanced considerations.

Cloud Native · Custom Scheduler · Flink
0 likes · 38 min read
DataFunSummit
Nov 16, 2024 · Big Data

Data Lake Storage Acceleration: Evolution, Challenges, and Solutions for AI and Big Data Workloads

This article surveys the evolution of data‑lake storage acceleration, compares different architectural stages, analyzes why acceleration is needed for AI and big‑data scenarios, and details the key techniques—metadata acceleration, read/write speedup, and end‑to‑end workflow optimization—used to overcome performance and cost challenges.

AI · Cloud Native · caching
0 likes · 23 min read
Top Architect
Nov 16, 2024 · Cloud Native

Why Docker May Not Be Suitable for Running MySQL: Data Security, Performance, State, and Resource Isolation Issues

The article examines why deploying MySQL in Docker containers can be problematic, highlighting data‑security risks, performance bottlenecks, state‑management challenges, and limited resource isolation, while also noting specific scenarios where containerizing MySQL might still be viable.

Cloud Native · Docker · containerization
0 likes · 8 min read
AsiaInfo Technology: New Tech Exploration
Nov 15, 2024 · Artificial Intelligence

How PaaS for AI Optimizes Large‑Model Workloads on Kubernetes

This article analyzes the three core technologies behind PaaS for AI—GPU resource management, node data optimization, and task scheduling—detailing their concepts, component architecture, critical workflows, technical advantages, and future challenges, while illustrating practical configurations with Kubernetes and Volcano examples.

AI · Big Data · Cloud Native
0 likes · 16 min read
Architecture & Thinking
Nov 15, 2024 · Databases

How Baidu’s TDE‑ClickHouse Delivers Sub‑Second Analytics on Billion‑Row Datasets

This article explains how Baidu’s TDE‑ClickHouse, as a core engine of the Turing 3.0 ecosystem, overcomes platform fragmentation, quality issues, and usability challenges through the OneData+ development paradigm, multi‑level aggregation, projection, query‑caching, bulk‑load ingestion, and a cloud‑native architecture to achieve sub‑second query response for massive data volumes.

Big Data · Cloud Native · Distributed Systems
0 likes · 22 min read
Cognitive Technology Team
Nov 14, 2024 · Operations

Designing Self‑Healing Applications for Fault Tolerance in Distributed Systems

To ensure distributed applications can recover automatically from hardware, network, or service failures, this guide outlines three core capabilities—fault detection, graceful handling, and monitoring—plus practical strategies such as asynchronous component separation, retries, circuit breakers, isolation, load shedding, failover, compensation, checkpointing, graceful degradation, rate limiting, leader election, fault injection, chaos engineering, and use of availability zones.

Cloud Native · Distributed Systems · Operations
0 likes · 7 min read
Alibaba Cloud Observability
Nov 13, 2024 · Cloud Native

Can iLogtail Replace Logstash? Exploring Performance and Ops Challenges

This article examines the traditional ELK stack, highlights iLogtail's performance advantages over Filebeat and Logstash, analyzes why iLogtail could not previously replace them, and details the five key engineering solutions—ranging from plugin optimization to Config Server disaster recovery—that enable iLogtail to serve as a full‑stack log collection platform in cloud‑native environments.

Cloud Native · ELK · Filebeat
0 likes · 13 min read
Baidu Geek Talk
Nov 13, 2024 · Industry Insights

Why Cloud‑Native Data Lakes Are the New Standard for Storage Acceleration

This article analyzes the evolution of data‑lake storage acceleration, compares traditional parallel file systems, object‑storage‑based solutions and modern cache‑enabled architectures, and explains how cloud‑native data lakes address scalability, cost, and performance challenges for AI and big‑data workloads.

AI · Big Data · Cloud Native
0 likes · 24 min read
Alibaba Cloud Infrastructure
Nov 8, 2024 · Industry Insights

Unlocking Efficient LLM Inference: Insights from China’s Cloud Computing Conference

The 5th China Cloud Computing Infrastructure Developer Conference in Beijing highlighted cutting‑edge AI inference optimization, Knative‑based serverless acceleration, AMD PMU virtualization, and CDI‑driven GPU management, offering detailed technical insights and real‑world case studies that illustrate how cloud providers are tackling performance and cost challenges of modern workloads.

AI inference · AMD virtualization · Cloud Native
0 likes · 9 min read
Alibaba Cloud Observability
Nov 8, 2024 · Cloud Native

Enable Python Probe for LLM Observability on Alibaba Cloud ACK

This guide explains how to integrate Alibaba Cloud's Python probe into a Kubernetes (ACK) environment to monitor large language model (LLM) applications, covering prerequisites, installation steps, Dockerfile modifications, resource permissions, and sample Python code for both server and client components.

ARMS · Cloud Native · Docker
0 likes · 16 min read
Alibaba Cloud Observability
Nov 8, 2024 · Cloud Native

How GraalVM Static Compilation Boosts Cloud‑Native Java Performance and Observability

This article explains the challenges of Java cold start and high memory usage in cloud‑native environments, introduces GraalVM static compilation and a novel static Java Agent solution, and provides step‑by‑step instructions for installing ARMS, configuring dependencies, and achieving fast, observable native images.

Cloud Native · Java Agent · graalvm
0 likes · 16 min read
Alibaba Cloud Infrastructure
Nov 7, 2024 · Cloud Native

Deep Dive into the New Features of Argo Workflows 3.6

This article provides a comprehensive analysis of Argo Workflows 3.6, covering its enhanced scheduling, UI improvements, controller stability and security upgrades, OSS artifact garbage collection, dynamic template references, expanded expression library, and CLI usability, along with practical YAML examples for each feature.

Argo Workflows · Cloud Native · Kubernetes
0 likes · 12 min read
Cloud Native Technology Community
Nov 7, 2024 · Cloud Native

Top Microservices Trends Shaping 2025: Edge, Serverless, AI & More

Microservices are evolving toward 2025 with trends such as edge computing, container orchestration via Kubernetes, DevSecOps, serverless functions, AI-driven management, advanced observability, API gateways, service meshes, multi-language services, event-driven designs, improved data handling, low-code integration, and stronger resilience, reshaping agile, scalable software development.

AI · Cloud Native · DevSecOps
0 likes · 10 min read
Ops Development Stories
Nov 6, 2024 · Cloud Native

Koordinator vs Crane: Which Scheduler Optimizes Kubernetes Resource Usage?

The article examines how native Kubernetes scheduling based solely on resource requests leads to waste and imbalance, compares the open‑source crane‑scheduler and koord‑scheduler architectures, explains practical configuration of Koordinator, and provides step‑by‑step testing procedures to achieve load‑aware scheduling.

Cloud Native · Koordinator · Kubernetes
0 likes · 7 min read
MaGe Linux Operations
Nov 4, 2024 · Cloud Native

Essential kubectl Commands for Viewing, Managing, and Debugging Kubernetes

This guide walks you through essential kubectl commands for checking cluster status, inspecting resources, retrieving detailed object information, monitoring logs, managing configurations, labeling, and performing create, update, and delete operations, empowering you to efficiently view, troubleshoot, and control Kubernetes workloads.

Cloud Native · DevOps · Kubernetes
0 likes · 13 min read
Code Mala Tang
Nov 4, 2024 · Cloud Native

Master Docker Images: From Basics to Building and Managing Containers

This article explains Docker images as lightweight, immutable templates, details their layered architecture, shows how to build them with Dockerfiles, demonstrates pulling and running images from Docker Hub, and covers essential commands for managing and cleaning up images.

Cloud Native · Containers · Docker
0 likes · 12 min read
MaGe Linux Operations
Nov 3, 2024 · Cloud Native

Master Docker and Kubernetes: Core Concepts Explained for Cloud‑Native Beginners

This guide introduces Docker’s architecture, advantages over virtual machines, key components such as Daemon, Images, Containers, CLI, Dockerfile, Compose and Swarm, then explains Kubernetes fundamentals, its architecture, core objects like Pods, Volumes, Deployments, Services, Namespaces, and how they interact through the API.

Cloud Native · Containers · DevOps
0 likes · 18 min read
Alibaba Cloud Infrastructure
Nov 3, 2024 · Cloud Native

Leveraging Alibaba Cloud ACK Backup Center for Cross‑Cloud Container Service Migration: Challenges and Solutions

This article outlines the multifaceted challenges of cross‑cloud container service migration—including data security, service interruption, compatibility, and storage complexities—and demonstrates how Alibaba Cloud ACK's Backup Center can address these issues through comprehensive backup, selective restoration, and automated resource adjustments.

ACK · Backup Center · Cloud Native
0 likes · 10 min read
Architect
Oct 31, 2024 · Cloud Native

Designing a Resilient Stateful Distributed System for Cloud‑Native Environments

This article analyzes the motivations, models, and design considerations for building stateful distributed architectures—covering microservices, service discovery, access‑layer isolation, fault tolerance, scaling, and deployment strategies—to help architects create reliable, low‑latency cloud‑native systems.

Cloud Native · Distributed Systems · Microservices
0 likes · 33 min read
Kuaishou Tech
Oct 31, 2024 · Cloud Native

Stateful Service Cloud‑Native Practices: Kuaishou’s Redis on Kubernetes

This article examines the challenges and benefits of running stateful services such as Redis on Kubernetes, presents Kuaishou’s practical experience with cloud‑native migration, evaluates risks and performance impacts, and details the custom workloads, operators, federation and KubeBlocks solutions that enable large‑scale, reliable stateful service orchestration.

Cloud Native · Federation · KubeBlocks
0 likes · 12 min read
Alibaba Cloud Native
Oct 31, 2024 · Cloud Native

Mastering Full-Chain Gray Release with Alibaba Cloud SAE

This article explains the challenges of gray releases in complex microservice architectures and provides a step‑by‑step guide to implementing full‑chain gray deployment using Alibaba Cloud Serverless Application Engine (SAE) integrated with MSE, covering isolation methods, core concepts, features, and practical operations.

Cloud Native · Continuous Delivery · MSE
0 likes · 11 min read
Alibaba Cloud Infrastructure
Oct 30, 2024 · Cloud Native

Effortlessly Distribute Apps Across Multiple ACK One Clusters

This guide explains how ACK One enables fast, Git‑free distribution of Kubernetes application resources from a single fleet cluster to multiple target clusters, covering the required CRDs, policy definitions, step‑by‑step YAML examples, and migration from single‑cluster to high‑availability multi‑cluster deployments.

ACK One · Application Distribution · CRD
0 likes · 9 min read
Volcano Engine Developer Services
Oct 28, 2024 · Cloud Native

How ByteDance Scales Services with Multi‑Region Unitization Architecture

This article explains ByteDance’s multi‑region unitization approach, covering its core concepts, motivations, architectural challenges, traffic routing, data synchronization, cut‑over strategies, and future evolution for large‑scale, resilient services. It also discusses operational optimizations, risk controls, and the impact on cost and development efficiency.

Cloud Native · data synchronization · multi-region
0 likes · 20 min read
IT Services Circle
Oct 25, 2024 · Databases

Database Management Challenges in the Cloud Era and How Apache ShardingSphere Addresses Them

The article outlines the growing difficulties of managing diverse databases in cloud-native environments, introduces Apache ShardingSphere as a comprehensive open‑source solution with three core capabilities—connectivity, enhancement, and pluggability—and guides readers through a three‑step learning path from fundamentals to deployment and testing.

Cloud Native · Database Management · Database Middleware
0 likes · 9 min read
Huolala Tech
Oct 24, 2024 · Artificial Intelligence

How Huolala’s Dolphin Platform Accelerates AI Model Delivery with Cloud‑Native Automation

This article describes how Huolala built a cloud‑native AI development platform called Dolphin to overcome low model delivery efficiency and poor compute‑resource utilization, detailing its architecture, one‑stop workflow, resource‑pooling, observability, and future roadmap for scaling AI across the company.

Cloud Native · Kubernetes · Model Deployment
0 likes · 10 min read
Efficient Ops
Oct 23, 2024 · Databases

How NineData Boosts R&D Collaboration 5× with Multi‑Cloud Database Management

The NineData presentation at the 2024 GOPS Global Operations Conference in Shanghai detailed multi‑cloud, multi‑source database architecture trends, showcased the company's intelligent data management platform, explained data replication principles, DevOps challenges and AI‑enhanced solutions, and highlighted real‑world customer success stories across industries.

AI · Cloud Native · DevOps
0 likes · 11 min read
Alibaba Cloud Infrastructure
Oct 22, 2024 · Cloud Native

OpenYurt: Current Status, Future Roadmap, and Enterprise‑Level Integration Practices

This article introduces OpenYurt, a CNCF‑sandbox cloud‑native edge solution built on Kubernetes, explains its autonomous edge capabilities, multi‑region workload and service models, traffic‑optimisation features, and outlines the enterprise‑grade ACK Edge product with real‑world deployment scenarios and case studies.

Cloud Native · Edge Computing · Kubernetes
0 likes · 10 min read
Tencent Cloud Developer
Oct 22, 2024 · Industry Insights

Designing Stateful Distributed Systems: Core Principles and Architecture Patterns

This article analyzes the motivations, benefits, and challenges of building stateful distributed systems, compares monolithic, SOA, and microservice models, and provides detailed guidance on access layers, service discovery, fault tolerance, scaling, and data storage for cloud‑native architectures.

Cloud Native · Distributed Systems · Microservices
0 likes · 29 min read
Efficient Ops
Oct 21, 2024 · Operations

Essential Prometheus Best Practices: Avoid Common Pitfalls and Boost Reliability

This article shares practical Prometheus best‑practice tips—from understanding its accuracy‑reliability trade‑offs and self‑monitoring, to avoiding NFS storage, managing high‑cardinality metrics, handling rate() and recording‑rule pitfalls, and fine‑tuning alerting—so you can run a stable, low‑cost monitoring stack.

Alerting · Cloud Native · Operations
0 likes · 10 min read
Baidu Geek Talk
Oct 21, 2024 · Databases

TDE-ClickHouse Optimization Practice at Baidu MEG: Query Performance, Data Import, and Distributed Architecture

Baidu MEG’s TDE‑ClickHouse optimization in the Turing 3.0 ecosystem boosts query speed by up to 10×, halves latency, enables billion‑row bulk imports in under two hours, and migrates to a cloud‑native, ZooKeeper‑free architecture supporting 350k CPU cores, 10 PB of storage, and sub‑3‑second responses for 150k daily BI queries.

Baidu MEG · Cloud Native · Database Optimization
0 likes · 19 min read
Baidu Intelligent Cloud Tech Hub
Oct 21, 2024 · Big Data

How Baidu’s Data Lake Acceleration 2.0 Supercharges Big Data and AI Workloads

Baidu's latest data lake acceleration 2.0 replaces HDFS with a scalable object‑storage foundation, introduces a hierarchical Namespace 2.0, a high‑throughput streaming engine, RapidFS caching, and a fully HDFS‑compatible BOS‑HDFS layer, delivering up to 70% higher throughput and dramatically lower costs for big data and AI pipelines.

AI · Cloud Native · object storage
0 likes · 12 min read
Alibaba Cloud Infrastructure
Oct 18, 2024 · Cloud Native

Comparative Study of Batch Compute and Serverless Argo Workflows for Containerized Data Processing

This article compares a cloud‑provider’s closed‑source Batch compute service with the open‑source, serverless Argo Workflows platform, demonstrating how each can orchestrate multi‑stage containerized data‑processing pipelines, detailing configuration, job definitions, dependency handling, and operational trade‑offs.

Argo Workflows · Batch Compute · Cloud Native
0 likes · 12 min read