Tagged articles
928 articles
Baidu Geek Talk
Sep 6, 2023 · Cloud Native

DeeTune: Baidu’s eBPF‑Based Cloud‑Native Network Framework for Service Topology, Traffic Recording, and Non‑Intrusive Monitoring

DeeTune is Baidu’s eBPF‑based cloud‑native network framework that automatically builds complete service topologies, records configurable inter‑service traffic, and provides non‑intrusive metric monitoring with minimal CPU and memory overhead, enabling efficient fault localization and performance analysis across heterogeneous PaaS and container environments.

Baidu, Network Framework, Observability
15 min read
Xiaohongshu Tech REDtech
Sep 6, 2023 · Databases

REDck: A Cloud‑Native Real‑Time OLAP Data Warehouse Built on ClickHouse

REDck is a cloud‑native, real‑time OLAP data warehouse built on ClickHouse that adds elastic compute and storage scaling, object‑storage optimizations, multi‑level caching, and exactly‑once ingestion, delivering petabyte‑scale interactive analytics with ten‑fold CPU efficiency, ten‑fold cost reduction, and 99.9% availability.

Big Data, ClickHouse, Real-time OLAP
21 min read
Efficient Ops
Sep 3, 2023 · Operations

Master kubectl: Essential Commands for Kubernetes Operations

This guide provides a comprehensive, step‑by‑step reference of the most frequently used kubectl commands, covering autocomplete setup, context configuration, resource creation, querying, updating, patching, scaling, deletion, pod interaction, node management, and advanced set operations for effective Kubernetes cluster administration.

CLI, cloud-native, kubectl
17 min read
Ops Development Stories
Sep 1, 2023 · Cloud Native

Why OpenObserve Beats Elasticsearch with 140× Lower Storage Costs

OpenObserve is a Rust‑based, cloud‑native observability platform that offers log, metric, and trace collection with up to 140‑times lower storage costs than Elasticsearch, supports both single‑node and HA deployments, provides a built‑in UI, and includes detailed installation and query examples for Kubernetes environments.

Log Management, Rust, cloud-native
12 min read
DevOps Cloud Academy
Aug 29, 2023 · Cloud Native

Observability and Data Collection Strategies in Cloud‑Native Environments

The article explains that while observability is not new, cloud‑native systems have driven rapid development of observability platforms. It details data collection architectures, direct push versus file‑based approaches, and sampling techniques (head, tail, and local sampling) that balance completeness, real‑time reporting, and performance impact.

Sampling, cloud-native, data collection
11 min read
Volcano Engine Developer Services
Aug 25, 2023 · Cloud Native

How ByteDance Scaled with Multi‑Cloud: Lessons from Their Cloud‑Native Journey

ByteDance’s multi‑cloud evolution, driven by rapid business growth, cost control, and compliance needs, showcases a distributed cloud‑native platform built on open‑source orchestration, unified resource management, and advanced data‑lake solutions, while addressing operational complexity, interoperability, and emerging AI‑driven challenges.

AI, Big Data, Kubernetes
14 min read
Wukong Talks Architecture
Aug 25, 2023 · Cloud Native

Cloud-Native Architecture Evolution and Practices at ZEEK Automotive

This article details ZEEK Automotive's transition to a cloud‑native architecture, describing how the company adopted Kubernetes, containerization, micro‑service unification, and full‑link gray release to improve system stability, scalability, and development efficiency across its app ecosystem.

DevOps, Kubernetes, Microservices
18 min read
dbaplus Community
Aug 22, 2023 · Operations

Designing a Multi‑Cloud Intelligent Monitoring Platform at Huolala: Architecture, Practices, and Future Directions

This article details Huolala's one‑stop monitoring platform called Monitor, covering its multi‑cloud architecture, data collection pipelines, real‑time business monitoring, unified alarm handling, and future AI‑driven enhancements, while sharing concrete metrics, incident case studies, and practical implementation steps for large‑scale observability.

GPT, Observability, Operations
19 min read
Alibaba Cloud Native
Aug 22, 2023 · Cloud Native

How Alibaba Cloud Service Mesh ASM Accelerated Cloud‑Native Transformation at Lixun Logistics

This article details how Lixun Logistics migrated from an IDC‑based architecture to a fully cloud‑native environment using Alibaba Cloud Service Mesh ASM, addressing version‑upgrade challenges, heterogeneous system governance, and complex operations, and achieving 40% faster operations and a 50% reduction in implementation time.

ASM, Alibaba Cloud, Microservices
13 min read
MaGe Linux Operations
Aug 19, 2023 · Cloud Native

Master Dockerfile: Build Custom Images Like a Pro

This tutorial explains how to use Dockerfile to customize images, covering essential instructions such as FROM, RUN, COPY, ADD, CMD, ENTRYPOINT, ENV, VOLUME, EXPOSE, and WORKDIR, with practical examples, best‑practice tips, and details on build context and image layering.

Container, DevOps, Docker
27 min read
Cloud Native Technology Community
Aug 17, 2023 · Cloud Native

Kubernetes v1.28 (Planternetes) Release: New Features, Enhancements, and Deprecations

Kubernetes v1.28, codenamed Planternetes, introduces 45 enhancements, including expanded version skew support, non‑graceful node shutdown recovery, improved CRD validation, beta ValidatingAdmissionPolicies, a mixed version proxy, and many features promoted to stable. It also deprecates and removes several older components, and the article provides download and community resources for adoption.

Deprecations, Kubernetes, Planternetes
20 min read
Java Architecture Diary
Aug 14, 2023 · Cloud Native

Explore Spring Cloud 2023.0 (Leyton) Milestone: New Features & Maven Setup

Spring Cloud 2023.0 (Leyton) milestone, built on Spring Boot 3.2, introduces updated dependencies, repository configurations, and early support for MVC Server in Spring Cloud Gateway, enhanced Kafka Streams handling in Spring Cloud Stream, Java HttpClient in OpenFeign, and upgraded Kubernetes integration, with source code and deployment links provided.

Microservices, cloud-native, maven
6 min read
Architects Research Society
Aug 10, 2023 · Cloud Native

Resilience Strategies for Cloud‑Native Distributed Systems

This article explains how cloud‑native distributed systems achieve higher availability through resilience strategies such as load balancing, timeouts with automatic retries, deadlines, and circuit breakers, describing their placement across OSI layers, implementation options via libraries or proxies, and practical algorithm choices.

Microservices, circuit breaker, cloud-native
25 min read
Programmer DD
Jul 29, 2023 · Cloud Native

What’s New in Spring Cloud 2022.0.4? Key Updates and Migration Tips

Spring Cloud 2022.0.4 has been released on Maven Central, built on Spring Boot 3.0.9, and introduces major changes such as discontinued CLI and Cloudfoundry support, migration of Sleuth to Micrometer Tracing, numerous OpenFeign and Netflix updates, plus a full list of updated modules and their versions.

cloud-native, release-notes, spring-boot
6 min read
Java Architecture Diary
Jul 29, 2023 · Backend Development

What’s New in Spring Cloud 2022.0.4? Key Changes and Upgrade Guide

Spring Cloud 2022.0.4, built on Spring Boot 3.0.9, introduces module version updates, deprecations, and configuration changes—including discontinued components, migration to Micrometer Tracing, OpenFeign enhancements, and a detailed dependencyManagement snippet—for developers upgrading their microservice platforms.

Microservices, backend-development, cloud-native
6 min read
Zuoyebang Tech Team
Jul 28, 2023 · Operations

How Ops Teams Can Thrive in the Cloud‑Native Era: Strategies and Lessons

This article explores how the rise of cloud‑native technologies forces traditional operations to transform into service‑oriented platforms, detailing new organizational structures, the OPaS model, onion‑layered migration, practical steps, and key lessons for successful ops modernization.

DevOps, Operations, Transformation
19 min read
dbaplus Community
Jul 27, 2023 · Operations

How to Build Scalable Observability for Cloud‑Native Environments: Lessons from SRE

This article summarizes a technical talk on the challenges of cloud‑native transformation, the design of an application‑centric observability platform using CMDB, Prometheus, Thanos and VictoriaMetrics, practical solutions for high‑cardinality metrics and alerting, and future directions such as eBPF and AI‑driven fault detection.

CMDB, Metrics, Observability
14 min read
Architect's Guide
Jul 26, 2023 · Cloud Native

Amazon Prime Video Case Study: From Serverless Microservices to a Cost‑Effective Monolith

An in‑depth analysis of Amazon Prime Video’s monitoring service reveals how the team abandoned a costly serverless micro‑service architecture in favor of a streamlined monolith on EC2/ECS, cutting infrastructure expenses by over 90% while improving scalability, prompting industry leaders to reassess cloud‑native design assumptions.

AWS, Cost Optimization, Microservices
11 min read
Huawei Cloud Developer Alliance
Jul 24, 2023 · Databases

Why OpenGemini Is Emerging as the Go-To Open-Source Time-Series Database

On July 19, the openGemini community and Huawei Cloud DTT hosted a live session detailing openGemini’s open-source license, high performance, distributed architecture, and six key capabilities—including streaming aggregation, multi-level downsampling, log search, AI-driven anomaly detection, and a high-cardinality engine—showcasing its suitability for massive telemetry data across cloud-native and IoT scenarios.

Distributed, cloud-native, high performance
10 min read
Alibaba Terminal Technology
Jul 21, 2023 · Cloud Native

How Tengine-Ingress Boosts Cloud‑Native Traffic with Zero‑Downtime Updates

Tengine-Ingress, Alibaba’s cloud‑native ingress gateway built on Tengine‑Proxy, replaces the legacy Tengine gateway by delivering dynamic, loss‑less configuration updates, high‑availability gray‑release mechanisms, global consistency checks, and significant performance gains in TLS handshake latency, CPU usage, and memory consumption across large‑scale deployments.

Ingress, Kubernetes, cloud-native
19 min read
Architects Research Society
Jul 16, 2023 · Big Data

Four Innovation Phases of Netflix’s Trillion‑Scale Real‑Time Data Infrastructure

The article chronicles Netflix’s evolution from a failing batch pipeline to a cloud‑native, self‑service streaming platform, detailing four development phases, the technical challenges faced, the stream‑processing patterns introduced, key learnings, and future opportunities for real‑time data and machine‑learning workloads.

Data Platform, Flink, Kafka
30 min read
AntTech
Jul 14, 2023 · Cloud Native

KapacityStack: Open‑Source Cloud‑Native Intelligent Capacity Management and IHPA

KapacityStack is an open‑source, cloud‑native capacity platform from Ant Group that introduces the Intelligent Horizontal Pod Autoscaler (IHPA) to provide predictive, multi‑level, and stable autoscaling, reducing resource waste, carbon emissions, and operational costs while supporting extensible, modular integration with Kubernetes workloads.

autoscaling, capacity management, cloud-native
11 min read
Efficient Ops
Jul 5, 2023 · Big Data

How ByteDance Built a Cloud‑Native Big Data Ops Platform for Unified Logging & Alerts

ByteDance’s cloud‑native big data operations platform consolidates logging, monitoring, and alerting across heterogeneous environments, using unified log collection (intrusive and Filebeat), dynamic alert rules, customizable notification plugins, and scalable monitoring pipelines, thereby reducing operational complexity, shielding users from infrastructure differences, and enhancing multi‑tenant efficiency.

cloud-native
10 min read
DaTaobao Tech
Jul 5, 2023 · Cloud Native

Cloud‑Native Multi‑Tenant Architecture and Network Isolation in Taobao Open Platform

The Taobao Open Platform adopts a cloud‑native, multi‑tenant architecture that abstracts infrastructure, isolates tenants via independent or shared switch‑plus‑security‑group schemes with dual ENI pod networking, and leverages Kubernetes auto‑scaling to simplify onboarding, cut operational costs, and enable future low‑code and FaaS extensions.

Auto Scaling, Kubernetes, cloud-native
14 min read
Efficient Ops
Jul 4, 2023 · Big Data

How Cloud‑Native Architecture Transforms Big Data Operations at ByteDance

This article explains how ByteDance migrated its complex, component‑heavy big‑data platform to a cloud‑native architecture, detailing the challenges of traditional deployments, the benefits of micro‑service, container, immutable‑infrastructure and declarative‑API approaches, and the resulting low‑resource, highly‑scalable, portable operations framework.

big-data, cloud-native, disk-management
16 min read
Didi Tech
Jul 4, 2023 · Cloud Native

eBPF Technology and Its Application in Didi's Cloud-Native Observability: HuaTuo Platform Practice

eBPF, a safe, high‑performance Linux kernel extension evolving from the 1993 Berkeley Packet Filter to modern dynamic tracing, underpins Didi’s HuaTuo platform, which consolidates bytecode management, fast data processing, stability self‑healing, and container insight to solve traffic replay, topology, security, and root‑cause analysis challenges across cloud‑native services, with plans to broaden business use and community collaboration.

Container Security, Huatuo, Observability
12 min read
AI Cyberspace
Jul 4, 2023 · Databases

Benchmarking Cloud‑Native Data Warehouses: Cloudwave vs StarRocks Performance Test

This article compares traditional databases with modern cloud‑native data warehouses, outlines a detailed performance testing methodology using the SSB1000 benchmark, presents test scripts and environment setup for Cloudwave and StarRocks, and analyzes the results to highlight strengths and optimization opportunities.

Data Warehouse, Performance Testing, SQL
21 min read
Architect
Jul 3, 2023 · Cloud Native

The 12‑Factor Methodology for Building Cloud‑Native SaaS Applications

This article presents the 12‑Factor methodology—a language‑agnostic set of best‑practice principles for designing, deploying, and operating cloud‑native SaaS applications—covering codebase management, explicit dependencies, environment‑based configuration, backing services, build‑release‑run separation, process model, port binding, concurrency, disposability, parity between development and production, logging, and one‑off admin tasks.

12-factor, SaaS, cloud-native
23 min read
DevOps Cloud Academy
Jul 3, 2023 · Operations

Top 10 Kubernetes Monitoring Tools and Their Features

This article reviews ten popular Kubernetes monitoring tools—including Helios, Prometheus, New Relic, Grafana, DataDog, Sysdig, Zabbix, AppDynamics, Dynatrace, and Sensu—detailing their key features and how they help improve performance, reliability, and observability of containerized applications.

DevOps, Kubernetes, cloud-native
11 min read
Open Source Linux
Jun 30, 2023 · Cloud Native

Essential Kubernetes Tools to Boost Your DevOps Workflow

This article reviews a curated set of open‑source Kubernetes tools—including Helm, Flagger, Kubewatch, Gitkube, kube‑state‑metrics, Kamus, Untrak, Scope, Dashboard, Kops, cAdvisor, Kubespray, K9s, Kubetail, PowerfulSeal, and Popeye—that enhance management, security, monitoring, and deployment within DevOps pipelines.

Security, cloud-native, monitoring
11 min read
Architects Research Society
Jun 20, 2023 · Backend Development

Designing Microservice Architecture: Patterns, Principles, and Best Practices

This article guides readers through the evolution from monolithic to event‑driven microservice architectures, covering design patterns, scalability, reliability, communication strategies such as API gateways, BFF, service aggregation, asynchronous messaging, CQRS, event sourcing, and technology choices like Kafka and RabbitMQ to build highly available, scalable, and maintainable systems.

CQRS, Design Patterns, Event-driven
24 min read
Volcano Engine Developer Services
Jun 20, 2023 · Artificial Intelligence

Boosting Large-Model Offline Inference with Ray and Cloud-Native Architecture

Large-model offline (batch) inference, which processes massive data on billion-parameter models, faces GPU memory and distributed scheduling challenges; this article explains how Ray's cloud-native framework, model parallelism, and Ray Datasets pipelines address these issues, improve throughput, and enable elastic, efficient GPU utilization.

GPU utilization, Ray, cloud-native
16 min read
Architects' Tech Alliance
Jun 19, 2023 · Fundamentals

Understanding Complex Systems and Software Architecture: Definitions, Types, Principles, and Design Considerations

This article explains what complex systems and software architecture are, outlines various architectural categories, discusses essential functional and non‑functional requirements, and presents design principles and typical solutions such as domain‑driven design, microservices, cloud‑native, DevOps, and big‑data architectures for building stable, scalable, and maintainable systems.

Big Data, Complex Systems, Domain-Driven Design
13 min read
ITPUB
Jun 15, 2023 · Databases

How Domestic Databases Are Shaping China’s Financial Digital Transformation

Amid China’s push for digital and domestic technology, the article examines the evolution of native database products, the opportunities and challenges they face—especially in the financial sector—and how policy, cloud‑native architectures, distributed systems, and multi‑cloud demands are driving the next wave of innovation.

China, Digital Transformation, Distributed Systems
10 min read
Alibaba Cloud Native
Jun 15, 2023 · Cloud Native

Why Event‑Driven Architecture Is the Future and How RocketMQ EventBridge Makes It Work

This article explains the fundamentals of event‑driven architecture, contrasts events with commands, outlines its four key characteristics, compares integration patterns, and details the capabilities and technical design of RocketMQ EventBridge, including standards, event hubs, schemas, and serverless integration.

Event-driven, EventBridge, architecture
18 min read
Ant R&D Efficiency
Jun 12, 2023 · Cloud Native

Platform Engineering: Challenges and Best Practices in Large-Scale Implementation

Platform engineering at scale requires unified self‑service abstractions, domain‑specific languages like KCL, divide‑and‑conquer monorepo structures, robust modeling and automation, and a collaborative culture, as demonstrated by Ant Group’s KusionStack implementation, which supports thousands of projects with a platform‑to‑developer ratio below 1:9.

DevOps, GitOps, KCL
20 min read
Programmer DD
May 29, 2023 · Cloud Native

What Is Azure Linux and How Does It Power Azure Kubernetes Service?

Azure Linux is Microsoft’s custom open‑source Linux distribution designed as a lightweight, Azure‑focused container host for AKS, offering optimized performance across cloud and edge, with minimal dependencies and support tailored for Kubernetes workloads.

Azure, Kubernetes, Linux
6 min read
Efficient Ops
May 24, 2023 · Cloud Native

Cloud‑Native Tech Transforming Financial Operations

This report examines how the widespread adoption of cloud‑native technologies is reshaping traditional IT operations, outlines the challenges faced by legacy ops, proposes a six‑pillar sensitive operations framework, showcases financial‑industry case studies, and highlights emerging trends such as advanced observability, GitOps, distributed cloud and FinOps.

cloud-native
42 min read
ITPUB
May 10, 2023 · Cloud Native

How Meituan’s MStore Achieves Scalable Storage‑Compute Separation in Cloud‑Native Environments

This article explains how Meituan’s storage team designed the MStore distributed storage platform to separate storage and compute, addressing scaling, cost, and reliability challenges of monolithic architectures, and details its cloud‑native components, data model, performance optimizations, observability, and the derived EBS block‑storage service.

Distributed Systems, MStore, cloud-native
16 min read
Tencent Cloud Developer
May 10, 2023 · Cloud Native

Tencent's Large‑Scale Cloud‑Native Migration: Challenges and Solutions

In October 2022 Tencent finished migrating its flagship services—including QQ, WeChat, and Honor of Kings—to a cloud‑native architecture spanning over 50 million CPU cores, overcoming millisecond‑level upgrade, stateful in‑place refresh, massive cross‑region scaling, and heterogeneous hardware by deploying the TKEx platform’s sidecar upgrades, three‑container patterns, Global Scaler Operator, machine‑type abstraction, and Clusternet‑based application‑centric orchestration, boosting CPU utilization to 65% and establishing China’s largest cloud‑native practice.

Container Upgrade, Tencent, cloud-native
19 min read
Tencent Cloud Developer
May 8, 2023 · Cloud Native

Modernizing Tencent Cloud Log Service (CLS): Cloud‑Native Architecture, Challenges, and Benefits

Tencent Cloud Log Service was modernized by migrating over 95% of its components to a cloud‑native stack of containers, Kubernetes, and declarative APIs, addressing chaotic infrastructure, stateful‑to‑stateless conversion, configuration drift, upgrade risk, elastic scaling, traffic protection and observability, which cut costs by more than 20 million CNY, reduced scaling latency by 90%, and achieved over 99.99% availability with petabyte‑scale burst handling.

Configuration Management, Log Service, Observability
15 min read
MaGe Linux Operations
May 1, 2023 · Cloud Native

Unlock Hidden kubectl Tricks: Boost Your Kubernetes Workflow

This article shares a collection of practical kubectl commands and tips—including API debugging, pod filtering and deletion, node‑wise pod statistics, and proxy usage—to help Kubernetes users work more efficiently and avoid writing custom client code.

Kubernetes, Operations, Tips
8 min read
Xiaohongshu Tech REDtech
Apr 25, 2023 · Databases

REDtao: A High-Performance Graph Storage System for Social Graph Data

REDtao is a high‑performance graph storage system built for Xiaohongshu that extends Facebook’s Tao architecture with a three‑layer hash structure, decoupled caching, leader‑follower distribution and cross‑cloud availability, delivering over 90% cache hits, 70% MySQL CPU reduction, 150 M QPS on a 16‑core server and seamless migration despite a 250% surge in daily‑active‑user traffic.

caching, cloud-native, distributed system
17 min read
Qunar Tech Salon
Apr 24, 2023 · Operations

Design and Evolution of Qunar's Watcher Enterprise Monitoring Platform

The article details the background, architecture, core features, alert governance, trace integration, and cloud‑native evolution of Watcher, Qunar's internally built, highly scalable monitoring platform that unifies application‑level metrics, alerting, and observability across thousands of services and containers.

Alerting, DevOps, Observability
19 min read
ITPUB
Apr 23, 2023 · Cloud Native

How Kindling Leverages eBPF to Reach 1‑5‑10 Observability Targets

This article examines the difficulty of achieving the 1‑5‑10 observability goal, reviews current tracing, logging, and metrics tools, introduces the open‑source Kindling project’s eBPF‑based trace‑profiling approach, and walks through several real‑world use cases that demonstrate faster root‑cause analysis in cloud‑native environments.

Kindling, Observability, Root Cause Analysis
16 min read
Alibaba Cloud Native
Apr 22, 2023 · Cloud Native

Unlock Higress 1.0 RC: A One‑Click Cloud‑Native Gateway with Plugins, Hot Updates, and Full Observability

Higress 1.0.0‑rc offers a ready‑to‑use cloud‑native gateway with one‑click installation via Helm, a console that simplifies Ingress/Gateway APIs, extensive plugin and service‑discovery support, hot‑update capabilities, built‑in observability, and detailed documentation for advanced traffic management and community contribution.

cloud-native, gateway, helm
9 min read
Alibaba Cloud Big Data AI Platform
Apr 17, 2023 · Operations

How to Break Through Scale‑Out Ops Bottlenecks in the Cloud‑Native Era

This article analyzes the three main bottlenecks—stability, cost, and efficiency—encountered in large‑scale operations, presents a six‑stage pipeline and open‑source toolchain, and explains how cloud‑native technologies such as Kubernetes and AIOps can transform and automate massive infrastructure management.

Kubernetes, Scalability, aiops
18 min read
Architects Research Society
Apr 15, 2023 · Operations

Curated List of Awesome Open‑Source Workflow Engines

This article presents a comprehensive curated list of open‑source workflow engines and BPM suites, including Airflow, Argo, Cadence, Zeebe, Oozie, Camunda, and many others, with brief descriptions of their primary features, typical use cases, and suitability for tasks such as job orchestration, micro‑service coordination, and data pipeline automation.

BPM, cloud-native, open-source
7 min read
Huawei Cloud Developer Alliance
Apr 10, 2023 · Cloud Native

How OCP’s Cloud‑Native Migration Boosts Development Efficiency on Huawei Cloud

The article explains how the open‑source Open‑Capacity‑Platform (OCP) is transformed into a cloud‑native solution on Huawei Cloud, detailing the architectural changes, benefits, step‑by‑step migration process, and the supporting cloud services that together improve development speed, operational safety, and cost efficiency.

DevOps, Huawei Cloud, Microservices
11 min read
dbaplus Community
Apr 5, 2023 · Cloud Native

How Baidu’s Search Platform Achieves Billion‑Scale Observability in a Cloud‑Native Era

This article explains why observability is critical in cloud‑native architectures and describes how Baidu’s search middle‑platform handles hundred‑billion‑level traffic by implementing low‑cost real‑time metrics, distributed tracing, log querying and topology analysis, while tackling challenges of massive microservice scale, scenario‑level monitoring, and efficient resource usage.

Metrics, Observability, cloud-native
12 min read
Baidu Tech Salon
Mar 29, 2023 · Artificial Intelligence

Punica System: Enhancing AI Inference Service Efficiency Through FaaS Architecture

The Punica system unifies AI inference development, testing, deployment, and maintenance on a FaaS‑based one‑stop platform that automates resource scheduling, self‑healing, and monitoring, supporting multiple frameworks and GPUs, thereby doubling onboarding speed, quintupling scaling efficiency, and reclaiming hundreds of GPU cards.

AI inference · FaaS architecture · GPU scheduling
0 likes · 13 min read
Baidu Geek Talk
Mar 29, 2023 · Cloud Native

Punica: A Cloud‑Native Platform for Content Understanding Inference Services

Punica provides a cloud‑native, one‑stop platform that unifies Baidu’s content‑understanding inference services, automates testing, resource provisioning, and monitoring, and enables unattended, self‑healing operations with dynamic scaling and GPU scheduling, cutting onboarding time by half and reclaiming hundreds of GPUs.

AI inference · Inference Platform · Service Orchestration
0 likes · 14 min read
Volcano Engine Developer Services
Mar 29, 2023 · Backend Development

How ByteHouse Achieves High‑Availability Real‑Time Data Ingestion with HaKafka

ByteHouse evolved its real‑time import pipeline from a community ClickHouse architecture to a custom HaKafka engine and a cloud‑native design, addressing node failures, read‑write conflicts, scaling costs, and latency by introducing two‑level concurrency, memory tables, exactly‑once semantics, and robust fault‑tolerance.

Distributed Systems · Kafka · Real-time Ingestion
0 likes · 15 min read
Bilibili Tech
Mar 28, 2023 · Cloud Computing

Multi‑Cloud Management Platform ARES: Architecture, Features and Practices

ARES, Bilibili’s multi‑cloud management platform, unifies resource provisioning, asset inventory, user access, and cost optimization across public clouds through a layered architecture, project‑centric tagging, Terraform‑based orchestration, and centralized security. It addresses manual provisioning, fragmented permissions, and limited visibility, and the team plans to extend it into hybrid‑cloud automation.

Resource Orchestration · Terraform · cloud-native
0 likes · 26 min read
ITPUB
Mar 24, 2023 · Cloud Native

Why Open‑Falcon Stalled and How Cloud‑Native Monitoring Is Evolving

This article reviews the evolution of monitoring in the cloud‑native era, analyzes Open‑Falcon’s architecture, strengths, and shortcomings, explains why its development hit a bottleneck, and outlines the design principles and features of the Nightingale monitoring system as a modern, open‑source alternative.

Microservices · Observability · Open-Falcon
0 likes · 15 min read
System Architect Go
Mar 23, 2023 · Cloud Native

Directly Accessing the Kubernetes API with curl and Custom Code

This article explains how to bypass kubectl and interact directly with the Kubernetes API using curl or any programming language, covering API discovery, request construction, resource listing, watching, and modifying objects, while illustrating concepts with JavaScript examples and shared informers.

API · Kubernetes · cURL
0 likes · 4 min read
Tencent Cloud Middleware
Mar 14, 2023 · Cloud Native

How a Logistics SaaS Company Scaled to Millions Using Cloud‑Native Microservices

This article examines how the Chinese logistics SaaS firm HaiGuanJia leveraged cloud‑native technologies—Kubernetes, service mesh, and microservice frameworks—to overcome rapid user growth, improve development efficiency, enable gray releases, and smoothly migrate legacy systems while maintaining stability and agility.

Kubernetes · Logistics · SaaS
0 likes · 16 min read
Efficient Ops
Mar 9, 2023 · Cloud Native

Master kubectl: Essential Commands and Tips for Kubernetes Management

This guide introduces the fundamental syntax of kubectl, explains its command structure, resource types, flags, output formats, and provides a comprehensive collection of practical examples—from creating resources with YAML files to querying pods, managing services, and executing commands inside containers—helping users efficiently operate Kubernetes clusters.

cloud-native · kubectl
0 likes · 7 min read
DataFunTalk
Mar 9, 2023 · Big Data

Real‑Time Data Platform Architecture and Cloud‑Native Flink Migration at Manbang

This article presents a comprehensive case study of Manbang's real‑time data platform, detailing its business background, cloud‑native Flink + Hologres architecture, migration from self‑built clusters, real‑time product features, decision‑making workflows, and future roadmap, highlighting performance and cost benefits.

Flink · Logistics · Streaming
0 likes · 16 min read
Open Source Linux
Mar 9, 2023 · Operations

Prometheus vs Zabbix: Which Monitoring Tool Wins for Modern Ops?

An in‑depth comparison of Prometheus and Zabbix examines their histories, architectures, data storage, scalability, and container support, highlighting Prometheus’s cloud‑native pull model and Go‑based performance versus Zabbix’s mature, relational‑database approach, to help teams choose the right monitoring solution.

Prometheus · Time Series Database · Zabbix
0 likes · 8 min read
AntTech
Mar 7, 2023 · Cloud Native

Introduction to HoloInsight: A Cloud‑Native Lightweight Observability Platform

HoloInsight is an open‑source, cloud‑native observability platform derived from Ant Group's AntMonitor, offering integrated log‑based monitoring, business metric analysis, and AI‑driven AIOps capabilities while providing a lightweight, modular architecture and extensive extensibility for modern software stacks.

Observability · aiops · cloud-native
0 likes · 13 min read
AntTech
Mar 7, 2023 · Databases

CeresDB 1.0 Release: Cloud‑Native Time‑Series Database Design, Features, and Performance Evaluation

CeresDB 1.0, the open‑source cloud‑native time‑series database from Ant Group, introduces a next‑generation architecture that supports both traditional and analytical workloads, offers column‑mixed storage, distributed compute‑storage separation, multi‑language SDKs, and demonstrates significant write and query performance gains over InfluxDB in benchmark tests.

CeresDB · cloud-native · distributed storage
0 likes · 9 min read
Alibaba Cloud Developer
Mar 1, 2023 · Cloud Native

How KubeVela is Shaping the Future of Cloud‑Native Application Delivery

KubeVela, an award‑winning open‑source cloud‑native platform, emerged from the OAM model to simplify and standardize application delivery across multi‑cloud environments, offering user‑friendly abstractions, programmable extensibility, workflow‑centric deployment, and a community‑driven evolution that positions it as a leading solution in platform engineering.

Application Delivery · KubeVela · OAM
0 likes · 13 min read
Baidu Geek Talk
Feb 24, 2023 · Cloud Native

Design and Resource Scheduling of Cloud‑Native AI and the PaddleFlow Workflow Engine

The article explains Baidu’s cloud‑native AI resource scheduling across single‑ and multi‑GPU nodes, describes the PaddleFlow Kubernetes‑based workflow engine with its hierarchical queues, advanced scheduling algorithms, and unified storage, and shows how these technologies improve GPU utilization, reduce fragmentation, and simplify AI task orchestration.

AI · Kubernetes · PaddleFlow
0 likes · 23 min read
Efficient Ops
Feb 22, 2023 · Operations

Zero‑Downtime Secrets: TT Voice’s Multi‑Cloud, AIOps & Resource Optimization

During the 2022 TT Voice Annual Summit, the technical team tackled stability, real‑time risk control, and resource utilization challenges by implementing strict change management, multi‑cloud high‑availability networking, AIOps‑driven monitoring, big‑data processing, and cloud‑native scaling strategies, ultimately delivering zero‑fault operation.

Resource Optimization · aiops · cloud-native
0 likes · 15 min read
ByteDance Data Platform
Feb 15, 2023 · Databases

How ByteHouse Powers Real‑Time Data Warehousing at Scale

ByteHouse, a cloud‑native data warehouse built on ClickHouse, delivers ultra‑fast real‑time and massive offline analytics with elastic scaling, addressing business needs in ByteDance and the financial sector through optimized architecture, ROI‑driven monitoring, and comprehensive operational tools.

Big Data · ByteHouse · ClickHouse
0 likes · 16 min read
Baidu Intelligent Cloud Tech Hub
Feb 15, 2023 · Cloud Computing

Baidu’s Distributed Cloud: Connecting Edge to Core – Architecture & Key Challenges

At the 2022 Zhishun Summit, Baidu Intelligent Cloud unveiled its distributed cloud architecture that unifies edge and core resources through a unified stack, detailing four evolving trends, four key pathways—including multi‑chip clouds, homogeneous stacks, hyper‑convergence, and cloud‑native design—and real‑world implementations such as low‑latency cloud gaming and vehicular data pipelines.

Baidu Cloud · Edge Computing · cloud architecture
0 likes · 11 min read
ITPUB
Feb 13, 2023 · Databases

How Apache Doris Enables Cloud‑Native Real‑Time Data Warehousing for Log Analytics

Based on a DTCC2022 presentation, this article explains Apache Doris's high‑performance MPP architecture, its cloud‑native extensions in SelectDB, and how they solve large‑scale log storage and analysis with superior write throughput, storage efficiency, and interactive query speed.

Apache Doris · MPP · Real-time analytics
0 likes · 11 min read
Programmer DD
Feb 8, 2023 · Cloud Native

How Cloud‑Native Pipelines Cut Build Time 3‑5× with Remote Cache

This article explains how adding a remote cache backed by CFS and Zstandard compression to cloud‑native CI/CD pipelines cuts build times by a factor of 3 to 5. It outlines the implementation steps, tool choices, cache key strategy, and eviction policy, and showcases performance gains across Java, Node.js, Go, and GCC builds.

CFS · Pipeline · caching
0 likes · 10 min read
dbaplus Community
Feb 6, 2023 · Operations

How Vivo Built a Scalable, Cloud‑Native Monitoring Platform for Millions of Services

This article outlines Vivo's multi‑year journey of designing, evolving, and operating a cloud‑native, AIOps‑enabled monitoring platform that supports tens of thousands of hosts, databases, containers, and services, detailing its architecture, challenges, and future directions for observability and reliability.

Observability · Operations · System Architecture
0 likes · 18 min read
Ops Development Stories
Feb 6, 2023 · Cloud Native

How to Deploy Odigos for Zero‑Code Observability on Kubernetes

This guide walks you through installing and configuring the open‑source Odigos observability control plane on a Kubernetes cluster, showing how to automatically collect traces, metrics, and logs from applications without modifying code and how to visualize the data with Grafana.

Kubernetes · Odigos · OpenTelemetry
0 likes · 11 min read
MaGe Linux Operations
Feb 1, 2023 · Cloud Native

5 Must‑Watch CNCF Projects to Follow in 2023

Discover the five emerging CNCF projects—Teller, OpenCost, OpenFunction, External‑Secrets, and Clusterpedia—highlighting their recent addition, key features, community adoption, and how they empower DevOps engineers to manage secrets, cost allocation, serverless functions, and multi‑cluster visibility in cloud‑native environments.

CNCF · DevOps · cloud-native
0 likes · 7 min read
Alibaba Cloud Native
Jan 30, 2023 · Cloud Native

How Pool-Coordinator Optimizes Cloud‑Edge Networks in OpenYurt v1.2

OpenYurt v1.2 introduces the Pool‑Coordinator component, which implements node‑pool governance to cache resources, streamline YurtHub leader election, and secure communications, thereby reducing cloud‑edge bandwidth consumption and improving reliability for edge workloads. The article covers its architecture, deployment, and future outlook.

OpenYurt · Pool-Coordinator · cloud-native
0 likes · 9 min read
Tencent Tech
Jan 16, 2023 · Operations

How a Mini-Game Scaled to 100M DAU: Architecture, Ops, and Security Lessons

This article examines how the viral mini‑game "Sheep..." overcame its initial 5,000‑QPS bottleneck and scaled to over 100 million daily active users by redesigning its architecture, implementing cloud‑native auto‑scaling, enhancing operational monitoring with CLS, and fortifying security with WAF.

Security · cloud-native · game-development
0 likes · 11 min read
Tencent Cloud Developer
Jan 10, 2023 · Cloud Native

nettrace: An eBPF‑Based Tool for Network Packet Tracing, Diagnosis and Drop Monitoring in Cloud‑Native Environments

nettrace is an eBPF‑powered command‑line utility that traces a packet’s full kernel lifecycle, diagnoses network faults with a built‑in knowledge base, monitors anomalies and skb drops, supports NAT, GRE, IPVS and netfilter hooks, and replaces legacy tools like tcpdump and droptrace in cloud‑native Linux environments.

Linux · cloud-native · diagnosis
0 likes · 33 min read
Taobao Frontend Technology
Jan 10, 2023 · Frontend Development

Taobao’s 2023 Web Tech Map: Front‑End Engineering & Cloud‑Native Containers

In 2023, Taobao’s front‑end team reorganized its over‑300‑person Web division, unveiling a layered technology map that spans engineering platforms, cloud‑native JavaScript containers, development frameworks, low‑code building systems, and a suite of front‑end products such as O2, ICE, Midway, VideoX, and EVA, illustrating their end‑to‑end solutions.

Engineering · cloud-native · low-code
0 likes · 25 min read
ITPUB
Jan 9, 2023 · Databases

How MatrixOne’s Hyper‑Converged Architecture Redefines Cloud‑Native Databases

The article examines MatrixOne, a cloud‑native hyper‑converged database, detailing its storage‑compute separation, unified file service, resource isolation, HTAP streaming capabilities, and emerging serverless features, while outlining future directions such as CXL memory integration and broader cloud storage support.

Distributed · HTAP · Hyper-Converged
0 likes · 9 min read
Alibaba Cloud Native
Jan 9, 2023 · Cloud Native

CNStack 2.0: Cloud‑Native Design for Agile, Secure Multi‑Cluster Ops

CNStack 2.0 is a cloud‑native PaaS platform built on Kubernetes that unifies resource and workload management, offering agile, open, and secure multi‑cluster capabilities through modular cloud services, a unified API gateway, and integration with open‑source projects such as Sealer, Emissary‑Ingress, cert‑manager, Velero, and OCM.

Kubernetes · Multi-Cluster · Resource Management
0 likes · 24 min read
Tencent Cloud Developer
Jan 3, 2023 · Big Data

How Tencent’s Cloud‑Native Lakehouse Tackles PB‑Scale Performance Challenges

This article analyzes Tencent Cloud’s DLC lakehouse solution, explaining the unified data lake‑warehouse architecture, the performance hurdles of object‑storage‑based analytics, and the multi‑dimensional caching, virtual‑cluster elasticity, and advanced filter techniques that enable second‑level analysis on petabyte‑scale data while reducing costs.

Big Data · DLC · Lakehouse
0 likes · 13 min read
Architecture Digest
Dec 30, 2022 · Operations

Vivo Monitoring Platform: Architecture, Evolution, and Future Directions

The article details the evolution, architecture, capabilities, challenges, and future plans of Vivo's comprehensive monitoring platform, covering its transition from simple Zabbix setups to a cloud‑native, AIOps‑enabled system that ensures service availability across massive infrastructure.

Observability · Reliability · aiops
0 likes · 16 min read