
Key Considerations for Building a Cloud‑Native Architecture

The article outlines the principles and practical considerations of cloud‑native architecture, covering platform‑agnostic design, container and Kubernetes foundations, microservice decomposition, CI/CD pipelines, monitoring, tracing, logging, and fault‑tolerant high‑availability strategies for building resilient distributed systems.

High Availability Architecture

Chen Langjiao, head of the Tencent Cloud Container Product Architecture team, is responsible for pre‑sale and post‑sale work of Tencent Cloud container products.

This article is compiled from his presentation at the Techo Developer Conference cloud‑native track, sharing the characteristics of cloud‑native architecture and practical considerations.

According to the CNCF definition, cloud‑native is a methodology that imposes no restrictions on programming language, framework, or middleware; it integrates advanced design concepts such as containers, microservices, loose coupling, agility, disaster recovery, frequent iteration, and automation.

Cloud computing has matured, and with the rise of open‑source communities, providers are increasingly offering platform‑agnostic services—cloud‑native services—that free users from vendor lock‑in and let them focus on business logic.

The lowest layer consists of physical machines, networks, and storage; above that is the virtualization layer providing isolated networks, compute resources, and distributed storage, which remains platform‑specific.

The crucial PaaS layer adapts compute, network, and storage resources from various vendors and offers a unified interface; in practice, the provider’s Kubernetes service integrates underlying storage, compute, and networking, as well as standard open‑source services like MySQL and Redis, enabling seamless migration across public clouds, hybrid clouds, or multi‑cloud disaster‑recovery scenarios.

In summary, a cloud‑native architecture is a platform‑agnostic, automated, disaster‑tolerant, agile distributed business system.

The rest of the talk presents several considerations for building cloud-native services, along with some personal reflections.

The CNCF definition already points to the questions worth working through: why to split a system into microservices, why to containerize, how to implement CI/CD, how to avoid failures and handle incidents, and how to validate whether a system conforms to cloud-native principles.

The discussion of microservices focuses on how completely services are decomposed, because this determines horizontal scalability and whether the system can become truly distributed. In an e-commerce platform, for example, inventory changes should not impact the product or order services, and during peak sales the order service should be able to scale independently.

Therefore, coupling of business logic and data between services should be minimized. Services should share data by communicating with each other rather than by reading a shared database, which reduces the impact scope of changes.
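As a minimal sketch of sharing data through communication rather than a shared database, the Python below has an order service that reaches stock data only through the inventory service's public methods. The class names, method names, and in-memory stores are hypothetical stand-ins for real services and their private databases; they are not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class InventoryService:
    """Owns stock data; no other service reads or writes this store directly."""

    def __init__(self, stock: Dict[str, int]) -> None:
        # Private store, standing in for the service's own database.
        self._stock = dict(stock)

    def reserve(self, sku: str, quantity: int) -> bool:
        """Public API: reserve stock if enough is available."""
        available = self._stock.get(sku, 0)
        if quantity <= 0 or available < quantity:
            return False
        self._stock[sku] = available - quantity
        return True

    def available(self, sku: str) -> int:
        """Public API: report remaining stock for a SKU."""
        return self._stock.get(sku, 0)


@dataclass
class OrderService:
    """Owns order data and talks to inventory only through its API."""

    inventory: InventoryService
    orders: List[Tuple[str, int]] = field(default_factory=list)

    def place_order(self, sku: str, quantity: int) -> bool:
        # The order service never touches inventory's store, only its API.
        if not self.inventory.reserve(sku, quantity):
            return False
        self.orders.append((sku, quantity))
        return True
```

Because the order service depends only on `reserve`, the inventory service could change its storage layout without any change on the order side. In a real deployment the call would cross the network (HTTP, gRPC, or a message queue) instead of being an in-process method call.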

Containers are the foundation of cloud-native architecture. Without them, CI/CD and auto-scaling have to be built as custom solutions around specific business characteristics, and such solutions are not portable. Containers also reduce costs: Kubernetes schedules containers onto appropriate nodes, balancing resource utilization and improving operability.
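To make the scheduling idea concrete, here is a deliberately simplified sketch of a filter-then-score placement decision. It is not the Kubernetes scheduler, which considers many more signals; the `Node` type, the single CPU dimension, and the "lowest utilization after placement" scoring rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_capacity: float
    cpu_used: float = 0.0

    @property
    def cpu_free(self) -> float:
        return self.cpu_capacity - self.cpu_used


def schedule(nodes: List[Node], cpu_request: float) -> Optional[Node]:
    """Place one container: filter nodes that fit, then pick the least utilized."""
    # Filter step: keep only nodes with enough free CPU for the request.
    candidates = [n for n in nodes if n.cpu_free >= cpu_request]
    if not candidates:
        return None  # nothing fits; a real cluster would leave the pod pending
    # Score step: prefer the node with the lowest utilization after placement.
    best = min(candidates, key=lambda n: (n.cpu_used + cpu_request) / n.cpu_capacity)
    best.cpu_used += cpu_request
    return best
```

Spreading load this way keeps utilization balanced across nodes. A bin-packing rule (prefer the fullest node that still fits) would instead reduce the number of nodes in use, which is another way containers can lower cost.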

After containerization, CI/CD becomes straightforward: code is built into a container image, and the same image is then promoted through the test, staging, and production environments. This also enables deployment strategies such as blue-green and canary releases.
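The promotion flow can be sketched as a loop that carries one immutable image reference through each environment and stops at the first failed check. The stage names follow the text; the `promote` function, the shape of the `checks` mapping, and the example image name are hypothetical.

```python
from typing import Callable, Dict, List

# Environments in promotion order, as described in the text.
STAGES = ("test", "staging", "production")


def promote(image: str, checks: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Run the same image through each stage; return the stages it passed.

    `checks` maps a stage name to a verification function for that stage.
    A stage without an entry is treated as passing.
    """
    passed: List[str] = []
    for stage in STAGES:
        check = checks.get(stage, lambda _image: True)
        if not check(image):
            break  # stop promoting; later stages never see this image
        passed.append(stage)
    return passed
```

For example, `promote("registry.example.com/shop/order:1.4.2", {"staging": lambda image: False})` returns `["test"]`, so the image never reaches production. The important property is that the artifact is built once and only the environment changes.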

Effective operation requires mechanisms for discovering and diagnosing problems. Common tools include monitoring, tracing, and logging.

Monitoring should cover both infrastructure metrics (CPU, memory, network, handles) and business metrics (response time, error rate).

Tracing helps pinpoint latency bottlenecks in request call chains and can surface issues such as slow database queries, hot keys, or large keys.

Logging is essential for analyzing both performance and business issues. Because containers are ephemeral, logs must be centralized. Collection methods include logging directly through an SDK, agent-based collection from files, DaemonSet-based aggregation of stdout, and sidecar containers for file logs. Logs can be sent to cloud provider services or self-hosted solutions, with separate streams for process lifecycle logs and business logs.
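One way to prepare an application for stdout-based collection is to emit one JSON object per line and tag each record with its stream. The sketch below uses only Python's standard `logging` and `json` modules; the `JsonFormatter` class, the `build_logger` helper, and the stream names `lifecycle` and `business` are illustrative assumptions, and the collector itself is out of scope.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line that a collector can parse."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "time": self.formatTime(record),
                "level": record.levelname,
                "stream": record.name,  # "lifecycle" or "business" in this sketch
                "message": record.getMessage(),
            }
        )


def build_logger(stream_name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout for the given stream."""
    logger = logging.getLogger(stream_name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger
    return logger


lifecycle_log = build_logger("lifecycle")
business_log = build_logger("business")
```

Calling `business_log.info("order %s created", "o-1")` writes one JSON line to stdout. Keeping the stream name in every record lets the central store route or filter lifecycle and business logs separately, even though both leave the container through the same stdout pipe.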

Design must assume failures are inevitable and include pre‑planned mitigation strategies. Typical failures include network, hardware, system, and business faults. Network failures require zone isolation and traffic switching; hardware failures need dispersion of VMs across zones and redundancy; system software should use provider‑optimized kernels; business releases should be incremental with rollback capability.
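The "incremental with rollback capability" rule for business releases can be sketched as a staged rollout that raises the new version's traffic share step by step and reverts to zero when the observed error rate crosses a threshold. The function name, the step list, the `error_rate_at` callback, and the 1% default threshold are all assumptions chosen for illustration.

```python
from typing import Callable, Sequence


def incremental_rollout(
    steps: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> int:
    """Return the final traffic percentage served by the new version.

    `steps` lists traffic percentages in increasing order, for example [5, 25, 100].
    `error_rate_at(percent)` reports the error rate observed at that traffic level.
    """
    current = 0
    for percent in steps:
        if error_rate_at(percent) > max_error_rate:
            return 0  # roll back: all traffic returns to the previous version
        current = percent
    return current
```

With steps `[5, 25, 100]`, a release that stays healthy ends at 100, while one whose error rate spikes at 25% ends at 0. Small early steps limit how many users a bad release can affect before the rollback triggers.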

Combining these design principles, a robust system deploys redundant services across multiple availability zones, splits traffic at the entry points so that failover is seamless, and isolates microservices and databases so that each can auto-scale independently. The result is a well-structured distributed business system.
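Traffic splitting at the entry point with failover can be sketched as a weight table that is renormalized over the zones currently reported healthy. The zone names and the `route_weights` function are hypothetical; in practice this logic usually lives in a load balancer or DNS layer rather than in application code.

```python
from typing import Dict, Set


def route_weights(zone_weights: Dict[str, float], healthy: Set[str]) -> Dict[str, float]:
    """Return the share of traffic each healthy zone should receive.

    Unhealthy zones are dropped and the remaining weights are rescaled
    so that they sum to 1.
    """
    live = {zone: w for zone, w in zone_weights.items() if zone in healthy and w > 0}
    total = sum(live.values())
    if total == 0:
        return {}  # no healthy zone left; the caller must handle a full outage
    return {zone: w / total for zone, w in live.items()}
```

With two equally weighted zones, losing one moves its share to the survivor: `route_weights({"zone-a": 50, "zone-b": 50}, {"zone-b"})` returns `{"zone-b": 1.0}`. This only works if each zone has enough spare capacity, or can scale out quickly enough, to absorb the redirected traffic.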

To illustrate why teams should focus on business rather than infrastructure, the talk shares a real case in which a client needed to upgrade a production Kubernetes cluster. The provider's TKE service performed a seamless upgrade without container restarts, whereas a self-managed upgrade caused service disruption. The upgrade required extensive pre-checks, version-specific patches, and Linux-distribution adaptations, which highlights how complex it is to manage Kubernetes clusters, a layer that sits between IaaS and PaaS.

Experts also propose a migration maturity model and a checklist of questions to verify whether a system conforms to cloud‑native architecture.

The presented material mainly covers methodological aspects; implementing these goals involves substantial effort and complexity, and further sessions will detail practical experiences.
