Tagged articles
360 articles
Tencent Cloud Developer
Sep 21, 2020 · Industry Insights

How Beike Guarantees High Availability in Complex Real‑Estate Transactions

This article analyzes Beike's massive real‑estate ecosystem, detailing the intricate business flows, technical architecture, and quality‑assurance challenges, and explains how a suite of internal platforms (KeTest, KeOnes, sosotest, KeDiff, KePTS, and KeMTC) is engineered to deliver high‑performance, highly available services at scale.

DevOps · Infrastructure · Microservices
0 likes · 26 min read
TAL Education Technology
Sep 1, 2020 · Cloud Computing

Cost Optimization and Resource Management in an Online Education Platform: From XEN Migration to Container‑Based Scaling

This article describes how an online education platform reduced infrastructure costs and improved service reliability by replacing XEN with KVM, building resource‑tracking platforms, adopting Kubernetes‑based containerization, implementing rapid auto‑scaling, and establishing systematic resource auditing and standardization processes.

Cost Optimization · Infrastructure · Kubernetes
0 likes · 25 min read
Qunar Tech Salon
Aug 27, 2020 · Databases

Qunar Technology Carnival Interview Series: Insights on Hotel Flow Optimization, Database Architecture, and System Stability

The article presents a series of interviews from Qunar's Technology Carnival, featuring experts Liang Zhangping, Wang Zhufeng, and Zheng Jimin who discuss hotel booking flow improvements, database architecture comparisons and migration to PXC, and comprehensive system stability governance practices.

Infrastructure · Qunar · Technology Carnival
0 likes · 13 min read
Cloud Native Technology Community
Aug 25, 2020 · Cloud Native

How Lyft’s Open‑Source Clutch Transforms Cloud‑Native Infrastructure Management

Lyft open‑sourced Clutch, a scalable UI and API platform that unifies infrastructure tooling with built‑in security, authorization, and observability, offering a single binary Go backend and plug‑in React frontend to simplify operations, reduce MTTR, and improve developer experience across large cloud‑native environments.

Control Plane · DevOps · Infrastructure
0 likes · 15 min read
Meituan Technology Team
Aug 13, 2020 · Cloud Native

Meituan’s Migration from OpenStack to Kubernetes: Large‑Scale Cloud‑Native Infrastructure, Challenges and Practices

Meituan migrated its massive cloud infrastructure from OpenStack to Kubernetes, containerizing over 98 % of services and implementing custom scheduling, NUMA‑aware placement, fine‑grained resource isolation, and an internal management platform that boosted stability above 99.99 %, cut costs, and paved the way for unified VM‑container scheduling and broader cloud‑native workloads.

Cloud Native · Infrastructure · Kubernetes
0 likes · 21 min read
NetEase Media Technology Team
Aug 13, 2020 · Cloud Native

How NetEase Media Scaled Its Infrastructure with Containerization and Service Mesh

NetEase Media transformed its infrastructure by containerizing services, establishing multiple resource pools, implementing a ServiceMesh with NSF, and isolating beta and production environments, resulting in higher CPU utilization, automated scaling, and improved stability, while sharing lessons learned and future plans.

Cloud Native · Infrastructure · Kubernetes
0 likes · 22 min read
MaGe Linux Operations
Aug 5, 2020 · Cloud Native

Top Open-Source Tools to Simplify Kubernetes Management Across Any Environment

Discover a curated list of powerful open-source Kubernetes management solutions—including K9s, Rancher, Dashboard, Kubectl, Kubeadm, Helm, KubeSpray, Kontena Lens, and WKSctl—detailing their core features, deployment options, and how they streamline cluster monitoring, configuration, and application lifecycle across cloud-native environments.

Cloud Native · Cluster Management · DevOps
0 likes · 8 min read
Efficient Ops
Jul 28, 2020 · Operations

How Zhejiang Mobile Transformed SRE for Telecom: A Practical Operations Blueprint

This article details Zhejiang Mobile's adaptation of Google‑originated Site Reliability Engineering to a telecom environment, outlining a three‑layer capability framework, standardized processes, integrated platforms, and measurable outcomes that demonstrate how agile SRE practices can boost reliability and scalability in traditional industries.

Infrastructure · SRE · Site Reliability Engineering
0 likes · 11 min read
21CTO
Jul 13, 2020 · Operations

Why Did GitHub Crash? Inside the July 2020 Outage and Its Root Causes

The July 13, 2020 GitHub outage, triggered by load‑balancer misconfiguration, a database connection error during partitioning, and a network‑config mistake, sparked worldwide developer panic, highlighted reliability concerns, and revealed challenges in scaling cloud infrastructure amid the pandemic.

GitHub · Infrastructure · Outage
0 likes · 6 min read
iQIYI Technical Product Team
Jul 10, 2020 · Operations

iQIYI IPv6 Large‑Scale Deployment: Technical Challenges, Solutions, and Management Practices

iQIYI’s IPv6 rollout, responding to the national deployment plan, coordinated multiple technical teams to redesign its network and introduced the “iQIYI IPv6 Cloud Control” scheme that manages IPv4/IPv6 switching and fallback, reaching more than 200 million active IPv6 users and 800 GB traffic peaks, guided by long‑term strategic value, clear milestones, and engineers’ curiosity to expand IPv6‑driven service quality and cost savings.

IPv6 · Infrastructure · Operations
0 likes · 12 min read
Suning Technology
Jun 22, 2020 · Operations

How Suning Moved 26,888 Servers in 75 Days – Key Takeaways

Suning’s data center team completed a record-breaking migration of 26,888 servers across 75 days, detailing the planning, tight time windows, intensive communication, cross‑team coordination, risk management, and efficiency gains that enabled zero‑downtime migration and significant cost savings for future operations.

Data center · Infrastructure · Operations
0 likes · 7 min read
JD Retail Technology
Jun 5, 2020 · Operations

How JD Cloud Engineered a Seamless 618 Shopping Surge: Ops Strategies & Disaster Drills

This article details JD Cloud's comprehensive operational preparation for the 618 shopping festival, covering early resource procurement, hardware fault management, network and CDN scaling, extensive capacity‑testing, disaster‑recovery drills, and cross‑departmental coordination that together ensured stable service during massive traffic spikes.

Infrastructure · capacity planning · cloud operations
0 likes · 8 min read
Alibaba Cloud Developer
May 26, 2020 · Cloud Native

Why Serverless Containers Are Shaping the Future of Cloud‑Native Kubernetes

This article examines the rising trend of serverless containers, their application value, architectural design for cloud‑native Kubernetes, key challenges such as startup latency and scalability, and how Alibaba Cloud's Serverless Kubernetes and ECI solutions address these issues while offering a free learning course.

Container · Infrastructure · Kubernetes
0 likes · 16 min read
ITFLY8 Architecture Home
Apr 13, 2020 · Backend Development

Essential Backend Infrastructure for Scalable Java Applications

This article outlines the critical backend components required for building robust Java services, covering API gateways, MVC/IOC/ORM frameworks, caching, databases, search engines, message queues, file storage, unified authentication, configuration, service governance, scheduling, logging, data pipelines, and monitoring strategies.

Backend · DevOps · Infrastructure
0 likes · 23 min read
Open Source Linux
Mar 7, 2020 · Cloud Computing

KVM vs XEN: Which Virtualization Technology Powers Modern Cloud Computing?

This article explains how virtualization, especially the open‑source hypervisors KVM and XEN, underpins cloud computing, outlines cloud service and deployment models, compares full and para‑virtualization, and evaluates the strengths and adoption of each technology in today’s major cloud providers.

Infrastructure · KVM · Virtualization
0 likes · 7 min read
Open Source Linux
Mar 2, 2020 · Operations

Why Use Server Clusters? Benefits, Types, and Choosing the Right Solution

This article explains what a server cluster is, why organizations adopt clusters for performance, cost, scalability and reliability, outlines the main cluster categories such as load‑balancing, high‑availability and HPC, and offers guidance on selecting appropriate software and hardware solutions.

HPC · Infrastructure · open source
0 likes · 12 min read
Efficient Ops
Feb 27, 2020 · Operations

Building a Flexible, Scalable CMDB for Ops: Architecture, API, and UI Insights

This article introduces an open‑source, four‑layer CMDB designed for operations teams, detailing its storage, data, API, and UI layers, dynamic modeling capabilities, searchable CI APIs, various resource views, relationship mapping, and role‑based permission management, while providing deployment links and usage notes.

API · CMDB · Configuration Management
0 likes · 11 min read
DevOps Cloud Academy
Feb 27, 2020 · Operations

Jenkins Infrastructure, Project Management, and Configuration‑as‑Code Overview

This article introduces Jenkins infrastructure setup, including installation via Ansible, Puppet, Chef or Docker, outlines management tools such as CLI, REST API, python‑jenkins and Jenkins‑client, describes project creation plugins like Job DSL, Job Builder and Jenkinsfile, and explains system configuration using Groovy scripts and the Configuration‑as‑Code plugin.

DevOps · Infrastructure · Jenkins
0 likes · 3 min read
Tencent Tech
Jan 17, 2020 · Cloud Computing

How QQ Tackled Massive Cloud Migration Challenges – Tencent’s Strategy Revealed

This article describes how Tencent’s QQ service migrated over a million servers to the public cloud, detailing the comprehensive planning, phased execution, and solutions to security, dependency, disaster recovery, and gray‑scale challenges, and highlighting the infrastructure upgrades, database migration, cloud‑native tools, and operational transformations that ensured zero user impact.

Infrastructure · Operations · QQ
0 likes · 20 min read
Efficient Ops
Jan 8, 2020 · Operations

How a Bank Built an Automated Operations Platform and CMDB Middle‑Platform

This article details how Ping An Bank tackled rapid growth and complex regulatory demands by creating an automated operations middle‑platform, designing a CMDB with data‑closure and subscription mechanisms, and implementing orchestration, gray‑scale deployment, and high‑risk detection to achieve resilient, scalable infrastructure management.

Automation · CMDB · Infrastructure
0 likes · 21 min read
Tencent Tech
Dec 23, 2019 · Cloud Computing

How Tencent Scaled to Over 1 Million Servers and Cut Costs by 30%

Tencent’s 2019 infrastructure breakthrough revealed a million‑plus servers, 100 Tbps network bandwidth, modular data centers, self‑developed hardware and software innovations that together slashed total cost of ownership by 30%, boosted efficiency, and pushed cloud elasticity to new heights.

Data center · Infrastructure · Tencent
0 likes · 8 min read
Youzan Coder
Dec 23, 2019 · Mobile Development

Mobile Infrastructure Construction and Practices at Youzan

Youzan’s mobile infrastructure combines enforced pre‑release processes, unified permission workflows, cross‑platform Zan Weex and emerging Flutter support, dynamic configuration, robust CI, logging, testing, and shared component libraries to deliver efficient, high‑quality, gray‑/conditional‑/full releases while fostering collaboration across its mobile development teams.

Flutter · Infrastructure · Weex
0 likes · 16 min read
MaGe Linux Operations
Dec 18, 2019 · Operations

Mastering Modern IT Operations: Roles, Practices, and Evolution

This article outlines the comprehensive responsibilities and evolution of IT operations, covering system, application, database, security, and platform management, detailing tasks such as infrastructure building, monitoring, optimization, automation, and the shift from manual processes to self‑scheduling systems.

Automation · IT Operations · Infrastructure
0 likes · 20 min read
Architects' Tech Alliance
Dec 6, 2019 · Cloud Computing

Data Center Modernization and Future Cloud Computing Trends

The article analyzes how enterprises are shifting to cloud platforms, the resulting idle data centers, market forecasts for public and private cloud growth, and proposes modernization strategies—including continuous technology updates, workflow optimization, fault simulation, hybrid deployment, and virtualization—to meet the increasing demand for efficient, scalable infrastructure over the next few years.

DCIM · Infrastructure · Modernization
0 likes · 15 min read
Tencent Cloud Developer
Nov 21, 2019 · Operations

Serverless Operations: Efficient and Intelligent Cloud-native Practices

The article recaps Tencent Cloud’s Serverless operational suite—covering built‑in DevOps tools, logging, monitoring, auto‑scaling, and security—demonstrating how it replaces manual IaaS provisioning, accelerates development, and enables cloud‑native management, illustrated by a WeChat Mini‑Program album that cut build time from months to two weeks.

Automation · DevOps · Infrastructure
0 likes · 19 min read
High Availability Architecture
Nov 19, 2019 · Blockchain

How Coinbase Builds and Deploys Blockchain Nodes with Snapchain

The article explains Coinbase’s unique security and infrastructure requirements for blockchain nodes, describes the challenges of blue‑green deployments, and details the Snapchain system built on AWS that enables fast, reliable snapshot‑based node provisioning, upgrades, and high‑availability scaling.

AWS · Blockchain · Infrastructure
0 likes · 7 min read
Cloud Native Technology Community
Nov 15, 2019 · Cloud Native

Helm 3 Release: Fixing Helm 2’s Flaws and Simplifying Kubernetes Package Management

The November 13 Helm 3 release eliminates Tiller, addresses major Helm 2 shortcomings such as template engine bugs, hook handling, and resource conflicts, and introduces a cleaner architecture that aligns Helm with modern Kubernetes practices while offering new features like multi‑cluster support and dependency checks.

DevOps · Infrastructure · helm
0 likes · 7 min read
DataFunTalk
Nov 14, 2019 · Artificial Intelligence

Building the Most Reliable Autonomous Driving Infrastructure at Pony.ai

This article outlines Pony.ai's comprehensive autonomous driving infrastructure, describing traditional internet back‑end components, additional vehicle‑mounted systems, large‑scale simulation, data challenges, and the reliability, performance, and flexibility practices needed to support rapid growth and safe robotaxi operations.

AI systems · Infrastructure · Pony.ai
0 likes · 15 min read
DevOps Cloud Academy
Nov 9, 2019 · Operations

Configuring Jenkins High Availability with HAProxy and NFS

This guide explains how to achieve Jenkins high availability by deploying two Jenkins master nodes behind HAProxy, sharing Jenkins home via NFS, and configuring HAProxy load balancing and health checks, including detailed host setup, NFS and Jenkins installation steps, and test results.

DevOps · HAProxy · Infrastructure
0 likes · 10 min read
Alibaba Cloud Infrastructure
Oct 31, 2019 · R&D Management

Alibaba Infrastructure PMO Wins Best Enterprise Practice Award and Presents Strategic Project Management Framework

The 2019 China Project Management Development 20‑Year Achievement Forum highlighted Alibaba Infrastructure's award for Best Enterprise Practice, where its PMO shared a layered strategic project management framework that translates departmental strategy into executable projects, offering insights for organizations facing rapid market changes.

Alibaba · Best Practice Award · Case Study
0 likes · 6 min read
Alibaba Cloud Infrastructure
Sep 20, 2019 · Cloud Computing

Alibaba Announces Open‑Source “Fangsheng” Project for Next‑Generation Cloud Server Architecture

At the ODCC 2019 Open Data Center Conference in Beijing, Alibaba server architect Guo Rui unveiled the open‑source “Fangsheng” project, detailing a new cloud‑server architecture that addresses ultra‑large scale, diverse customer needs, and intense competition by improving cooling, power efficiency, modularity, and deployment flexibility for Chinese cloud data centers.

Alibaba · Data center · Fangsheng
0 likes · 5 min read
dbaplus Community
Sep 18, 2019 · Cloud Computing

Why Hybrid Multi‑Cloud Is the Future of Enterprise IT – Lessons, Pitfalls, and Best Practices

This article explores the origins, architecture, and real‑world implementation experience of enterprise hybrid multi‑cloud, covering security solutions, resource pooling, IaaS/PaaS/SaaS layers, and common challenges such as data placement, network reliability, traffic routing, and standardization.

DevOps · Infrastructure · cloud security
0 likes · 13 min read
ITPUB
Aug 27, 2019 · Cloud Computing

Why OpenStack Is Losing Momentum: A Seven‑Year Retrospective

The author reflects on seven years of OpenStack, highlighting its declining community activity, lack of profitability, ineffective technical committee, poor enterprise value, competition from Kubernetes and PaaS, and argues that technical quality alone cannot reverse its downward trajectory.

Infrastructure · Kubernetes · OpenStack
0 likes · 9 min read
Efficient Ops
Aug 20, 2019 · Operations

What 38 Years of Banking IT Operations Taught a Veteran Engineer

In a two‑hour interview, Zhang Qinglong, China Bank’s data‑center operations chief, recounts his 38‑year journey from the early B20 accounting system to today’s cloud‑driven services, sharing lessons on responsibility, ITSM adoption, essential skills, and the future direction of IT operations.

Banking Technology · Future Trends · IT Operations
0 likes · 12 min read
dbaplus Community
Aug 14, 2019 · Cloud Native

What Is the “Container Ops Pattern” and How It Reshapes Kubernetes Management

The article traces the shift from physical‑server deployments to container‑cloud platforms, defines a newly coined “container ops pattern”, explains its core scenarios, compares declarative and imperative workflows, dissects Kubernetes API objects, controllers, and interfaces (CRI, CSI, CNI), and outlines the master‑node architecture that underpins modern cloud‑native operations.

CloudNative · ContainerOps · DesignPatterns
0 likes · 23 min read
360 Tech Engineering
Jun 28, 2019 · Operations

Modular Puppet Code: Environments, Modules, and Classes

This article explains how to structure modular Puppet code by configuring environments, creating reusable modules, and designing classes, covering environment paths, hiera data, module generation, publishing to the Forge, and key class functions such as include, require, contain, and hiera_include.

Automation · DevOps · Infrastructure
0 likes · 11 min read
21CTO
Jun 27, 2019 · Operations

From Hundreds to Thousands: Scaling Operations and Building a Custom Monitoring System

This article recounts AdMaster's five‑year journey from a few dozen servers to thousands, detailing the evolution of their monitoring infrastructure, the challenges faced at each scale stage, and the design of a self‑built, distributed monitoring platform that delivers real‑time alerts, visualized data, and business‑level insights.

Infrastructure · Operations · scaling
0 likes · 14 min read
Architects' Tech Alliance
Jun 12, 2019 · Cloud Computing

An Introduction to OpenStack: Origins, Architecture, and Development

This article provides a comprehensive overview of OpenStack, covering its history from early Amazon web services to its open‑source launch, the governance of the OpenStack Foundation, the evolution of its releases, core components, widespread adoption, and practical guidance for learning the platform.

IaaS · Infrastructure · OpenStack
0 likes · 11 min read
Efficient Ops
May 16, 2019 · Operations

How Alibaba’s AI‑Powered Data Centers Achieve Scalable, Reliable Operations

This article examines Alibaba Cloud’s intelligent data center ecosystem, covering market share, global distribution, operational challenges, AIOps evolution, multi‑layered infrastructure platforms, demand forecasting, fault prediction, and future smart‑automation prospects for large‑scale cloud operations.

Alibaba Cloud · Infrastructure · Operations
0 likes · 13 min read
DataFunTalk
May 10, 2019 · Artificial Intelligence

Pony.ai Infrastructure Overview: Vehicle Systems, Simulation Platform, and Data Architecture

The article presents a comprehensive overview of Pony.ai's autonomous driving infrastructure, covering the core infrastructure team’s responsibilities, vehicle onboard systems, simulation platform, data architecture, and supporting services, while discussing the technical challenges and engineering practices employed to achieve scalability, reliability, and high performance.

AI · Big Data · Infrastructure
0 likes · 14 min read
Architecture Digest
May 10, 2019 · Backend Development

Comprehensive Guide to Building a Backend Technology Stack for Startups

This article provides an extensive overview of how startups can design, select, and integrate languages, components, processes, and systems—such as project management, DNS, load balancing, databases, messaging, monitoring, and deployment—to construct a robust and scalable backend architecture.

DevOps · Infrastructure · architecture
0 likes · 30 min read
Java High-Performance Architecture
Apr 9, 2019 · Operations

Mastering Load Balancing: Types, Algorithms, and Best Practices

This article outlines the three main load‑balancing methods—DNS, hardware, and software—detailing their advantages and drawbacks, then explains common algorithms such as round‑robin, weighted round‑robin, least‑connections, performance‑based, and hash, and provides guidance on combining them for optimal architecture.

Algorithms · Infrastructure · Operations
0 likes · 5 min read
Alibaba Cloud Developer
Mar 27, 2019 · Cloud Native

Why Kubernetes Became the Backbone of Modern Cloud Native Architecture

This article introduces Kubernetes from a beginner’s perspective, covering its historical background, core design principles, architecture components such as Master and Node, key concepts like declarative APIs, containers, Pods, Services, and demonstrates how to create clusters, deploy, scale, and update applications, while also highlighting its role in cloud‑native environments.

DevOps · Infrastructure · Kubernetes
0 likes · 15 min read
DevOps
Feb 26, 2019 · Operations

Planning a DevOps Infrastructure for Traditional Enterprises: Capabilities and Tool Mapping

This article analyzes the essential capabilities required for building a DevOps infrastructure in traditional enterprises across foundation, development, testing, operations, and project management, mapping each capability to representative tools and offering guidance on flexible, evolving architecture design.

DevOps · Infrastructure · Operations
0 likes · 12 min read
Programmer DD
Feb 25, 2019 · R&D Management

Why Chinese Tech Teams Overtime While US Teams Don’t: A Deep Dive

The article examines why Chinese software engineers face severe overtime compared to their U.S. counterparts, analyzing product decision processes, the low technical voice in China, infrastructure shortcomings, and cultural attitudes toward work‑life balance, revealing systemic factors behind the disparity.

Infrastructure · Overtime · R&D management
0 likes · 9 min read
Efficient Ops
Jan 23, 2019 · Operations

Designing an Operations Monitoring Platform: Tools & Best Practices

This article explores the essential concepts for selecting and building an operations monitoring platform, reviewing popular tools such as Cacti, Nagios, Zabbix, Ganglia, Centreon, Prometheus, and Grafana, and outlines a six‑layer architecture and practical strategies for scaling, alerting, and high‑availability in diverse environments.

Alerting · DevOps · Infrastructure
0 likes · 19 min read
Manbang Technology Team
Dec 27, 2018 · Cloud Native

Cloud‑Native Infrastructure Practices at Manbang Group’s HuoCheBang

This article outlines Manbang Group’s HuoCheBang migration to cloud‑native architecture, detailing the unified monitoring platform Galileo, the Kubernetes‑based container cloud Planck, the microservice environment Newton, and the DevOps platform Solvay that together enable scalable, observable, and efficient service delivery.

DevOps · Infrastructure · container orchestration
0 likes · 5 min read
NetEase Game Operations Platform
Dec 10, 2018 · Information Security

Understanding and Improving Operations Security: Practices, Risks, and Enterprise‑Level Solutions

This article explains the concept of operations security, why it has become critical, enumerates common mis‑configurations and vulnerabilities such as open ports, weak permissions, insecure scripts and supply‑chain risks, and provides a comprehensive set of best‑practice guidelines and an enterprise‑level framework to build a resilient operations security posture.

Automation · Infrastructure · incident response
0 likes · 28 min read
Xianyu Technology
Dec 6, 2018 · Mobile Development

Rebuilding Flutter Infrastructure at Xianyu: Challenges and Solutions

Xianyu tackled Flutter adoption by creating a private CocoaPods CI pipeline, a component‑based fishRedux architecture, and a shared‑GL engine modification that let native middleware run in Flutter, thereby unifying Android, iOS, and Flutter development, improving build speed, and contributing tools back to the community.

Component Architecture · DART · Flutter
0 likes · 11 min read
Meituan Technology Team
Nov 15, 2018 · Cloud Native

Meituan's Container Platform HULK: Architecture and Optimization

Meituan’s HULK platform is a custom container‑cluster manager that integrates service governance, deployment and monitoring, while employing kernel patches, enhanced cgroup controls, dynamic CPU/memory isolation, P2P image distribution and in‑container core‑dump handling to overcome stability, performance and resource‑reporting challenges.

Container Technology · HULK · Infrastructure
0 likes · 20 min read
AntTech
Oct 9, 2018 · Cloud Computing

Technical Analysis of OceanBase Cloud Platform (OCP) 2.0 Architecture and Solutions

The article provides a comprehensive technical overview of OceanBase Cloud Platform (OCP) 2.0, detailing its redesigned architecture, reduced deployment complexity, high‑availability features, unified resource scheduling, monitoring, diagnostics, and how these innovations address infrastructure and business challenges while lowering costs.

Infrastructure · OCP 2.0 · OceanBase
0 likes · 11 min read
Architects' Tech Alliance
Sep 30, 2018 · Industry Insights

What Every Data Center Engineer Must Know About Rack Cabinet Standards and Design

This article provides a comprehensive overview of data‑center rack cabinets, covering size specifications, power and cooling requirements, key industry standards such as IEC 60297‑1 and EIA‑310‑D, structural components, environmental considerations, load capacity, and practical design guidelines for safe and efficient deployment.

Data center · Infrastructure · Operations
0 likes · 10 min read
Architecture Talk
Sep 26, 2018 · Backend Development

Essential Backend Infrastructure for Scalable Internet Services: A Complete Guide

This article outlines the critical backend components and best‑practice architectures—including API gateways, load balancers, service frameworks, caching, databases, search engines, messaging, authentication, configuration, scheduling, logging, data pipelines, and monitoring—that together ensure stable, maintainable, and high‑availability services for modern internet companies.

Backend · Infrastructure · api-gateway
0 likes · 32 min read
Efficient Ops
Sep 18, 2018 · Operations

Mastering Internet Operations: Roles, Responsibilities, and Evolution

This article provides a comprehensive overview of internet operations, detailing how service‑centric stability, security, and efficiency are achieved through infrastructure management, monitoring, risk mitigation, and continuous optimization, while outlining the various operational roles, their duties, and the evolution of ops practices.

DevOps · Infrastructure · Operations
0 likes · 21 min read
21CTO
Aug 30, 2018 · Operations

Inside Google’s Production: How Requests Travel Through Its Massive Infrastructure

Google’s production environment spans a global edge network, massive data centers, sophisticated job scheduling with Borg, distributed storage systems like Bigtable and Spanner, and comprehensive monitoring, illustrating how user requests traverse multiple layers—from ISP to edge, GFE, load balancers, and finally to services.

Deployment · Google · Infrastructure
0 likes · 9 min read
Youzan Coder
Jul 27, 2018 · Operations

Youzan Testing Environment: Service Chain Isolation and Operational Practices

Youzan created a cost-effective multi-project testing environment by introducing a weakly isolated “service-chain” that propagates identifiers across RPC and REST calls, standardizing entry/exit points, automating provisioning, and integrating the isolated environments into CI/CD pipelines through cross-team collaboration and tooling.

DevOps · Infrastructure · Isolation
0 likes · 17 min read
JD Tech
Jul 25, 2018 · Cloud Native

JD.com’s Large‑Scale Kubernetes Refactoring and Operational Lessons

This article shares JD.com’s extensive experience redesigning Kubernetes for massive production use, covering custom DNS and load‑balancing, scaling clusters to ten thousand nodes, adapting controllers, building the Archimedes scheduler, and practical insights on resource isolation, deployment, and high‑traffic elasticity.

Cloud Native · Container · Infrastructure
0 likes · 14 min read
Architects' Tech Alliance
Jul 17, 2018 · Cloud Computing

Why Bare Metal Matters: Understanding OpenStack Ironic for High‑Performance Cloud Deployments

This article explains how OpenStack Ironic enables bare‑metal provisioning as a cloud service, outlines scenarios where physical servers outperform virtual machines, details Ironic’s architecture and evolution, compares billing models, and highlights future trends for bare‑metal cloud offerings.

Bare Metal · Infrastructure · Ironic
0 likes · 9 min read
MaGe Linux Operations
Jun 29, 2018 · Operations

Essential Skills and Roadmap for Large‑Scale Website Operations Engineers

This comprehensive guide explains what large‑scale website operations entail, outlines the product lifecycle involvement of ops engineers, details the technical and personal skills required, and discusses current challenges, future prospects, and key technologies such as cluster management, monitoring, fault handling, and automation.

Automation · DevOps · Infrastructure
0 likes · 18 min read
Ops Development Stories
Jun 11, 2018 · Cloud Computing

Master OpenStack: Complete Guide to Components and Environment Setup

This article provides a comprehensive overview of OpenStack's architecture, details each core service with its role, and walks through step‑by‑step commands to configure a functional OpenStack Pike environment on CentOS, including networking, database, messaging, and storage components.

Infrastructure · Installation · Linux
0 likes · 9 min read
Qunar Tech Salon
May 30, 2018 · Operations

Recap of the QInfrarch Session at the 2018 Qunar Technology Carnival

The QInfrarch special session of the 2018 Qunar Technology Carnival gathered a packed audience on May 27, featuring multiple technical talks on real‑time push architecture, IDC networking, ticket search, decentralization, multi‑datacenter redundancy, and fault‑injection platforms, followed by lively Q&A, networking, and enthusiastic follow‑up requests.

Infrastructure · Operations · QInfrarch
0 likes · 4 min read
ITFLY8 Architecture Home
May 27, 2018 · Information Security

How Google Secures Its Global Data Centers: Inside the Infrastructure

Google’s technical infrastructure—supporting services like Search, Gmail, G Suite, and GCP—employs layered physical, hardware, software, and operational security measures, including biometric access, custom secure chips, encrypted boot, service isolation, identity management, and robust DoS defenses to protect data and operations worldwide.

Data Center Security · Google · Infrastructure
0 likes · 20 min read
Efficient Ops
May 8, 2018 · Operations

20 Proven Ops Automation Rules Every Team Should Follow

This article presents twenty practical principles for building and maintaining an effective, business‑oriented operations automation system, covering mindset, architecture, design, tooling, team composition, data handling, security, and implementation best practices for modern enterprises.

Automation · Infrastructure · Operations
0 likes · 5 min read
Efficient Ops
Apr 23, 2018 · Operations

Unlocking Ops Automation: Real-World Architectures and Practical Insights

This article explores the essence of operations automation by presenting three real-world platform case studies, analyzing their architectures, tools, and implementation challenges, and then discusses universal automation principles, intelligent ops concepts, and career guidance, blending technical depth with personal motivation.

Deployment · Infrastructure · Operations Automation
0 likes · 17 min read
MaGe Linux Operations
Apr 13, 2018 · Operations

How Alibaba Built Its DevOps Automation Platform: Key Practices and Lessons

This article outlines Alibaba's DevOps transformation, describing the three operational stages, four foundations of automated operations, CI/CD implementation, essential system characteristics, development‑defined operations, config‑driven changes, and the tools that enable high‑availability, efficiency, and scalability.

Alibaba · Automation · Configuration
0 likes · 10 min read
DevOps Coach
Apr 1, 2018 · Cloud Computing

Deploy a Production‑Ready Kubernetes Cluster on AWS with Kops

This step‑by‑step guide shows how to configure Route53 DNS, prepare a VM with required tools, create an S3 state store, provision a Kubernetes cluster on AWS using kops, validate it, expose a sample service, and clean up the resources.

AWS · Deployment · Infrastructure
0 likes · 15 min read
DevOps Coach
Mar 29, 2018 · Operations

7 Must-Have Skills Every DevOps Engineer Needs

The article outlines the seven essential competencies—flexibility, security, collaboration, scripting, decision‑making, infrastructure knowledge, and soft skills—that DevOps engineers must master to bridge development and operations, accelerate delivery, and maintain secure, reliable systems.

Collaboration · DevOps · Infrastructure
0 likes · 8 min read
Architecture Digest
Mar 29, 2018 · Databases

Designing a High‑Availability Redis Service with Sentinel

This article explains how to build a highly available Redis deployment using Redis Sentinel, compares several architectural options, and details the final three‑sentinel design that tolerates node, process, and network failures while keeping client access simple.

Infrastructure · failover · high availability
0 likes · 12 min read
Efficient Ops
Mar 27, 2018 · Cloud Computing

Why X86 Bare‑Metal Services Matter and How to Build Them in the Cloud

This article explains why X86 bare‑metal services are essential for high‑performance, security‑critical workloads, describes their architecture and management processes, and outlines the steps—standardization, automation, service‑orientation, and self‑service—used by Hengfeng Bank to implement and operate them.

Automation · Bare Metal · Infrastructure
0 likes · 16 min read
21CTO
Mar 19, 2018 · Operations

How Tencent Scaled Its Network from 2004‑2013: Key Lessons in Data‑Center Evolution

This article chronicles Tencent's network journey from its modest 2004 infrastructure through rapid expansion, critical incidents, and architectural breakthroughs like SET zones, SDN, and MPLS VPN, illustrating how the company transformed its data‑center operations to support massive user growth.

Data center · Infrastructure · Operations
0 likes · 11 min read
dbaplus Community
Mar 11, 2018 · Cloud Computing

How a Chinese Telecom Payment Platform Mastered Cloud Migration in 8 Hours

This article details the end‑to‑end cloud migration of China Telecom's payment platform, covering pre‑migration challenges, architectural redesign, data‑sync strategies, the eight‑hour cut‑over process, post‑migration performance gains, and future DBaaS plans, all based on a 2017 DBAplus conference talk.

DBaaS · Infrastructure · Operations
0 likes · 19 min read
ITPUB
Mar 9, 2018 · Operations

How to Build Your Own Global CDN Using Smart DNS and Anycast

This guide explains how to create a personal CDN by deploying multiple edge servers, using Geo‑IP‑aware DNS routing, leveraging Amazon Route 53 latency‑based routing, synchronizing content, handling SSL with Let’s Encrypt, and evaluating performance across continents.

Anycast · CDN · DNS
0 likes · 10 min read
ITPUB
Mar 6, 2018 · Operations

How to Build Your Own Low‑Latency CDN from Scratch

This guide explains why a custom CDN can outperform commercial services, walks through using geo‑aware DNS, BGP Anycast limitations, setting up edge servers, distributing static content, handling SSL certificates, and shares real‑world performance results and lessons learned.

Anycast · CDN · DNS
0 likes · 11 min read
21CTO
Mar 5, 2018 · Cloud Native

How Docker Transforms DevOps: Solving the Multi‑Level Container Challenge

This article explains Docker’s role in modern IT by outlining the challenges faced by companies, comparing VMs and containers, describing Docker’s architecture, and showing how containerization streamlines DevOps workflows and isolates developer and administrator concerns.

Cloud Native · DevOps · Docker
0 likes · 5 min read
Architecture Digest
Feb 28, 2018 · Blockchain

Blockchain Infrastructure Landscape: A First‑Principles Framework

This article presents a first‑principles framework that categorizes blockchain infrastructure components—storage, computation, and communication—by mapping them to concrete projects such as Ethereum, IPFS, BigchainDB, and others, illustrating how these modules interoperate to build efficient decentralized applications.

Blockchain · Infrastructure · decentralized storage
0 likes · 21 min read
MaGe Linux Operations
Feb 4, 2018 · Operations

Essential Operations Tools Every DevOps Engineer Should Master

This article outlines the key categories of operations tools—including process management, release automation, configuration handling, resource isolation, and comprehensive monitoring and alerting solutions—providing a practical guide for building reliable, automated infrastructure workflows.

Automation · Infrastructure · Operations
0 likes · 8 min read
Snowball Engineer Team
Feb 2, 2018 · R&D Management

Building an Engineer Culture: Values, Infrastructure, and Incentives at Snowball

The article discusses how Snowball cultivates an engineer‑focused culture by defining core values such as proactiveness, professionalism, efficiency and empathy, establishing robust infrastructure tools, and implementing balanced punishment and reward systems to motivate continuous improvement and retain talent.

Infrastructure · R&D management · engineer culture
0 likes · 12 min read
ITPUB
Nov 14, 2017 · Operations

How Alibaba’s Dragonfly P2P System Supercharges Large‑Scale File and Container Image Distribution

Alibaba’s Dragonfly (蜻蜓) is a self‑developed P2P file distribution platform that dramatically speeds up massive file and container image delivery, reduces bandwidth consumption, supports intelligent compression and flow control, and has become a core infrastructure component powering billions of transactions during major events like Double 11.

File Distribution · Infrastructure · P2P
0 likes · 20 min read
MaGe Linux Operations
Nov 8, 2017 · Operations

How to Build an Ops Engineer Skill Map to Bridge the Hiring Gap

An operations director explains why hiring skilled ops engineers is hard, identifies the technology mismatch in typical stacks, and shares a practical skill‑map approach that lets teams cover most essential tools while giving engineers a clear learning roadmap.

Infrastructure · Operations · Ops Engineering
0 likes · 3 min read
Efficient Ops
Nov 5, 2017 · Operations

Scaling Ele.me’s Infrastructure: Operations, Automation, and Private Cloud Insights

This article recounts Ele.me's rapid growth from 2014 onward, detailing the challenges of network and server management, the evolution of their operations through standardization, process automation, and platform building, and how private cloud solutions like ZStack enabled fine‑grained, data‑driven infrastructure management.

Automation · Infrastructure · Operations
0 likes · 23 min read
Architecture Digest
Oct 27, 2017 · Operations

Key Practices and Principles of DevOps from the “Cloud Development and Operations Best Practices” Talk

The article summarizes a DevOps talk, outlining eight guiding principles—configuration over hard‑coding, redundancy over single points, restartability, whole‑stack delivery, statelessness, standardization, automation, and unattended operation—while sharing concrete tools, architectures, and real‑world experiences from a cloud provider.

Automation · Infrastructure · Operations
0 likes · 16 min read
Alibaba Cloud Developer
Oct 27, 2017 · Operations

How Alibaba Scales DevOps with StarOps: Inside Their Operations Platform

This article explains how Alibaba has evolved its DevOps practice over a decade, detailing the layered architecture of its StarOps suite, including the foundational StarAgent, the Fortress (jump server), the Dragonfly (Qingting) file‑distribution system, and intelligent AIOps features, and showing how automation, scalability, and AI‑driven monitoring enable stable, low‑cost operations for massive workloads such as Double 11.

Automation · Infrastructure · aiops
0 likes · 17 min read
Meitu Technology
Sep 28, 2017 · Operations

Inside Meipai’s 3‑D Monitoring System: Scaling 150M Users with Unified Observability

This article examines how Meipai, a popular live‑streaming and short‑video platform with over 150 million monthly active users, engineered a comprehensive, three‑dimensional monitoring architecture that spans client to server, integrates unified dashboards, and leverages both private and public cloud resources to ensure reliable, scalable operations.

DevOps · Infrastructure · Meipai
0 likes · 3 min read
Architecture Digest
Sep 16, 2017 · Backend Development

Essential Backend Infrastructure and Services for Internet Companies

This article outlines the essential backend infrastructure components and best‑practice patterns—such as API gateways, service frameworks, caching, databases, search engines, message queues, authentication, configuration, service governance, scheduling, logging, and monitoring—required to build stable, scalable, and maintainable internet applications.

Backend · Infrastructure · Microservices
0 likes · 31 min read
Architects' Tech Alliance
Sep 7, 2017 · Industry Insights

How SDN Bridges Networks and Cloud Platforms: An In‑Depth Look

This article explains the relationship between Software‑Defined Networking (SDN) and cloud platforms, detailing cloud service models, OpenStack core services, OpenDaylight controller architecture, and the integration mechanisms that enable unified management of network, compute, and storage resources.

Infrastructure · Network Virtualization · OpenDaylight
0 likes · 11 min read
21CTO
Sep 6, 2017 · Operations

How JD’s Self‑Built Data Center Achieves Ultra‑Low Energy Use and High Reliability

JD’s self‑constructed data center in Suqian, Jiangsu, combines innovative free‑cooling, high‑efficiency chillers, redundant power architecture, and advanced monitoring to deliver over 180 days of natural cooling, reduce PUE to ≤1.3, and ensure continuous operation with robust backup systems.

Cooling · Data center · Infrastructure
0 likes · 12 min read