Tagged articles

infrastructure

371 articles · Page 3 of 4
Aikesheng Open Source Community
Jan 18, 2021 · Databases

How to Build a Professional DBA Operations Team: Infrastructure, Standards, Training, Knowledge Base, and Culture

The article explains how to construct an effective DBA operations team by focusing on reusable infrastructure, clear team standards, a structured training system, a comprehensive knowledge base, and a positive team atmosphere, providing practical tools and methods for each aspect.

DBA · Database operations · Knowledge Base
4 min read
AntTech
Jan 14, 2021 · Cloud Native

Large-Scale Service Mesh Deployment at Ant Group: Practices, Challenges, and Future Outlook

This article details Ant Group's two‑year journey of adopting Service Mesh at massive scale, explaining why Service Mesh is needed for microservice governance, heterogeneous system unification, and financial‑grade security, and describing the architecture, migration strategies, stability mechanisms, operational results, and future directions toward a full mesh and serverless era.

Microservices · Service Mesh · devops
17 min read
Architects' Tech Alliance
Jan 5, 2021 · Operations

Understanding Data Centers: Architecture, Technologies, and Operational Considerations

This article explains what data centers are, outlines their core components—compute, storage, and networking—covers architectural decisions, industry standards, and emerging technologies such as edge computing, micro‑data centers, cloud integration, SDN, HCI, containers, NVMe, and GPU acceleration, highlighting their impact on modern enterprise operations.

GPU · HCI · Operations
11 min read
Cloud Native Technology Community
Dec 30, 2020 · Operations

Lessons Learned from Two Years of Running Kubernetes in Production

This article recounts a two‑year journey of migrating from Ansible‑managed EC2 deployments to Kubernetes, detailing the motivations, migration strategy, operational challenges, tooling choices, resource management, security, cost considerations, and the development of custom controllers and CRDs to run production workloads reliably.

CI/CD · Cloud · Observability
18 min read
Top Architect
Dec 30, 2020 · Backend Development

Using Kafka as a Storage System for Twitter’s Account Activity Replay API

The article explains how Twitter built the Account Activity Replay API by repurposing Kafka as a storage layer, detailing the system’s architecture, partitioning strategy, request handling, deduplication, and performance optimizations to provide reliable event recovery for developers.

Twitter · infrastructure · kafka
8 min read
Didi Tech
Dec 25, 2020 · Artificial Intelligence

Autonomous Driving Infrastructure: Foundations, Key Trade‑offs, and Evolution Roadmap

The article outlines DiDi’s six‑year autonomous‑driving research, describing the three‑layer hardware‑onboard‑cloud infrastructure, key trade‑offs such as rapid iteration versus functional safety, sensor resolution versus compute, and hardware performance versus automotive‑grade reliability, and presents a staged evolution roadmap toward fully safe, driverless operation.

AI · Hardware · Roadmap
22 min read
Architects' Tech Alliance
Dec 6, 2020 · Operations

Understanding Data Centers: Architecture, Reliability, and Emerging Technologies

This article explains what a data center is, its core components of compute, storage, and networking, the operational and architectural considerations for reliability and security, and reviews industry standards and emerging technologies such as edge computing, cloud integration, SDN, HCI, containers, NVMe, and GPU acceleration.

GPU · Operations · edge computing
12 min read
php Courses
Nov 10, 2020 · Operations

List of Popular Domestic and Official Open Source Mirror Sites (2020)

This article provides a curated list of widely used domestic and official open‑source software mirror sites for 2020, explaining why mirrors are needed, offering categorized URLs, and giving brief guidance on choosing and using them for faster, more reliable downloads.

China · Download · Software Repository
4 min read
Efficient Ops
Oct 19, 2020 · Operations

Designing an Effective DevOps Operations System: Principles and Practices

This article outlines a comprehensive DevOps operations framework, tracing its evolution from traditional ops to modern automation, detailing business standards, work policies, system integration, and best‑practice norms to achieve high SLA, low cost, and a one‑stop operational platform.

Automation · SRE · best practices
13 min read
Tencent Cloud Developer
Sep 21, 2020 · Industry Insights

How Beike Guarantees High Availability in Complex Real‑Estate Transactions

This article analyzes Beike's massive real‑estate ecosystem, detailing the intricate business flows, technical architecture, and quality‑assurance challenges, and explains how a suite of internal platforms—KeTest, KeOnes, sosotest, KeDiff, KePTS, and KeMTC—are engineered to deliver high‑performance, highly available services at scale.

Microservices · Testing Platforms · devops
26 min read
TAL Education Technology
Sep 1, 2020 · Cloud Computing

Cost Optimization and Resource Management in an Online Education Platform: From XEN Migration to Container‑Based Scaling

This article describes how an online education platform reduced infrastructure costs and improved service reliability by replacing XEN with KVM, building resource‑tracking platforms, adopting Kubernetes‑based containerization, implementing rapid auto‑scaling, and establishing systematic resource auditing and standardization processes.

Cloud Computing · Resource Management · containerization
25 min read
Qunar Tech Salon
Aug 27, 2020 · Databases

Qunar Technology Carnival Interview Series: Insights on Hotel Flow Optimization, Database Architecture, and System Stability

The article presents a series of interviews from Qunar's Technology Carnival, featuring experts Liang Zhangping, Wang Zhufeng, and Zheng Jimin who discuss hotel booking flow improvements, database architecture comparisons and migration to PXC, and comprehensive system stability governance practices.

Qunar · Technology Carnival · database migration
13 min read
Cloud Native Technology Community
Aug 25, 2020 · Cloud Native

How Lyft’s Open‑Source Clutch Transforms Cloud‑Native Infrastructure Management

Lyft open-sourced Clutch, a scalable UI and API platform that unifies infrastructure tooling with built-in security, authorization, and observability. It ships as a single-binary Go backend with a pluggable React frontend, simplifying operations, reducing MTTR, and improving developer experience across large cloud-native environments.

Control Plane · Lyft · cloud-native
15 min read
Meituan Technology Team
Aug 13, 2020 · Cloud Native

Meituan’s Migration from OpenStack to Kubernetes: Large‑Scale Cloud‑Native Infrastructure, Challenges and Practices

Meituan migrated its massive cloud infrastructure from OpenStack to Kubernetes, containerizing over 98% of services and implementing custom scheduling, NUMA-aware placement, fine-grained resource isolation, and an internal management platform. Together these raised stability above 99.99%, cut costs, and paved the way for unified VM-container scheduling and broader cloud-native workloads.

Cloud Native · Large-Scale Operations · Meituan
21 min read
NetEase Media Technology Team
Aug 13, 2020 · Cloud Native

How NetEase Media Scaled Its Infrastructure with Containerization and Service Mesh

NetEase Media transformed its infrastructure by containerizing services, establishing multiple resource pools, implementing a ServiceMesh with NSF, and isolating beta and production environments, resulting in higher CPU utilization, automated scaling, and improved stability, while sharing lessons learned and future plans.

Cloud Native · Resource Management · Service Mesh
22 min read
MaGe Linux Operations
Aug 5, 2020 · Cloud Native

Top Open-Source Tools to Simplify Kubernetes Management Across Any Environment

The article presents a curated list of open-source Kubernetes management tools, including K9s, Rancher, Kubernetes Dashboard, kubectl, kubeadm, Helm, Kubespray, Kontena Lens, and wksctl, detailing their core features, deployment options, and how they streamline cluster monitoring, configuration, and application lifecycle management across cloud-native environments.

Cloud Native · cluster management · devops
8 min read
Efficient Ops
Jul 28, 2020 · Operations

How Zhejiang Mobile Transformed SRE for Telecom: A Practical Operations Blueprint

This article details Zhejiang Mobile's adaptation of Google‑originated Site Reliability Engineering to a telecom environment, outlining a three‑layer capability framework, standardized processes, integrated platforms, and measurable outcomes that demonstrate how agile SRE practices can boost reliability and scalability in traditional industries.

Agile · SRE · Site Reliability Engineering
11 min read
21CTO
Jul 13, 2020 · Operations

Why Did GitHub Crash? Inside the July 2020 Outage and Its Root Causes

The July 13, 2020 GitHub outage, triggered by load‑balancer misconfiguration, a database connection error during partitioning, and a network‑config mistake, sparked worldwide developer panic, highlighted reliability concerns, and revealed challenges in scaling cloud infrastructure amid the pandemic.

Cloud Computing · GitHub · Outage
6 min read
iQIYI Technical Product Team
Jul 10, 2020 · Operations

iQIYI IPv6 Large‑Scale Deployment: Technical Challenges, Solutions, and Management Practices

In response to the national IPv6 deployment plan, iQIYI coordinated multiple technical teams to redesign its network and introduced the “iQIYI IPv6 Cloud Control” scheme, which manages IPv4/IPv6 switching and fallback. The rollout reached more than 200 million active IPv6 users and 800 GB traffic peaks. It was guided by long-term strategic value, clear milestones, and engineers’ curiosity, with the aim of using IPv6 to improve service quality and save costs.

IPv6 · Operations · cloud control
12 min read
Suning Technology
Jun 22, 2020 · Operations

How Suning Moved 26,888 Servers in 75 Days – Key Takeaways

Suning’s data center team completed a record-breaking migration of 26,888 servers in 75 days. The article details the planning, tight time windows, intensive communication, cross-team coordination, and risk management behind the move, and the efficiency gains that enabled a zero-downtime migration and significant cost savings for future operations.

Cloud Computing · Data Center · Operations
7 min read
JD Retail Technology
Jun 5, 2020 · Operations

How JD Cloud Engineered a Seamless 618 Shopping Surge: Ops Strategies & Disaster Drills

This article details JD Cloud's comprehensive operational preparation for the 618 shopping festival, covering early resource procurement, hardware fault management, network and CDN scaling, extensive capacity‑testing, disaster‑recovery drills, and cross‑departmental coordination that together ensured stable service during massive traffic spikes.

Disaster Recovery · capacity planning · cloud operations
8 min read
Alibaba Cloud Developer
May 26, 2020 · Cloud Native

Why Serverless Containers Are Shaping the Future of Cloud‑Native Kubernetes

This article examines the rising trend of serverless containers, their application value, architectural design for cloud‑native Kubernetes, key challenges such as startup latency and scalability, and how Alibaba Cloud's Serverless Kubernetes and ECI solutions address these issues while offering a free learning course.

Serverless · container · infrastructure
16 min read
ITFLY8 Architecture Home
Apr 13, 2020 · Backend Development

Essential Backend Infrastructure for Scalable Java Applications

This article outlines the critical backend components required for building robust Java services, covering API gateways, MVC/IOC/ORM frameworks, caching, databases, search engines, message queues, file storage, unified authentication, configuration, service governance, scheduling, logging, data pipelines, and monitoring strategies.

Java · Microservices · api-gateway
23 min read
Open Source Linux
Mar 7, 2020 · Cloud Computing

KVM vs XEN: Which Virtualization Technology Powers Modern Cloud Computing?

This article explains how virtualization, especially the open‑source hypervisors KVM and XEN, underpins cloud computing, outlines cloud service and deployment models, compares full and para‑virtualization, and evaluates the strengths and adoption of each technology in today’s major cloud providers.

Cloud Computing · KVM · hypervisor
7 min read
Open Source Linux
Mar 2, 2020 · Operations

Why Use Server Clusters? Benefits, Types, and Choosing the Right Solution

This article explains what a server cluster is, why organizations adopt clusters for performance, cost, scalability and reliability, outlines the main cluster categories such as load‑balancing, high‑availability and HPC, and offers guidance on selecting appropriate software and hardware solutions.

HPC · infrastructure · open source
12 min read
Efficient Ops
Feb 27, 2020 · Operations

Building a Flexible, Scalable CMDB for Ops: Architecture, API, and UI Insights

This article introduces an open‑source, four‑layer CMDB designed for operations teams, detailing its storage, data, API, and UI layers, dynamic modeling capabilities, searchable CI APIs, various resource views, relationship mapping, and role‑based permission management, while providing deployment links and usage notes.

API · CMDB · configuration management
11 min read
DevOps Cloud Academy
Feb 27, 2020 · Operations

Jenkins Infrastructure, Project Management, and Configuration‑as‑Code Overview

This article introduces Jenkins infrastructure setup, including installation via Ansible, Puppet, Chef or Docker, outlines management tools such as CLI, REST API, python‑jenkins and Jenkins‑client, describes project creation plugins like Job DSL, Job Builder and Jenkinsfile, and explains system configuration using Groovy scripts and the Configuration‑as‑Code plugin.

CI/CD · Jenkins · Operations
3 min read
Tencent Tech
Jan 17, 2020 · Cloud Computing

How QQ Tackled Massive Cloud Migration Challenges – Tencent’s Strategy Revealed

The article describes how Tencent’s QQ service migrated over a million servers to the public cloud, covering comprehensive planning, phased execution, and solutions to security, dependency, disaster-recovery, and gray-scale release challenges, as well as the infrastructure upgrades, database migration, cloud-native tools, and operational changes that ensured zero user impact.

Cloud Migration · Operations · QQ
20 min read
Efficient Ops
Jan 8, 2020 · Operations

How a Bank Built an Automated Operations Platform and CMDB Middle‑Platform

This article details how Ping An Bank tackled rapid growth and complex regulatory demands by creating an automated operations middle‑platform, designing a CMDB with data‑closure and subscription mechanisms, and implementing orchestration, gray‑scale deployment, and high‑risk detection to achieve resilient, scalable infrastructure management.

Automation · CMDB · Operations
21 min read
Tencent Tech
Dec 23, 2019 · Cloud Computing

How Tencent Scaled to Over 1 Million Servers and Cut Costs by 30%

In 2019 Tencent’s infrastructure reached more than a million servers and 100 Tbps of network bandwidth, backed by modular data centers and self-developed hardware and software that together cut total cost of ownership by 30%, boosted efficiency, and pushed cloud elasticity to new heights.

Cloud Computing · Data Center · Tencent
8 min read
Youzan Coder
Dec 23, 2019 · Mobile Development

Mobile Infrastructure Construction and Practices at Youzan

Youzan’s mobile infrastructure combines enforced pre‑release processes, unified permission workflows, cross‑platform Zan Weex and emerging Flutter support, dynamic configuration, robust CI, logging, testing, and shared component libraries to deliver efficient, high‑quality, gray‑/conditional‑/full releases while fostering collaboration across its mobile development teams.

CI/CD · Flutter · Weex
16 min read
MaGe Linux Operations
Dec 18, 2019 · Operations

Mastering Modern IT Operations: Roles, Practices, and Evolution

This article outlines the comprehensive responsibilities and evolution of IT operations, covering system, application, database, security, and platform management, detailing tasks such as infrastructure building, monitoring, optimization, automation, and the shift from manual processes to self‑scheduling systems.

Automation · IT Operations · Monitoring
20 min read
Architects' Tech Alliance
Dec 6, 2019 · Cloud Computing

Data Center Modernization and Future Cloud Computing Trends

The article analyzes how enterprises are shifting to cloud platforms, the resulting idle data centers, market forecasts for public and private cloud growth, and proposes modernization strategies—including continuous technology updates, workflow optimization, fault simulation, hybrid deployment, and virtualization—to meet the increasing demand for efficient, scalable infrastructure over the next few years.

Cloud Computing · DCIM · Market Forecast
15 min read
Tencent Cloud Developer
Nov 21, 2019 · Operations

Serverless Operations: Efficient and Intelligent Cloud-native Practices

The article recaps Tencent Cloud’s Serverless operational suite—covering built‑in DevOps tools, logging, monitoring, auto‑scaling, and security—demonstrating how it replaces manual IaaS provisioning, accelerates development, and enables cloud‑native management, illustrated by a WeChat Mini‑Program album that cut build time from months to two weeks.

Automation · Serverless · Tencent Cloud
19 min read
High Availability Architecture
Nov 19, 2019 · Blockchain

How Coinbase Builds and Deploys Blockchain Nodes with Snapchain

The article explains Coinbase’s unique security and infrastructure requirements for blockchain nodes, describes the challenges of blue‑green deployments, and details the Snapchain system built on AWS that enables fast, reliable snapshot‑based node provisioning, upgrades, and high‑availability scaling.

AWS · High Availability · Node Deployment
7 min read
Cloud Native Technology Community
Nov 15, 2019 · Cloud Native

Helm 3 Release: Fixing Helm 2’s Flaws and Simplifying Kubernetes Package Management

The November 13 Helm 3 release eliminates Tiller, addresses major Helm 2 shortcomings such as template engine bugs, hook handling, and resource conflicts, and introduces a cleaner architecture that aligns Helm with modern Kubernetes practices while offering new features like multi‑cluster support and dependency checks.

devops · helm · helm3
7 min read
DataFunTalk
Nov 14, 2019 · Artificial Intelligence

Building the Most Reliable Autonomous Driving Infrastructure at Pony.ai

This article outlines Pony.ai's comprehensive autonomous driving infrastructure, describing traditional internet back‑end components, additional vehicle‑mounted systems, large‑scale simulation, data challenges, and the reliability, performance, and flexibility practices needed to support rapid growth and safe robotaxi operations.

AI Systems · Pony.ai · Simulation
15 min read
DevOps Cloud Academy
Nov 9, 2019 · Operations

Configuring Jenkins High Availability with HAProxy and NFS

This guide explains how to achieve Jenkins high availability by deploying two Jenkins master nodes behind HAProxy, sharing Jenkins home via NFS, and configuring HAProxy load balancing and health checks, including detailed host setup, NFS and Jenkins installation steps, and test results.

CI/CD · HAProxy · High Availability
10 min read
Alibaba Cloud Infrastructure
Oct 31, 2019 · R&D Management

Alibaba Infrastructure PMO Wins Best Enterprise Practice Award and Presents Strategic Project Management Framework

The 2019 China Project Management Development 20‑Year Achievement Forum highlighted Alibaba Infrastructure's award for Best Enterprise Practice, where its PMO shared a layered strategic project management framework that translates departmental strategy into executable projects, offering insights for organizations facing rapid market changes.

Alibaba · Best Practice Award · Case Study
6 min read
Alibaba Cloud Infrastructure
Sep 20, 2019 · Cloud Computing

Alibaba Announces Open‑Source “Fangsheng” Project for Next‑Generation Cloud Server Architecture

At the ODCC 2019 Open Data Center Conference in Beijing, Alibaba server architect Guo Rui unveiled the open‑source “Fangsheng” project, detailing a new cloud‑server architecture that addresses ultra‑large scale, diverse customer needs, and intense competition by improving cooling, power efficiency, modularity, and deployment flexibility for Chinese cloud data centers.

Alibaba · Data Center · Fangsheng
5 min read
dbaplus Community
Sep 18, 2019 · Cloud Computing

Why Hybrid Multi‑Cloud Is the Future of Enterprise IT – Lessons, Pitfalls, and Best Practices

This article explores the origins, architecture, and real‑world implementation experience of enterprise hybrid multi‑cloud, covering security solutions, resource pooling, IaaS/PaaS/SaaS layers, and common challenges such as data placement, network reliability, traffic routing, and standardization.

Enterprise Architecture · Hybrid Cloud · Multi-Cloud
13 min read
ITPUB
Aug 27, 2019 · Cloud Computing

Why OpenStack Is Losing Momentum: A Seven‑Year Retrospective

The author reflects on seven years of OpenStack, highlighting its declining community activity, lack of profitability, ineffective technical committee, poor enterprise value, competition from Kubernetes and PaaS, and argues that technical quality alone cannot reverse its downward trajectory.

Cloud Computing · OpenStack · PaaS
9 min read
Efficient Ops
Aug 20, 2019 · Operations

What 38 Years of Banking IT Operations Taught a Veteran Engineer

In a two‑hour interview, Zhang Qinglong, China Bank’s data‑center operations chief, recounts his 38‑year journey from the early B20 accounting system to today’s cloud‑driven services, sharing lessons on responsibility, ITSM adoption, essential skills, and the future direction of IT operations.

Banking Technology · IT Operations · IT Service Management
12 min read
dbaplus Community
Aug 14, 2019 · Cloud Native

What Is the “Container Ops Pattern” and How It Reshapes Kubernetes Management

The article traces the shift from physical‑server deployments to container‑cloud platforms, defines a newly coined “container ops pattern”, explains its core scenarios, compares declarative and imperative workflows, dissects Kubernetes API objects, controllers, and interfaces (CRI, CSI, CNI), and outlines the master‑node architecture that underpins modern cloud‑native operations.

CloudNative · ContainerOps · DesignPatterns
23 min read
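The declarative workflow this summary contrasts with imperative commands rests on a reconcile loop: the operator declares a target state, and a controller repeatedly computes and applies the difference from what is actually running. The following is a minimal toy sketch of that idea, not Kubernetes controller code; `Cluster` and `reconcile` are hypothetical names, and plain replica counts stand in for full API objects.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for observed state: workload name -> running replicas."""
    replicas: dict[str, int] = field(default_factory=dict)


def reconcile(desired: dict[str, int], cluster: Cluster) -> list[str]:
    """Move observed state toward desired state and return the actions taken.

    Running it again with the same desired state is a no-op, which is what
    lets a controller call it repeatedly in a loop.
    """
    actions: list[str] = []
    for name, want in desired.items():
        have = cluster.replicas.get(name, 0)
        if have < want:
            actions.append(f"scale-up {name} {have}->{want}")
        elif have > want:
            actions.append(f"scale-down {name} {have}->{want}")
        cluster.replicas[name] = want
    # Anything running that is no longer declared gets removed.
    for name in list(cluster.replicas):
        if name not in desired:
            actions.append(f"delete {name}")
            del cluster.replicas[name]
    return actions


if __name__ == "__main__":
    cluster = Cluster(replicas={"web": 1, "old-job": 1})
    desired = {"web": 3, "api": 2}
    print(reconcile(desired, cluster))  # ['scale-up web 1->3', 'scale-up api 0->2', 'delete old-job']
    print(reconcile(desired, cluster))  # []
```

An imperative workflow would issue those three commands directly and would have to be rewritten whenever the starting state differed; the declarative version states only the target and lets the loop derive the steps, so re-running it is safe.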
360 Tech Engineering
Jun 28, 2019 · Operations

Modular Puppet Code: Environments, Modules, and Classes

This article explains how to structure modular Puppet code by configuring environments, creating reusable modules, and designing classes, covering environment paths, hiera data, module generation, publishing to the Forge, and key class functions such as include, require, contain, and hiera_include.

Automation · Modules · Puppet
11 min read
21CTO
Jun 27, 2019 · Operations

From Hundreds to Thousands: Scaling Operations and Building a Custom Monitoring System

This article recounts AdMaster's five‑year journey from a few dozen servers to thousands, detailing the evolution of their monitoring infrastructure, the challenges faced at each scale stage, and the design of a self‑built, distributed monitoring platform that delivers real‑time alerts, visualized data, and business‑level insights.

Operations · infrastructure · scaling
14 min read
Architects' Tech Alliance
Jun 12, 2019 · Cloud Computing

An Introduction to OpenStack: Origins, Architecture, and Development

This article provides a comprehensive overview of OpenStack, covering its history from early Amazon web services to its open‑source launch, the governance of the OpenStack Foundation, the evolution of its releases, core components, widespread adoption, and practical guidance for learning the platform.

IaaS · OpenStack · infrastructure
11 min read
Efficient Ops
May 16, 2019 · Operations

How Alibaba’s AI‑Powered Data Centers Achieve Scalable, Reliable Operations

This article examines Alibaba Cloud’s intelligent data center ecosystem, covering market share, global distribution, operational challenges, AIOps evolution, multi‑layered infrastructure platforms, demand forecasting, fault prediction, and future smart‑automation prospects for large‑scale cloud operations.

AIOps · Alibaba Cloud · Operations
13 min read
DataFunTalk
May 10, 2019 · Artificial Intelligence

Pony.ai Infrastructure Overview: Vehicle Systems, Simulation Platform, and Data Architecture

The article presents a comprehensive overview of Pony.ai's autonomous driving infrastructure, covering the core infrastructure team’s responsibilities, vehicle onboard systems, simulation platform, data architecture, and supporting services, while discussing the technical challenges and engineering practices employed to achieve scalability, reliability, and high performance.

AI · Big Data · Simulation
14 min read
Architecture Digest
May 10, 2019 · Backend Development

Comprehensive Guide to Building a Backend Technology Stack for Startups

This article provides an extensive overview of how startups can design, select, and integrate languages, components, processes, and systems—such as project management, DNS, load balancing, databases, messaging, monitoring, and deployment—to construct a robust and scalable backend architecture.

architecture · components · devops
30 min read
Java High-Performance Architecture
Apr 9, 2019 · Operations

Mastering Load Balancing: Types, Algorithms, and Best Practices

This article outlines the three main load‑balancing methods—DNS, hardware, and software—detailing their advantages and drawbacks, then explains common algorithms such as round‑robin, weighted round‑robin, least‑connections, performance‑based, and hash, and provides guidance on combining them for optimal architecture.

Network Architecture · Operations · algorithms
5 min read
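The selection algorithms named in this summary (round-robin, weighted round-robin, least-connections, hash) can each be sketched in a few lines. This is a minimal single-threaded illustration, not the article's code: the class names are hypothetical, the weighted variant uses the "smooth" weighted round-robin scheme known from nginx (the article may describe a different weighting), and performance-based selection is omitted.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class SmoothWeightedRoundRobin:
    """Weighted round-robin that spreads heavy backends out instead of bunching them."""

    def __init__(self, weights):
        self.weights = dict(weights)
        self.current = {backend: 0 for backend in self.weights}
        self.total = sum(self.weights.values())

    def pick(self):
        # Every backend earns its weight; the leader is chosen and pays back the total.
        for backend, weight in self.weights.items():
            self.current[backend] += weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= self.total
        return best


class LeastConnections:
    """Route to the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {backend: 0 for backend in backends}

    def pick(self):
        best = min(self.active, key=self.active.get)
        self.active[best] += 1
        return best

    def release(self, backend):
        # Call when a request finishes so the count reflects current load.
        self.active[backend] -= 1


class HashBalancer:
    """Map a request key (for example a client IP) to a fixed backend."""

    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.backends[int.from_bytes(digest[:8], "big") % len(self.backends)]


if __name__ == "__main__":
    rr = RoundRobin(["s1", "s2", "s3"])
    print([rr.pick() for _ in range(4)])  # ['s1', 's2', 's3', 's1']

    wrr = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
    print("".join(wrr.pick() for _ in range(7)))  # aabacaa

    lc = LeastConnections(["s1", "s2"])
    first, second = lc.pick(), lc.pick()  # 's1', then 's2'
    lc.release(first)
    print(lc.pick())  # s1

    hb = HashBalancer(["s1", "s2", "s3"])
    print(hb.pick("10.0.0.7") == hb.pick("10.0.0.7"))  # True
```

The hash variant uses plain modulo placement, so adding or removing a backend remaps most keys; consistent hashing is the usual remedy when session affinity must survive pool changes.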
Alibaba Cloud Developer
Mar 27, 2019 · Cloud Native

Why Kubernetes Became the Backbone of Modern Cloud Native Architecture

This article introduces Kubernetes from a beginner’s perspective, covering its historical background, core design principles, architecture components such as Master and Node, key concepts like declarative APIs, containers, Pods, Services, and demonstrates how to create clusters, deploy, scale, and update applications, while also highlighting its role in cloud‑native environments.

Container Orchestration · Microservices · devops
0 likes · 15 min read
DevOps
Feb 26, 2019 · Operations

Planning a DevOps Infrastructure for Traditional Enterprises: Capabilities and Tool Mapping

This article analyzes the essential capabilities required for building a DevOps infrastructure in traditional enterprises across foundation, development, testing, operations, and project management, mapping each capability to representative tools and offering guidance on flexible, evolving architecture design.

Operations · devops · infrastructure
0 likes · 12 min read
Programmer DD
Feb 25, 2019 · R&D Management

Why Chinese Tech Teams Work Overtime While US Teams Don’t: A Deep Dive

The article examines why Chinese software engineers work far more overtime than their U.S. counterparts, analyzing how product decisions are made, the limited influence engineers have in China, shortcomings in infrastructure, and cultural attitudes toward work‑life balance, and identifies the systemic factors behind the disparity.

Overtime · R&D Management · infrastructure
0 likes · 9 min read
Efficient Ops
Jan 23, 2019 · Operations

Designing an Operations Monitoring Platform: Tools & Best Practices

This article explores the essential concepts for selecting and building an operations monitoring platform, reviewing popular tools such as Cacti, Nagios, Zabbix, Ganglia, Centreon, Prometheus, and Grafana, and outlines a six‑layer architecture and practical strategies for scaling, alerting, and high‑availability in diverse environments.

Alerting · Monitoring · Operations
0 likes · 19 min read
Manbang Technology Team
Dec 27, 2018 · Cloud Native

Cloud‑Native Infrastructure Practices at Manbang Group’s HuoCheBang

This article outlines how Manbang Group’s HuoCheBang migrated to a cloud-native architecture, detailing the unified monitoring platform Galileo, the Kubernetes-based container cloud Planck, the microservice environment Newton, and the DevOps platform Solvay, which together enable scalable, observable, and efficient service delivery.

Container Orchestration · devops · infrastructure
0 likes · 5 min read
NetEase Game Operations Platform
Dec 10, 2018 · Information Security

Understanding and Improving Operations Security: Practices, Risks, and Enterprise‑Level Solutions

This article explains the concept of operations security, why it has become critical, enumerates common mis‑configurations and vulnerabilities such as open ports, weak permissions, insecure scripts and supply‑chain risks, and provides a comprehensive set of best‑practice guidelines and an enterprise‑level framework to build a resilient operations security posture.

Automation · incident response · infrastructure
0 likes · 28 min read
Xianyu Technology
Dec 6, 2018 · Mobile Development

Rebuilding Flutter Infrastructure at Xianyu: Challenges and Solutions

Xianyu tackled Flutter adoption by creating a private CocoaPods CI pipeline, a component‑based fishRedux architecture, and a shared‑GL engine modification that let native middleware run in Flutter, thereby unifying Android, iOS, and Flutter development, improving build speed, and contributing tools back to the community.

CI/CD · Component Architecture · Cross-Platform
0 likes · 11 min read
Meituan Technology Team
Nov 15, 2018 · Cloud Native

Meituan's Container Platform HULK: Architecture and Optimization

Meituan’s HULK platform is a custom container‑cluster manager that integrates service governance, deployment and monitoring, while employing kernel patches, enhanced cgroup controls, dynamic CPU/memory isolation, P2P image distribution and in‑container core‑dump handling to overcome stability, performance and resource‑reporting challenges.

Cloud Computing · Container Technology · HULK
0 likes · 20 min read
AntTech
Oct 9, 2018 · Cloud Computing

Technical Analysis of OceanBase Cloud Platform (OCP) 2.0 Architecture and Solutions

The article provides a comprehensive technical overview of OceanBase Cloud Platform (OCP) 2.0, detailing its redesigned architecture, reduced deployment complexity, high‑availability features, unified resource scheduling, monitoring, diagnostics, and how these innovations address infrastructure and business challenges while lowering costs.

High Availability · OCP 2.0 · OceanBase
0 likes · 11 min read
Architects' Tech Alliance
Sep 30, 2018 · Industry Insights

What Every Data Center Engineer Must Know About Rack Cabinet Standards and Design

This article provides a comprehensive overview of data‑center rack cabinets, covering size specifications, power and cooling requirements, key industry standards such as IEC 60297‑1 and EIA‑310‑D, structural components, environmental considerations, load capacity, and practical design guidelines for safe and efficient deployment.

Data Center · Operations · Rack Cabinet
0 likes · 10 min read
Architecture Talk
Sep 26, 2018 · Backend Development

Essential Backend Infrastructure for Scalable Internet Services: A Complete Guide

This article outlines the critical backend components and best‑practice architectures—including API gateways, load balancers, service frameworks, caching, databases, search engines, messaging, authentication, configuration, scheduling, logging, data pipelines, and monitoring—that together ensure stable, maintainable, and high‑availability services for modern internet companies.

API Gateway · Caching · backend
0 likes · 32 min read
Efficient Ops
Sep 18, 2018 · Operations

Mastering Internet Operations: Roles, Responsibilities, and Evolution

This article provides a comprehensive overview of internet operations, detailing how service‑centric stability, security, and efficiency are achieved through infrastructure management, monitoring, risk mitigation, and continuous optimization, while outlining the various operational roles, their duties, and the evolution of ops practices.

Operations · devops · infrastructure
0 likes · 21 min read
21CTO
Aug 30, 2018 · Operations

Inside Google’s Production: How Requests Travel Through Its Massive Infrastructure

The article describes Google’s production environment, which spans a global edge network, massive data centers, job scheduling with Borg, distributed storage systems such as Bigtable and Spanner, and comprehensive monitoring, and traces how a user request passes through multiple layers: from the ISP to the edge network, the Google Front End (GFE), load balancers, and finally the services.

Google · Monitoring · SRE
0 likes · 9 min read
Youzan Coder
Jul 27, 2018 · Operations

Youzan Testing Environment: Service Chain Isolation and Operational Practices

Youzan created a cost-effective multi-project testing environment by introducing a weakly isolated “service-chain” that propagates identifiers across RPC and REST calls, standardizing entry/exit points, automating provisioning, and integrating the isolated environments into CI/CD pipelines through cross-team collaboration and tooling.

Service Chain · devops · infrastructure
0 likes · 17 min read
JD Tech
Jul 25, 2018 · Cloud Native

JD.com’s Large‑Scale Kubernetes Refactoring and Operational Lessons

This article shares JD.com’s extensive experience redesigning Kubernetes for massive production use, covering custom DNS and load‑balancing, scaling clusters to ten‑thousand nodes, adapting controllers, building the Archimedes scheduler, and practical insights on resource isolation, deployment, and high‑traffic elasticity.

Cloud Native · JDOS · Large-scale
0 likes · 14 min read
Architects' Tech Alliance
Jul 17, 2018 · Cloud Computing

Why Bare Metal Matters: Understanding OpenStack Ironic for High‑Performance Cloud Deployments

This article explains how OpenStack Ironic enables bare‑metal provisioning as a cloud service, outlines scenarios where physical servers outperform virtual machines, details Ironic’s architecture and evolution, compares billing models, and highlights future trends for bare‑metal cloud offerings.

Bare Metal · Cloud Computing · Ironic
0 likes · 9 min read
MaGe Linux Operations
Jun 29, 2018 · Operations

Essential Skills and Roadmap for Large‑Scale Website Operations Engineers

This comprehensive guide explains what large-scale website operations entail, describes how ops engineers are involved throughout the product lifecycle, details the technical and personal skills required, and discusses current challenges, future prospects, and key technologies such as cluster management, monitoring, fault handling, and automation.

Automation · Large Scale · Linux
0 likes · 18 min read
Ops Development Stories
Jun 11, 2018 · Cloud Computing

Master OpenStack: Complete Guide to Components and Environment Setup

This article provides a comprehensive overview of OpenStack's architecture, details each core service with its role, and walks through step‑by‑step commands to configure a functional OpenStack Pike environment on CentOS, including networking, database, messaging, and storage components.

Cloud Computing · Installation · Linux
0 likes · 9 min read
Qunar Tech Salon
May 30, 2018 · Operations

Recap of the QInfrarch Session at the 2018 Qunar Technology Carnival

The QInfrarch special session of the 2018 Qunar Technology Carnival gathered a packed audience on May 27, featuring multiple technical talks on real‑time push architecture, IDC networking, ticket search, decentralization, multi‑datacenter redundancy, and fault‑injection platforms, followed by lively Q&A, networking, and enthusiastic follow‑up requests.

Operations · QInfrarch · Tech Conference
0 likes · 4 min read
ITFLY8 Architecture Home
May 27, 2018 · Information Security

How Google Secures Its Global Data Centers: Inside the Infrastructure

Google’s technical infrastructure—supporting services like Search, Gmail, G Suite, and GCP—employs layered physical, hardware, software, and operational security measures, including biometric access, custom secure chips, encrypted boot, service isolation, identity management, and robust DoS defenses to protect data and operations worldwide.

Data Center Security · Google · Operations
0 likes · 20 min read
Efficient Ops
May 8, 2018 · Operations

20 Proven Ops Automation Rules Every Team Should Follow

This article presents twenty practical principles for building and maintaining an effective, business‑oriented operations automation system, covering mindset, architecture, design, tooling, team composition, data handling, security, and implementation best practices for modern enterprises.

Automation · Operations · best practices
0 likes · 5 min read
Efficient Ops
Apr 23, 2018 · Operations

Unlocking Ops Automation: Real-World Architectures and Practical Insights

This article explores the essence of operations automation by presenting three real-world platform case studies, analyzing their architectures, tools, and implementation challenges, and then discusses universal automation principles, intelligent ops concepts, and career guidance, blending technical depth with personal motivation.

Monitoring · Operations Automation · SaltStack
0 likes · 17 min read
MaGe Linux Operations
Apr 13, 2018 · Operations

How Alibaba Built Its DevOps Automation Platform: Key Practices and Lessons

This article outlines Alibaba's DevOps transformation, describing the three operational stages, four foundations of automated operations, CI/CD implementation, essential system characteristics, development‑defined operations, config‑driven changes, and the tools that enable high‑availability, efficiency, and scalability.

Alibaba · Automation · Configuration
0 likes · 10 min read
DevOps Coach
Apr 1, 2018 · Cloud Computing

Deploy a Production‑Ready Kubernetes Cluster on AWS with Kops

This step‑by‑step guide shows how to configure Route53 DNS, prepare a VM with required tools, create an S3 state store, provision a Kubernetes cluster on AWS using kops, validate it, expose a sample service, and clean up the resources.

AWS · Cloud Computing · deployment
0 likes · 15 min read
DevOps Coach
Mar 29, 2018 · Operations

7 Must-Have Skills Every DevOps Engineer Needs

The article outlines the seven essential competencies—flexibility, security, collaboration, scripting, decision‑making, infrastructure knowledge, and soft skills—that DevOps engineers must master to bridge development and operations, accelerate delivery, and maintain secure, reliable systems.

Operations · Scripting · Skills
0 likes · 8 min read
Architecture Digest
Mar 29, 2018 · Databases

Designing a High‑Availability Redis Service with Sentinel

This article explains how to build a highly available Redis deployment using Redis Sentinel, compares several architectural options, and details the final three‑sentinel design that tolerates node, process, and network failures while keeping client access simple.

High Availability · Sentinel · failover
0 likes · 12 min read
Efficient Ops
Mar 27, 2018 · Cloud Computing

Why X86 Bare‑Metal Services Matter and How to Build Them in the Cloud

This article explains why X86 bare‑metal services are essential for high‑performance, security‑critical workloads, describes their architecture and management processes, and outlines the steps—standardization, automation, service‑orientation, and self‑service—used by Hengfeng Bank to implement and operate them.

Automation · Bare Metal · Cloud Computing
0 likes · 16 min read
21CTO
Mar 19, 2018 · Operations

How Tencent Scaled Its Network from 2004‑2013: Key Lessons in Data‑Center Evolution

This article chronicles Tencent's network journey from its modest 2004 infrastructure through rapid expansion, critical incidents, and architectural breakthroughs like SET zones, SDN, and MPLS VPN, illustrating how the company transformed its data‑center operations to support massive user growth.

Data Center · Network Architecture · Operations
0 likes · 11 min read
dbaplus Community
Mar 11, 2018 · Cloud Computing

How a Chinese Telecom Payment Platform Mastered Cloud Migration in 8 Hours

This article details the end‑to‑end cloud migration of China Telecom's payment platform, covering pre‑migration challenges, architectural redesign, data‑sync strategies, the eight‑hour cut‑over process, post‑migration performance gains, and future DBaaS plans, all based on a 2017 DBAplus conference talk.

Cloud Migration · DBaaS · Operations
0 likes · 19 min read
ITPUB
Mar 9, 2018 · Operations

How to Build Your Own Global CDN Using Smart DNS and Anycast

This guide explains how to create a personal CDN by deploying multiple edge servers, using Geo‑IP‑aware DNS routing, leveraging Amazon Route 53 latency‑based routing, synchronizing content, handling SSL with Let’s Encrypt, and evaluating performance across continents.

Anycast · CDN · DNS
0 likes · 10 min read
ITPUB
Mar 6, 2018 · Operations

How to Build Your Own Low‑Latency CDN from Scratch

This guide explains why a custom CDN can outperform commercial services, walks through geo-aware DNS routing, the limitations of BGP Anycast, setting up edge servers, distributing static content, and handling SSL certificates, and shares real-world performance results and lessons learned.

Anycast · CDN · DNS
0 likes · 11 min read
21CTO
Mar 5, 2018 · Cloud Native

How Docker Transforms DevOps: Solving the Multi‑Level Container Challenge

This article explains Docker’s role in modern IT by outlining the challenges faced by companies, comparing VMs and containers, describing Docker’s architecture, and showing how containerization streamlines DevOps workflows and isolates developer and administrator concerns.

Cloud Native · Docker · containerization
0 likes · 5 min read
Architecture Digest
Feb 28, 2018 · Blockchain

Blockchain Infrastructure Landscape: A First‑Principles Framework

This article presents a first‑principles framework that categorizes blockchain infrastructure components—storage, computation, and communication—by mapping them to concrete projects such as Ethereum, IPFS, BigchainDB, and others, illustrating how these modules interoperate to build efficient decentralized applications.

Distributed Computing · blockchain · decentralized storage
0 likes · 21 min read
MaGe Linux Operations
Feb 4, 2018 · Operations

Essential Operations Tools Every DevOps Engineer Should Master

This article outlines the key categories of operations tools—including process management, release automation, configuration handling, resource isolation, and comprehensive monitoring and alerting solutions—providing a practical guide for building reliable, automated infrastructure workflows.

Automation · Monitoring · Operations
0 likes · 8 min read
Snowball Engineer Team
Feb 2, 2018 · R&D Management

Building an Engineer Culture: Values, Infrastructure, and Incentives at Snowball

The article discusses how Snowball cultivates an engineer‑focused culture by defining core values such as proactiveness, professionalism, efficiency and empathy, establishing robust infrastructure tools, and implementing balanced punishment and reward systems to motivate continuous improvement and retain talent.

R&D Management · engineer culture · incentives
0 likes · 12 min read
dbaplus Community
Jan 8, 2018 · Operations

From Firefighter to Automation: Tencent’s Ops Veteran Shares 10‑Year Infrastructure Secrets

Veteran Tencent operations leader Zhao Jianchun recounts a decade of managing 100,000 servers, detailing the L5 fault‑tolerant system, unified framework, resource packaging, CMDB virtual imaging, and an automated deployment platform that together cut daily incidents by up to 90% and boosted efficiency tenfold.

Automation · CMDB · fault tolerance
0 likes · 11 min read
JD Tech
Nov 30, 2017 · Artificial Intelligence

Interview with JD Infrastructure Chief Architect He Xiaofeng on Real‑time Computing and Product Data Mining

He Xiaofeng, chief architect of JD Mall Infrastructure, discusses his role in building a real-time computing platform, how streaming frameworks, machine learning, and knowledge-graph techniques are applied to product data mining to improve search accuracy, and future research directions.

JD.com · Knowledge Graph · Real-Time Computing
0 likes · 5 min read