Tagged articles
360 articles
Architects' Tech Alliance
Mar 17, 2024 · Industry Insights

Why Hyper‑Converged Infrastructure Beats Traditional VMware + FC SAN: 4 Key Differences

The article compares hyper‑converged infrastructure with the traditional VMware + FC SAN stack, highlighting four architectural differences and showing how hyper‑convergence improves reliability, concurrency performance, scalability, operational simplicity, and total cost of ownership for modern data‑center workloads.

Cost · Data center · Hyper-Converged
0 likes · 8 min read
Linux Cloud Computing Practice
Mar 13, 2024 · Operations

Top 10 Essential Tools Every Operations Engineer Should Master

This article introduces ten indispensable tools for operations engineers, detailing each tool's functionality, typical use cases, key advantages, and real‑world examples, helping professionals streamline automation, monitoring, configuration, and deployment tasks and improve overall system reliability.

Infrastructure · Operations · monitoring
0 likes · 6 min read
ITPUB
Mar 11, 2024 · Cloud Computing

What 4 Years of Startup Infrastructure Taught Me: AWS, Terraform, GitOps & More

After four years running infrastructure at a fast‑growing startup, the author reviews almost every major decision—from choosing AWS over GCP and adopting EKS, RDS, and Redis, to automating post‑mortems with Slack bots, standardising IaC with Terraform and GitOps, and evaluating SaaS tools like DataDog, PagerDuty, and Notion—highlighting the benefits, regrets, and practical lessons learned.

AWS · DevOps · Infrastructure
0 likes · 22 min read
Architects' Tech Alliance
Feb 24, 2024 · Operations

How the Two‑Site Three‑Center Disaster Recovery Model Boosts Business Continuity

The article explains the two‑site three‑center disaster‑recovery architecture—comprising a production site, a same‑city backup, and a remote backup—detailing synchronous and asynchronous data replication, failover capabilities, Oracle Data Guard implementation, and why this hybrid approach delivers superior RPO, RTO, and availability for enterprises.

Infrastructure · Oracle Data Guard · RPO
0 likes · 6 min read
DevOps Engineer
Feb 1, 2024 · Operations

Overview of Apache Software Foundation Infra Services and Tools

This article provides a comprehensive overview of the Apache Software Foundation's infrastructure services and tools—including website hosting, email, self‑service platforms, version‑control repositories, issue‑tracking systems, CI/CD pipelines, code quality, publishing, virtual machines, and miscellaneous utilities—helping DevOps and SRE engineers understand and leverage Apache's operational ecosystem.

Apache · DevOps · Infrastructure
0 likes · 14 min read
Advanced AI Application Practice
Feb 1, 2024 · R&D Management

A Core Roadmap for Effective Quality Assurance

The article outlines a practical roadmap for quality assurance across the software lifecycle, highlighting the pivotal roles of clear requirements, sound technical implementation, risk and project management, and measurable cost metrics, while stressing the need for solid processes and infrastructure.

Infrastructure · Project Management · process improvement
0 likes · 8 min read
Efficient Ops
Jan 28, 2024 · Operations

Can One Person Really Manage 40,000 Servers? Real‑World Ops Insights

A collection of Zhihu contributors share practical experiences and opinions on whether a single operations engineer can handle the massive scale of 40,000 servers, covering workload, automation gaps, budgeting, hardware failure rates, and the necessity of team‑based high‑availability practices.

Infrastructure · SRE · Scale
0 likes · 9 min read
Open Source Linux
Dec 29, 2023 · Operations

What Are the Core Functions and Evolution of Modern IT Operations?

This article outlines the responsibilities of internet operations (stability, security, efficiency, system and application maintenance, database management, and automation) and traces the evolution of operations teams from manual data‑center tasks to sophisticated, self‑scheduling platforms.

DevOps · IT Operations · Infrastructure
0 likes · 18 min read
DevOps
Dec 26, 2023 · Cloud Native

Comprehensive Guide to Cloud‑Native DevOps: Architecture, Tools, and Practical Implementations

This document presents a thorough overview of cloud‑native DevOps, covering the evolution of related technologies, detailed analysis of virtualization, container orchestration, CI/CD pipelines, programming language choices, system architectures, database options, build tools, and five step‑by‑step practice cases that demonstrate end‑to‑end automation, monitoring, and release management in Kubernetes environments.

Automation · DevOps · Infrastructure
0 likes · 35 min read
Alibaba Cloud Infrastructure
Nov 6, 2023 · Cloud Computing

Alibaba Cloud DNS Product Upgrade and Ecosystem Strategy at Yunqi Conference 2023

At the Yunqi Conference on October 31, Alibaba Cloud’s DNS team unveiled major upgrades to its cloud DNS product, detailing a unified “one platform, two ends” architecture, multi‑cloud integration, enhanced reliability, security, and ecosystem initiatives, with case studies from Geely, China Communications, and others.

DNS · Infrastructure · multi-cloud
0 likes · 9 min read
Continuous Delivery 2.0
Oct 20, 2023 · Operations

Understanding Platform Engineering: Definition, Scope, and Its Relationship with DevOps and SRE

The article explains platform engineering as the evolution of DevOps into a productized internal infrastructure function, detailing its definition, target users, responsibilities, organizational placement, differences from DevOps and SRE, industry trends, implementation practices, and criteria for evaluating the need for a dedicated platform team.

IT Operations · Infrastructure · SRE
0 likes · 10 min read
MaGe Linux Operations
Oct 13, 2023 · Cloud Native

How Kubernetes Transforms Cloud‑Native Application Deployment and Management

This article explains what Kubernetes (K8s) is, its core features such as portability, scalability and automation, explores enterprise use cases, resource estimation, service migration, deployment evolution, cloud‑native concepts, and details the master‑node architecture and components that enable efficient container orchestration.

Cloud Native · DevOps · Infrastructure
0 likes · 9 min read
Java Architect Essentials
Aug 27, 2023 · Backend Development

Comprehensive List of Backend Development Technologies and Tools

This article provides an extensive, categorized catalog of backend development components—including web containers, databases, caching systems, message queues, load balancers, distributed storage, big‑data frameworks, monitoring, security, testing, and build tools—each with a brief description and official URL for reference.

Infrastructure · Technology Stack · tools
0 likes · 11 min read
Open Source Linux
Aug 7, 2023 · Operations

Master Ansible: Architecture, Workflow, and 7 Key Commands

Ansible is a model-driven configuration manager that uses SSH for remote connections, featuring a core engine, modules, plugins, playbooks, connection plugins, and host inventories; this guide explains its architecture, operation flow, and details the seven primary commands with usage examples.

Ansible · Configuration Management · DevOps
0 likes · 8 min read
Zhengcaiyun Technology (政采云技术)
Jul 26, 2023 · Frontend Development

Reflections on Building Infrastructure in Front‑End Development

This article shares practical experiences, challenges, and best‑practice advice for front‑end engineers building reusable infrastructure platforms, emphasizing long‑term planning, cross‑team collaboration, incremental delivery, data‑driven validation, and the importance of balancing low‑level architecture with user‑facing product features.

Engineering · Infrastructure · frontend
0 likes · 16 min read
vivo Internet Technology
Jun 28, 2023 · Operations

Certificate Management Platform Practice: From Manual to Platform-Based Operations at Scale

vivo replaced fragile, engineer‑driven certificate handling with a centralized Vue‑2/Go platform that automates application, secure key storage, renewal alerts, and multi‑environment pushes, eliminating availability incidents and paving the way for future blockchain‑based, immutable certificate distribution.

DevOps · Infrastructure · Operations Automation
0 likes · 7 min read
DevOps
Jun 13, 2023 · Operations

Why DevOps Is Not Dead: The Rise of Platform Engineering and Its Impact on Modern Operations

The article argues that DevOps is still alive, explains the shortcomings of isolated operational practices, introduces platform engineering as the next evolution, and discusses practical considerations such as third‑party software selection, cloud‑native adoption, and the role of internal developer platforms in improving organizational efficiency.

Cloud Native · DevOps · Infrastructure
0 likes · 10 min read
HelloTech
May 23, 2023 · Operations

Introduction to the Haro Monitoring System at the A2M Internet Architecture and AI Summit 2023

At the A2M Internet Architecture and AI Summit 2023 in Shanghai, senior backend expert Zhang Xiaoyong will present Haro’s seven‑year‑old monitoring system, detailing its overall architecture, evolution, application‑level observability practices such as tracing, logging, and metrics, the challenges encountered, and the company’s reflections and future plans.

Infrastructure · conference
0 likes · 2 min read
Bilibili Tech
Apr 25, 2023 · Operations

Liquid Cooling Solutions for Data Center Energy Efficiency: Bilibili's Practice

Bilibili’s next‑generation data center replaces traditional air‑cooling with a hybrid liquid‑cooling system—combining water‑cooled chillers, indirect evaporative cooling, magnetic‑levitation pumps and cold‑plate modules—to raise inlet temperatures, cut fan power, achieve PUE below 1.15, and demonstrate greener, cost‑effective operation while shaping industry standards.

Data center · Infrastructure · PUE
0 likes · 12 min read
Ops Development Stories
Apr 12, 2023 · Operations

Essential System Performance Metrics Every Ops Engineer Should Track

This article explains how to categorize and deeply understand key system performance metrics—including infrastructure, application, user experience, and business indicators—so engineers can monitor stability, efficiency, and business impact under high load and concurrency.

Infrastructure · Operations · User experience
0 likes · 10 min read
Bitu Technology
Apr 7, 2023 · Cloud Native

Managing Kubernetes Resource Manifests with Kustomize: Aggregation, Overlays, and Components

This article explains how Tubi’s engineering team uses Kustomize to simplify and scale Kubernetes Resource Manifest management by aggregating resources, applying patches, organizing bases and overlays, and leveraging reusable components to reduce duplication and improve maintainability across clusters and namespaces.

Component · Infrastructure · Kubernetes
0 likes · 15 min read
Architect's Guide
Mar 23, 2023 · Backend Development

A Comprehensive List of 110 Common Components and Frameworks for Backend Development

This article provides a curated collection of 110 widely used technical components and frameworks—ranging from web containers, databases, caches, message queues, load balancers, distributed storage, and monitoring tools to testing utilities and build systems—organized by functional categories to help backend engineers quickly discover suitable solutions.

Infrastructure · components · frameworks
0 likes · 12 min read
Alibaba Cloud Infrastructure
Feb 16, 2023 · Industry Insights

Why Alibaba’s DAC Strategy Revolutionized Data Center Networking

This article analyzes how Alibaba’s large‑scale deployment of Direct Attach Cables (DAC) transformed data‑center physical networking by cutting costs, reducing power consumption, improving reliability and latency, and driving architectural innovations that address past adoption barriers and future challenges.

AOC · Alibaba · DAC
0 likes · 19 min read
Alibaba Cloud Developer
Feb 3, 2023 · Cloud Computing

Rethinking Cloud Computing: How Alibaba’s CIPU Redefines Compute Power

This article revisits cloud computing by tracing the evolution of compute power, exploring Alibaba Cloud’s infrastructure breakthroughs such as the CIPU processor and its core platforms, and analyzing how these advances reshape elastic, big‑data, high‑performance, and AI workloads while highlighting trust, cost, and self‑service challenges.

Alibaba Cloud · CIPU · Distributed Systems
0 likes · 32 min read
Bilibili Tech
Dec 30, 2022 · Operations

Design and Evolution of Bilibili Intranet DNS Service

The article details Bilibili’s internal DNS service evolution—from an initial BIND9 master‑slave setup to a multi‑level caching architecture that boosts QPS to over 1.5 million—while describing comprehensive host, business, and client monitoring, key configuration pitfalls, and best‑practice recommendations for a low‑latency, reliable intranet DNS.

DNS · Infrastructure · bind9
0 likes · 10 min read
Programmer DD
Dec 26, 2022 · Operations

Inside Alibaba Cloud Hong Kong Region C Outage: Timeline, Impact, and Lessons Learned

On December 18, 2022, Alibaba Cloud's Hong Kong Region Zone C suffered a massive service interruption—the longest in its operational history—prompting a detailed incident response, extensive service impact across compute, storage, and networking, and a thorough analysis that led to concrete infrastructure and communication improvements.

Alibaba Cloud · Incident Report · Infrastructure
0 likes · 13 min read
Architects' Tech Alliance
Dec 6, 2022 · Cloud Computing

Evolution of Hyper‑Converged Infrastructure, Distributed Storage, and Hybrid Cloud

The article examines the development of hyper‑converged infrastructure (HCI), its relationship with software‑defined storage, the shift toward hybrid cloud solutions, hardware choices, and the suitability of distributed storage for critical business workloads, providing a comprehensive overview of modern cloud‑centric storage architectures.

Hyper-Converged · Infrastructure · SDS
0 likes · 12 min read
Alibaba Cloud Infrastructure
Nov 5, 2022 · Cloud Computing

Highlights of the 2022 China Cloud Computing Infrastructure Summit – Launch of Alibaba Cloud Pangu 2.0 and Expert Insights on Smart Computing Infrastructure

The 2022 China Cloud Computing Infrastructure Summit showcased the debut of Alibaba Cloud Pangu 2.0 and featured a series of expert presentations covering data‑center airflow simulation, AI‑driven compute demands, innovative server and storage technologies, high‑performance networking, and smart‑data‑center operations, concluding with a forward‑looking round‑table on intelligent computing.

Alibaba Cloud · Data center · Infrastructure
0 likes · 10 min read
IT Architects Alliance
Oct 21, 2022 · Cloud Computing

Understanding Hyper‑Converged Infrastructure, Software‑Defined Storage, and Their Role in Hybrid Cloud

The article explains how hyper‑converged infrastructure leverages mature virtualization to provide elastic compute and storage pools, distinguishes it from converged infrastructure, discusses its evolution toward software‑defined storage, and outlines how these technologies integrate with cloud and hybrid‑cloud architectures.

Hyper-Converged · Infrastructure · Software-Defined Storage
0 likes · 12 min read
ByteDance SE Lab
Sep 26, 2022 · Mobile Development

Inside ByteDance’s AppInfra: How Mobile Infrastructure Powers Millions of Apps

This interview reveals how ByteDance’s AppInfra team builds and evolves mobile infrastructure, performance optimization, automated testing, and talent strategies to support a growing portfolio of high‑traffic apps like Douyin and Toutiao, offering insights into cross‑platform toolchains, team organization, and future technology trends.

Infrastructure · Team Organization · app performance
0 likes · 18 min read
Bilibili Tech
Sep 6, 2022 · Operations

Bilibili's Green Data Center Initiatives and the Custom R2‑AZ2 Project Overview

Bilibili’s SYS team is advancing green data‑center technology by aligning with national energy‑efficiency mandates, co‑authoring cold‑plate liquid‑cooling standards, and piloting the custom R2‑AZ2 Innovation Room Phase‑1, which combines indirect evaporative and magnetic‑levitation hybrid cooling to achieve PUEs as low as 1.13 while testing AI‑driven O&M tools for future large‑scale, low‑carbon deployment.

Bilibili · Infrastructure · data center operations
0 likes · 9 min read
Zuoyebang Tech Team
Aug 26, 2022 · Operations

How We Built a Three‑Layer Stability System for Massive Scale Operations

This article details the operational mindset, stability framework, and transformation journey of the Zuoyebang infrastructure team, covering service lifecycle management, standardization, cloud‑native architecture, multi‑active deployment, incident pre‑plan platforms, traffic scheduling, monitoring, capacity planning, and future directions for SRE service‑orientation.

Automation · Infrastructure · Operations
0 likes · 20 min read
Cloud Native Technology Community
Aug 18, 2022 · Operations

Understanding DevOps: Integrating Development and Operations Beyond the ‘Who Develops Who Operates’ Myth

The article clarifies common misconceptions about DevOps, explains that true development‑operations integration relies on dedicated ops teams, automation tools, standardized delivery artifacts, and unified permission management rather than developers performing ops tasks, and highlights Google SRE practices as a practical guide.

Automation · DevOps · Infrastructure
0 likes · 10 min read
Architect's Guide
Aug 18, 2022 · Databases

42 Lessons Learned from Building a Production Database

This article translates and summarizes Mahesh Balakrishnan’s 42 practical insights on building a production database, covering customer focus, project management, design principles, code review, observability, research, and cultural practices for engineering teams.

Design · Infrastructure · Observability
0 likes · 11 min read
Architect
Aug 14, 2022 · Databases

42 Lessons Learned from Building a Production Database – Translated Summary

This article translates Mahesh Balakrishnan’s 42 practical lessons on building a production database, covering customer focus, project management, design principles, code review, strategy, observability, and research, offering actionable guidance for infrastructure engineers and architects.

Infrastructure · Software Architecture · design principles
0 likes · 12 min read
Practical DevOps Architecture
Aug 4, 2022 · Cloud Native

Enterprise Kubernetes Course: Core Technologies, Persistent Storage, Pods, Controllers, Deployments, and Ingress

This article provides a comprehensive list of 64 video lessons covering enterprise‑level Kubernetes topics, including cluster architecture, persistent storage, Pods, Controllers, Deployments, Services, Ingress, Helm, security, monitoring, high‑availability setup, and application deployment, serving as a detailed curriculum for mastering Kubernetes.

Cloud Native · DevOps · Infrastructure
0 likes · 7 min read
AntTech
Jul 30, 2022 · Fundamentals

Open Source Core Infrastructure: Ant Group’s Strategy and Key Projects

In his keynote at the 2022 Open Atom Global Open Source Summit, He Zhengyu outlined Ant Group’s open‑source strategy, highlighting over 900 repositories and key projects such as OceanBase, SOFA Mesh, MOSN, BabaSSL, Occlum, and the upcoming TuGraph, emphasizing how open core infrastructure drives industry innovation and ecosystem growth.

Ant Group · Cloud Native · Infrastructure
0 likes · 9 min read
DevOps
Jul 25, 2022 · Operations

Understanding the Role and Responsibilities of Site Reliability Engineering (SRE)

This article provides a comprehensive overview of Site Reliability Engineering, explaining its origins, core responsibilities across infrastructure, platform, and business layers, daily tasks such as deployment, on‑call duties, SLI/SLO management, incident post‑mortems, capacity planning, and user support, as well as career advice for aspiring SREs.

Infrastructure · Oncall · Reliability
0 likes · 21 min read
Top Architect
Jun 29, 2022 · Operations

Understanding DNS Load Balancing, CDN, and SOA Mechanisms

This article explains the limitations of traditional load‑balancing, describes how CDNs and DNS use distributed, hierarchical mechanisms such as SOA to achieve traffic distribution and fault tolerance, and outlines practical DNS‑based load‑balancing implementations and supported service providers.

CDN · DNS · Infrastructure
0 likes · 7 min read
dbaplus Community
May 31, 2022 · Operations

How G Bank Scaled Monitoring with Zabbix: Architecture & Automation

Facing rapid business growth, G Bank adopted the open-source Zabbix monitoring platform to cut costs and increase automation. The article details its multi-layer architecture, support for open-source and Xinchuang platforms, diverse data collection methods, alert strategies, and extensive automation that now covers the head office and 39 branch sites.

Automation · Infrastructure · Zabbix
0 likes · 9 min read
Efficient Ops
May 5, 2022 · Operations

What Makes a Modern Data Center Tick? From History to Architecture

This article explains what an Internet Data Center (IDC) is, traces its evolution from early server farms to cloud‑computing era, and details the hardware, power, cooling, networking, and management systems that compose today’s large‑scale data centers.

Data center · IDC · Infrastructure
0 likes · 14 min read
IT Architects Alliance
Apr 17, 2022 · Operations

Understanding the SRE Role: Responsibilities, Types, and Practices

This article explains what Site Reliability Engineering (SRE) is, why it was created, the challenges in hiring SREs, and breaks the role into three layers—Infrastructure, Platform, and Business—detailing their duties, deployment processes, on‑call practices, SLI/SLO management, incident post‑mortems, capacity planning, user support, and career advice.

Infrastructure · Oncall · Operations
0 likes · 21 min read
YunZhu Net Technology Team
Apr 7, 2022 · Operations

RocketMQ Cluster Migration: Issues, Preparation Steps, and Recommended Migration Plans

This article analyzes the problems caused by multiple independent RocketMQ clusters, outlines the current cluster architecture, details pre‑migration preparations, compares two migration schemes (one not recommended and one recommended), and summarizes the benefits of consolidating clusters into a single, well‑managed deployment.

Cluster Migration · Docker · Infrastructure
0 likes · 9 min read
DevOps
Apr 7, 2022 · Operations

Top DevOps Tools: Comprehensive List, Features, and Selection Guide

This article provides an in‑depth overview of DevOps tools, explaining their role in automating software development and operations, and presents a curated list of popular tools with key features, download links, and guidance on choosing the right solution for your team.

Automation · DevOps · Infrastructure
0 likes · 18 min read
Baidu Geek Talk
Mar 17, 2022 · Cloud Native

Cloud Native: Reshaping Technology Ecosystem and Driving Industry Transformation

Cloud native is rapidly reshaping the technology ecosystem by standardizing infrastructure, enabling serverless development that frees engineers to focus on core business logic, elevating the importance of comprehensive logging, and driving digital transformation across industries toward an advanced, automated Cloud Native 2.0 era.

Infrastructure · cloud-native · digital-transformation
0 likes · 7 min read
Open Source Linux
Mar 2, 2022 · Fundamentals

What Is an IDC? A Complete Guide to Data Center Evolution and Architecture

This article explains what an Internet Data Center (IDC) is, outlines its historical development stages, describes its core hardware and supporting infrastructure, and discusses current trends such as cloud computing, modular designs, and green energy initiatives shaping the future of data centers.

Data center · Hardware · IDC
0 likes · 13 min read
21CTO
Feb 9, 2022 · Operations

Why Roblox’s Three‑Day Outage Happened: Consul Streaming Bug and BoltDB Design Flaw

Roblox’s detailed post‑mortem reveals that a three‑day outage was caused by a Consul streaming bug and a design flaw in BoltDB’s freelist, which together created CPU contention and latency spikes on its massive on‑premises infrastructure, leading the team to disable streaming, add a second data‑center, and redesign their architecture.

BoltDB · Consul · Infrastructure
0 likes · 9 min read
MaGe Linux Operations
Jan 26, 2022 · Cloud Native

Unlock Kubernetes Essentials: Pods, Services, Deployments, and Beyond

This article introduces Kubernetes—Google's open‑source container orchestration platform—detailing its core concepts such as Pods, Namespaces, Nodes, Services, Volumes, PersistentVolumes, Deployments, StatefulSets, DaemonSets, Ingress, Jobs, HPA, ServiceAccounts, Secrets, ConfigMaps, and ResourceQuotas, providing practical commands and usage notes for each component.

Cloud Native · DevOps · Infrastructure
0 likes · 18 min read
Qunar Tech Salon
Jan 24, 2022 · Databases

Qunar 2021 Technical Salon – Infrastructure Articles Collection (Databases, Operations, Components)

This article compiles the 2021 Qunar Technical Salon infrastructure series, presenting original technical writings on databases, operational practices, and core components, each linked to detailed posts that share real‑world experiences, design guidelines, and performance insights for engineers and practitioners.

DevOps · Infrastructure · Operations
0 likes · 7 min read
IT Architects Alliance
Dec 31, 2021 · R&D Management

Types of Enterprise Architects and Their Responsibilities

The article outlines seven distinct architect roles—Enterprise, Application, Information, Infrastructure, Integration, Operation, and Systems Engineering—explaining their focus areas, key responsibilities, and how they align IT capabilities with business needs.

Architect Roles · IT Architecture · Information Management
0 likes · 6 min read
Alibaba Cloud Native
Dec 22, 2021 · Operations

How Alibaba’s ASI Powers Massive Serverless Kubernetes at Scale

This article details Alibaba's Serverless Infrastructure (ASI) built on ACK, explaining its large‑scale Kubernetes architecture, fully managed operations, change‑risk controls, gray‑release pipelines, web‑shell access, taskflow orchestration, node lifecycle management, elasticity, risk mitigation, probing, and self‑healing capabilities that enable reliable cloud‑native services.

Cloud Native · Infrastructure · Kubernetes
0 likes · 32 min read
Cloud Native Technology Community
Dec 21, 2021 · Industry Insights

How the U.S. DoD’s DevSecOps Strategy Shapes Cloud‑Native Adoption

The article examines the U.S. Department of Defense’s DevSecOps initiative, outlining its cloud‑computing challenges, the shift to Kubernetes, Istio and Knative, the creation of a centralized container registry, and the broader lessons for large organizations seeking open‑source, vendor‑neutral cloud‑native transformations.

Cloud Native · DevSecOps · Government
0 likes · 8 min read
Efficient Ops
Dec 13, 2021 · Operations

Why Every Ops Team Needs a Kubernetes Standards Playbook

This article shares practical standards for Kubernetes operations—from infrastructure choices and application packaging to CI/CD tooling—helping teams reduce complexity, improve reliability, and foster continuous learning and sharing in fast‑moving cloud environments.

DevOps · Infrastructure · Operations
0 likes · 13 min read
DevOps
Dec 9, 2021 · Operations

Six Major Infrastructure and Operations Trends for 2022 According to Gartner

Gartner's 2021 Infrastructure & Operations survey of 96 global IT leaders identifies six critical trends—Just‑In‑Time Infrastructure, Digital Natives, Management Confluence, Data Proliferation, Business Acumen, and Career Lattices—that will shape I&O strategy over the next 12‑18 months.

Gartner · IT Management · Infrastructure
0 likes · 7 min read
IT Architects Alliance
Dec 1, 2021 · Operations

What Does an SRE Actually Do? A Deep Dive into Roles and Practices

This article explains the origins of Site Reliability Engineering, breaks down its three main layers (Infrastructure, Platform, and Business SRE), covers day‑1 and day‑2 deployment, on‑call processes, SLI/SLO design, post‑mortems, capacity planning, and user support, and offers practical advice for aspiring SREs.

Infrastructure · Oncall · Operations
0 likes · 24 min read
Programmer DD
Nov 27, 2021 · Operations

How Netflix’s Open Connect CDN Powers Seamless Streaming Worldwide

Netflix’s Open Connect CDN, a proprietary content‑delivery network built over a decade, strategically places millions of server copies close to ISPs, uses multiple bitrate replicas, and dynamically shifts content to flash storage, ensuring high‑quality streaming even during peak demand and network outages.

CDN · Infrastructure · Netflix
0 likes · 12 min read
Tencent Cloud Developer
Nov 22, 2021 · Cloud Native

Xiaohongshu Service Mesh Deployment and Aeraki Component Optimization

Join Xiaohongshu’s cloud‑native engineer Wang Chengcheng on November 23 for a 45‑minute talk and Q&A about the company’s Service Mesh evolution, Istio adaptation, Aeraki production optimizations, large‑scale deployment experiences, and practical strategies for deploying community Istio in production.

Aeraki · Cloud Native · Infrastructure
0 likes · 3 min read
Programmer DD
Nov 16, 2021 · Operations

What Does an SRE Do? A Practical Guide to Site Reliability Engineering

This article explains the role of Site Reliability Engineering (SRE), its origins at Google, the challenges of hiring, the three-layer model of infrastructure, platform, and business SRE, and provides detailed responsibilities, on‑call practices, SLI/SLO management, capacity planning, and career advice for aspiring SREs.

Infrastructure · Oncall · SLI
0 likes · 23 min read
Java Architect Essentials
Oct 20, 2021 · Backend Development

Fundamentals of Backend Infrastructure for Java Applications

This article provides a comprehensive overview of essential backend infrastructure components for Java-based services, covering API gateways, core frameworks, caching, databases, search engines, message queues, file storage, authentication, service governance, scheduling, logging, monitoring, and fault‑tolerance strategies.

Infrastructure · api-gateway · caching
0 likes · 24 min read
IT Architects Alliance
Oct 18, 2021 · Cloud Computing

Understanding Hybrid Cloud: Definitions, Types, Architectural Characteristics, and Its Role in New Infrastructure

The article explains hybrid cloud concepts, differentiates it from multi‑cloud, outlines four hybrid cloud forms, describes key architectural traits such as elasticity, scalability and security, and connects hybrid cloud to the broader "new infrastructure" trend driving digital transformation.

Infrastructure · New Infrastructure · multi-cloud
0 likes · 19 min read
Java Architect Essentials
Sep 27, 2021 · Industry Insights

What Baidu’s Search Engine Reveals About Infrastructure, Microservices, and Cloud‑Native Design

In this interview, Baidu's chief infrastructure architect explains the high‑performance, data‑intensive demands of its core search business, the role of middle‑platform and microservice architectures, the evolution and impact of cloud‑native technologies, and practical advice for SME architects designing modern IT systems.

Baidu · Cloud Native · Infrastructure
0 likes · 12 min read
NetEase Media Technology Team
Aug 25, 2021 · Cloud Native

NetEase Media Container Platform Construction: Cloud Native Implementation Experience and Best Practices

NetEase Media details its year‑long journey building a cloud‑native container platform—covering foundational concepts, a robust infrastructure framework, Kubernetes deployment, solutions to pre‑containerization challenges, and practical best practices such as graceful shutdowns, health probes, and resource‑limit configurations.

Cloud Native · Container Technology · DevOps
0 likes · 32 min read
IT Architects Alliance
Jun 29, 2021 · Operations

Understanding High Availability: Compute and Storage Strategies Explained

This article defines high availability, explains why four nines (99.99% availability) is a common goal, and categorizes HA into compute and storage solutions, detailing common architectures such as active‑passive, master‑slave, symmetric and asymmetric clusters, as well as various storage replication patterns.

Infrastructure · compute HA · high availability
0 likes · 3 min read
Architects' Tech Alliance
May 26, 2021 · Operations

Super Data Center Definition, Types, Infrastructure, and Development Trends

This article explains the definition of a super data center, outlines international standards, describes various data‑center categories and four architectural layers, details power‑distribution and cooling subsystems, introduces the PUE (Power Usage Effectiveness) metric, and discusses emerging trends and technologies for higher density and lower energy consumption in modern super‑computing facilities.

Cooling · Infrastructure · PUE
0 likes · 6 min read
High Availability Architecture
May 20, 2021 · Cloud Computing

Improving R&D Efficiency with Serverless: Goals, Obstacles, and Practices

The article discusses how R&D efficiency can be enhanced by adopting Serverless technologies, outlining the goals of efficiency governance, the typical obstacles faced in infrastructure, architecture, and team collaboration, and presenting concrete measures and future trends for cloud‑native development.

Infrastructure · R&D efficiency · Serverless
0 likes · 14 min read
Open Source Linux
May 18, 2021 · Operations

How Much Bandwidth Do ByteDance’s Data Centers Actually Have?

This article examines the massive scale of ByteDance’s data centers, detailing server counts, outbound bandwidth estimates, dual‑link designs, and the role of CDN acceleration in delivering smooth video experiences to hundreds of millions of daily users.

ByteDance · CDN · Infrastructure
0 likes · 8 min read
JD Cloud Developers
May 11, 2021 · Operations

How JD.com’s AIDCTwins Digital Twin Transforms Data Center Operations

JD.com’s AIDCTwins platform leverages low‑code modeling, IoT sensing and cross‑platform 3D visualization to create a digital twin of its massive data‑center infrastructure, dramatically cutting labor costs, enabling real‑time updates, and boosting intelligent, green operation across thousands of servers and racks.

Data center · Digital Twin · Infrastructure
0 likes · 6 min read
UCloud Tech
Apr 22, 2021 · Cloud Computing

How Hybrid Cloud Architecture Extends Compute, Storage, and Security

This article explains why many enterprises still rely on on‑premise data centers, introduces three hybrid‑cloud deployment models, and provides detailed solutions for extending computing power, storage backup, security protection, new product capabilities, and smooth business migration using a hybrid cloud approach.

Infrastructure · Scalability · Security
0 likes · 19 min read
JD Cloud Developers
Apr 16, 2021 · Cloud Native

How JD Retail Leverages Cloud‑Native Architecture: Real‑World Cases

JD Retail’s K8s lead Zhou Guang shares how the company’s shift to cloud‑native infrastructure is reshaping IT operations, detailing the architecture, deployment strategies, and concrete case studies that illustrate performance gains and scalability in a modern retail environment.

Case Study · Infrastructure · JD Retail
0 likes · 3 min read
MaGe Linux Operations
Apr 11, 2021 · Cloud Native

What’s New in Kubernetes 1.21? Explore 51 Enhancements and Core Features

Kubernetes 1.21, the first 2021 release, introduces 51 enhancements—including stable CronJobs, immutable Secrets/ConfigMaps, dual‑stack IPv4/IPv6 support, graceful node shutdown, and new health monitoring for PersistentVolumes—while deprecating PodSecurityPolicy and TopologyKeys and offering a suite of new beta and stable features for cloud‑native workloads.

DevOps · Infrastructure · Kubernetes
0 likes · 9 min read
Cloud Native Technology Community
Mar 25, 2021 · Operations

What Are the Top DevOps Trends Shaping 2021 and Beyond?

This article analyzes the most influential DevOps trends for 2021, including the rise of DevSecOps, AI‑driven AIOps, infrastructure automation, chaos engineering, serverless adoption, hybrid cloud, GitOps, and edge computing, backed by market forecasts and expert predictions.

CloudNative · DevOps · DevSecOps
0 likes · 10 min read
Big Data Technology & Architecture
Mar 13, 2021 · Operations

Comprehensive Guide to Monitoring: Objectives, Methods, Tools, and Best Practices

This article provides an in‑depth overview of monitoring, covering its purpose, key objectives, practical methods, core processes, a detailed comparison of popular monitoring tools such as Zabbix and Prometheus, and best‑practice recommendations for building scalable, reliable, and intelligent monitoring platforms.

Infrastructure · Observability · Operations
0 likes · 42 min read
DataFunTalk
Feb 20, 2021 · Artificial Intelligence

Challenges and Evolution of Autonomous Driving Infrastructure

This article examines the fundamental architecture of autonomous driving, highlighting the three core technical contradictions—rapid iteration versus functional safety, sensor and compute demands, and hardware performance versus automotive-grade safety—while outlining a staged development roadmap, hardware and software evolution strategies, and the long‑term goal of safe, reliable driverless operation.

AI · Hardware · Infrastructure
0 likes · 21 min read
Ops Development Stories
Jan 27, 2021 · Information Security

Secure Secrets: Install & Integrate HashiCorp Vault with Kubernetes

This guide walks through installing HashiCorp Vault on Linux and Kubernetes, configuring it for secret management, enabling Kubernetes authentication, creating policies and roles, and accessing secrets via initContainers or the Vault SDK, providing a complete end‑to‑end secure integration.

DevOps · HashiCorp · Infrastructure
0 likes · 13 min read
Aikesheng Open Source Community
Jan 18, 2021 · Databases

How to Build a Professional DBA Operations Team: Infrastructure, Standards, Training, Knowledge Base, and Culture

The article explains how to construct an effective DBA operations team by focusing on reusable infrastructure, clear team standards, a structured training system, a comprehensive knowledge base, and a positive team atmosphere, providing practical tools and methods for each aspect.

DBA · Database operations · Infrastructure
0 likes · 4 min read
AntTech
Jan 14, 2021 · Cloud Native

Large-Scale Service Mesh Deployment at Ant Group: Practices, Challenges, and Future Outlook

This article details Ant Group's two‑year journey of adopting Service Mesh at massive scale, explaining why Service Mesh is needed for microservice governance, heterogeneous system unification, and financial‑grade security, and describing the architecture, migration strategies, stability mechanisms, operational results, and future directions toward a full mesh and serverless era.

DevOps · Infrastructure · Microservices
0 likes · 17 min read
Architects' Tech Alliance
Jan 5, 2021 · Operations

Understanding Data Centers: Architecture, Technologies, and Operational Considerations

This article explains what data centers are, outlines their core components—compute, storage, and networking—covers architectural decisions, industry standards, and emerging technologies such as edge computing, micro‑data centers, cloud integration, SDN, HCI, containers, NVMe, and GPU acceleration, highlighting their impact on modern enterprise operations.

Edge Computing · GPU · HCI
0 likes · 11 min read
Cloud Native Technology Community
Dec 30, 2020 · Operations

Lessons Learned from Two Years of Running Kubernetes in Production

This article recounts a two‑year journey of migrating from Ansible‑managed EC2 deployments to Kubernetes, detailing the motivations, migration strategy, operational challenges, tooling choices, resource management, security, cost considerations, and the development of custom controllers and CRDs to run production workloads reliably.

DevOps · Infrastructure · Kubernetes
0 likes · 18 min read
Top Architect
Dec 30, 2020 · Backend Development

Using Kafka as a Storage System for Twitter’s Account Activity Replay API

The article explains how Twitter built the Account Activity Replay API by repurposing Kafka as a storage layer, detailing the system’s architecture, partitioning strategy, request handling, deduplication, and performance optimizations to provide reliable event recovery for developers.

Infrastructure · Kafka · Twitter
0 likes · 8 min read
Didi Tech
Dec 25, 2020 · Artificial Intelligence

Autonomous Driving Infrastructure: Foundations, Key Trade‑offs, and Evolution Roadmap

The article outlines DiDi’s six‑year autonomous‑driving research, describing the three‑layer hardware‑onboard‑cloud infrastructure, key trade‑offs such as rapid iteration versus functional safety, sensor resolution versus compute, and hardware performance versus automotive‑grade reliability, and presents a staged evolution roadmap toward fully safe, driverless operation.

AI · Hardware · Infrastructure
0 likes · 22 min read
Architects' Tech Alliance
Dec 6, 2020 · Operations

Understanding Data Centers: Architecture, Reliability, and Emerging Technologies

This article explains what a data center is, its core components of compute, storage, and networking, the operational and architectural considerations for reliability and security, and reviews industry standards and emerging technologies such as edge computing, cloud integration, SDN, HCI, containers, NVMe, and GPU acceleration.

Edge Computing · GPU · Infrastructure
0 likes · 12 min read
php Courses
Nov 10, 2020 · Operations

List of Popular Domestic and Official Open Source Mirror Sites (2020)

This article provides a curated list of widely used domestic and official open‑source software mirror sites for 2020, explaining why mirrors are needed, offering categorized URLs, and giving brief guidance on choosing and using them for faster, more reliable downloads.

China · Development · Download
0 likes · 4 min read
Efficient Ops
Oct 19, 2020 · Operations

Designing an Effective DevOps Operations System: Principles and Practices

This article outlines a comprehensive DevOps operations framework, tracing its evolution from traditional ops to modern automation, detailing business standards, work policies, system integration, and best‑practice norms to achieve high SLA, low cost, and a one‑stop operational platform.

Automation · DevOps · Infrastructure
0 likes · 13 min read