Tagged articles

infrastructure

371 articles · Page 4 of 4
ITPUB
Nov 14, 2017 · Operations

How Alibaba’s Dragonfly P2P System Supercharges Large‑Scale File and Container Image Distribution

Alibaba’s Dragonfly (蜻蜓) is a self‑developed P2P file distribution platform that dramatically speeds up massive file and container image delivery, reduces bandwidth consumption, supports intelligent compression and flow control, and has become a core infrastructure component powering billions of transactions during major events like Double 11.

File Distribution · Large Scale · P2P
0 likes · 20 min read
MaGe Linux Operations
Nov 8, 2017 · Operations

How to Build an Ops Engineer Skill Map to Bridge the Hiring Gap

An operations director explains why hiring skilled ops engineers is hard, identifies the technology mismatch in typical stacks, and shares a practical skill‑map approach that lets teams cover most essential tools while giving engineers a clear learning roadmap.

Operations · Ops Engineering · Skill Map
0 likes · 3 min read
Efficient Ops
Nov 5, 2017 · Operations

Scaling Ele.me’s Infrastructure: Operations, Automation, and Private Cloud Insights

This article recounts Ele.me's rapid growth from 2014 onward, detailing the challenges of network and server management, the evolution of their operations through standardization, process automation, and platform building, and how private cloud solutions like ZStack enabled fine‑grained, data‑driven infrastructure management.

Automation · Cloud Computing · Operations
0 likes · 23 min read
Architecture Digest
Oct 27, 2017 · Operations

Key Practices and Principles of DevOps from the “Cloud Development and Operations Best Practices” Talk

The article summarizes a DevOps talk, outlining eight guiding principles—configuration over hard‑coding, redundancy over single points, restartability, whole‑stack delivery, statelessness, standardization, automation, and unattended operation—while sharing concrete tools, architectures, and real‑world experiences from a cloud provider.

Automation · Cloud · Monitoring
0 likes · 16 min read
Alibaba Cloud Developer
Oct 27, 2017 · Operations

How Alibaba Scales DevOps with StarOps: Inside Their Operations Platform

This article explains how Alibaba has evolved its DevOps practice over a decade, detailing the layered architecture of its StarOps suite, which includes the foundational StarAgent, the Fortress jump server, the Dragonfly (Qingting) file‑distribution system, and intelligent AIOps features, and showing how automation, scalability, and AI‑driven monitoring enable stable, low‑cost operations for massive workloads such as Double 11.

AIOps · Automation · Cloud Computing
0 likes · 17 min read
Meitu Technology
Sep 28, 2017 · Operations

Inside Meipai’s 3‑D Monitoring System: Scaling 150M Users with Unified Observability

This article examines how Meipai, a popular live‑streaming and short‑video platform with over 150 million monthly active users, engineered a comprehensive, three‑dimensional monitoring architecture that spans client to server, integrates unified dashboards, and leverages both private and public cloud resources to ensure reliable, scalable operations.

Cloud · Meipai · Monitoring
0 likes · 3 min read
Architecture Digest
Sep 16, 2017 · Backend Development

Essential Backend Infrastructure and Services for Internet Companies

This article outlines the essential backend infrastructure components and best‑practice patterns—such as API gateways, service frameworks, caching, databases, search engines, message queues, authentication, configuration, service governance, scheduling, logging, and monitoring—required to build stable, scalable, and maintainable internet applications.

Caching · Microservices · Monitoring
0 likes · 31 min read
Architects' Tech Alliance
Sep 7, 2017 · Industry Insights

How SDN Bridges Networks and Cloud Platforms: An In‑Depth Look

This article explains the relationship between Software‑Defined Networking (SDN) and cloud platforms, detailing cloud service models, OpenStack core services, OpenDaylight controller architecture, and the integration mechanisms that enable unified management of network, compute, and storage resources.

Cloud Computing · Network Virtualization · OpenDaylight
0 likes · 11 min read
Alibaba Cloud Infrastructure
Aug 17, 2017 · Cloud Computing

Alibaba Tech Open-Day in Silicon Valley Showcases Global Infrastructure and Cloud Computing Innovations

The Alibaba Tech Open-Day held in Silicon Valley highlighted the company's global data‑center network, energy‑efficient designs, high‑speed networking, custom hardware, advanced system software, middleware solutions, and its ambitious NASA research program, while also recruiting top engineering talent for both US and China operations.

Alibaba · Data Centers · Silicon Valley
0 likes · 12 min read
Efficient Ops
Aug 13, 2017 · Operations

22 Essential Ops Manager Tips for Building Resilient Web Infrastructure

This article compiles 22 practical recommendations from an operations manager covering domain management, CDN usage, image servers, data center selection, monitoring, security, redundancy, high‑availability architecture, disaster‑recovery planning, and team coordination to help ensure stable and secure online services.

Disaster Recovery · Monitoring · Operations
0 likes · 12 min read
ITPUB
Jul 25, 2017 · Operations

How to Accurately Plan Data Center Power with Dell’s Enterprise Infrastructure Planning Tool

This guide explains why precise power usage assessment is crucial for data‑center safety and efficiency, introduces Dell’s free online Enterprise Infrastructure Planning Tool, provides the web and download links, and walks through step‑by‑step configuration of voltage, devices, PSU selection, summary view, and exporting results to PDF or Excel.

Configuration · Data Center · Dell
0 likes · 6 min read
Ctrip Technology
Jul 20, 2017 · Operations

Ctrip's Fourth‑Generation Architecture: Elastic Routing (SLB) and the TARS Release System

This article reviews Ctrip's two‑year architecture transformation, describing how the company replaced hardware load balancers with a software‑defined SLB, introduced application‑level grouping, multi‑update mechanisms, health‑check sharing, monitoring, and the TARS release platform to achieve faster, more reliable deployments.

Ctrip · Operations · SLB
0 likes · 16 min read
Efficient Ops
Jul 12, 2017 · Operations

How Alibaba Built a Scalable DevOps Platform: Lessons for Modern Operations

This article, based on a DevOpsDays Beijing talk, details Alibaba's post‑DevOps transformation, outlining the three evolution stages of operations, the four pillars of automated ops, the importance of CMDB, CI/CD pipelines, and the design of the ATOM platform that enables rapid, data‑driven, and resilient service delivery.

CI/CD · CMDB · devops
0 likes · 15 min read
MaGe Linux Operations
Jun 29, 2017 · Operations

Mastering Internet Operations: Roles, Responsibilities, and Evolution

This article outlines the service‑centric approach of internet operations, detailing how stability, security, and efficiency are achieved through infrastructure management, system and application maintenance, database administration, and security practices, and traces the evolution of operational roles from manual handling to automated, self‑scheduling platforms.

infrastructure
0 likes · 20 min read
Efficient Ops
Jun 12, 2017 · Operations

Mastering DevOps in Complex Business Systems: Theory, Culture, Architecture & Case Studies

This article presents a comprehensive overview of a GOPS 2017 Shenzhen talk on DevOps theory and practice in complex business environments, covering the fundamentals of DevOps, cultural transformation, technical architecture, and real‑world case studies that illustrate automation, deployment pipelines, and value‑stream delivery.

Continuous Delivery · devops · infrastructure
0 likes · 17 min read
Efficient Ops
Jun 10, 2017 · Operations

What Google’s SRE Book Reveals About Modern Operations

This article introduces the Chinese translation of Google’s SRE book, shares behind‑the‑scenes stories of its creation, and distills key concepts such as the AAA model, Borg architecture, SLOs, toil reduction, and the cultural shift required for reliable large‑scale services.

Google · SRE · Site Reliability Engineering
0 likes · 20 min read
Efficient Ops
Jun 6, 2017 · Operations

How SF Express Transformed Its Infrastructure: From Chaos to Automated DevOps

This article details SF Express's journey from a fragmented, manual infrastructure operation to a standardized, automated DevOps environment, covering organizational restructuring, open‑source adoption, change management, capacity forecasting, and the vision for a self‑service "WeiX" platform.

Standardization · devops · infrastructure
0 likes · 18 min read
Efficient Ops
May 31, 2017 · Operations

How a Veteran Ops Leader Transforms DevOps into Full‑Chain Automation

This article shares a veteran operations leader’s insights on DevOps fundamentals, the comprehensive ops knowledge system and career paths, the evolution of small‑business web architectures, and the step‑by‑step development of a full‑chain automation platform, emphasizing both technical and soft‑skill growth.

career development · devops · infrastructure
0 likes · 17 min read
MaGe Linux Operations
May 2, 2017 · Operations

What Is Zabbix? A Deep Dive into Its Features, Architecture, and Deployment

Zabbix is an open‑source, web‑based enterprise monitoring platform that tracks Windows/Linux hosts, network devices, and hardware/software metrics, provides alerting, visualizes data via a customizable PHP web UI, and comprises components such as server, agents, proxies, Java gateway, and API, with flexible templates, discovery, and storage options.

Alerting · IT Operations · infrastructure
0 likes · 6 min read
21CTO
Apr 30, 2017 · Backend Development

Essential Backend Infrastructure for Scalable Internet Services

This article outlines the critical backend components and services—such as API gateways, MVC/IOC/ORM frameworks, caching, databases, search engines, message queues, unified authentication, configuration management, service governance, scheduling, logging, and data processing pipelines—that together enable stable, high‑availability, and maintainable online applications.

API Gateway · Caching · Microservices
0 likes · 29 min read
Meituan Technology Team
Apr 7, 2017 · Information Security

Insights on Google Infrastructure Security Design

Google’s new security white paper reveals how its deeply integrated, principle‑driven architecture—spanning physical data‑center safeguards, mutual‑authenticated multi‑tenant services, pervasive encryption, and a comprehensive DevSecOps process—enables massive‑scale protection, but replicating this model demands substantial custom hardware, unified tooling, and large‑scale engineering expertise.

Data Protection · Google · cloud security
0 likes · 22 min read
Efficient Ops
Mar 21, 2017 · Operations

Rethinking Operations: The “Third Kind” of SRE at Lianjia

The article shares the author’s experience transitioning from private to public and hybrid clouds at Lianjia, introduces a “third kind” of operations that blends traditional and internet‑based practices, and discusses containers, DNS‑based naming, and automation tools to build adaptable, cost‑effective infrastructure.

Hybrid Cloud · Naming Service · SRE
0 likes · 21 min read
Alibaba Cloud Infrastructure
Mar 15, 2017 · Operations

Alibaba IDC and Network Monitoring System Architecture and Practices

The article details Alibaba's globally distributed IDC and network monitoring systems, describing their fully distributed data collection, centralized computation, storage strategies, alarm mechanisms, and frontend visualization that together enable real‑time infrastructure and network health management for large‑scale operations.

IDC · distributed systems · infrastructure
0 likes · 13 min read
DevOps
Feb 23, 2017 · Cloud Native

JD's Migration from OpenStack to Kubernetes: Lessons and Architecture of JDOS 2.0

Since the end of 2016, JD has been transitioning its infrastructure from OpenStack to Kubernetes, completing 20% of the migration and aiming for full conversion by Q2, and shares detailed experiences, architectural evolution, operational practices, and future directions for large‑scale container platforms.

Cloud Native · Container Orchestration · JDOS
0 likes · 16 min read
Meituan Technology Team
Jan 23, 2017 · Cloud Native

Meituan-Dianping Docker Container Platform: Architecture and Practices

Meituan‑Dianping’s Docker Container Platform, built on a four‑layer architecture that integrates API orchestration, host‑side management, a hybrid image registry, OVS‑DPDK networking, LVM‑backed storage, and low‑overhead monitoring, enables seconds‑level scaling, live resource adjustments, and major cost savings across dozens of business units by combining containers with traditional VMs.

Cloud Native · Docker · container platform
0 likes · 23 min read
Alibaba Cloud Developer
Jan 5, 2017 · Cloud Native

How Alibaba Unified T4 and Docker into AliDocker for Double‑11 Scale

This article details Alibaba's large‑scale migration of core transaction services from traditional VMs and proprietary T4 containers to a unified Docker‑based platform called AliDocker, covering integration challenges, image‑based deployment, Swarm customizations, and middleware Dockerization that enabled seamless Double‑11 operations.

AliDocker · Docker · Swarm
0 likes · 18 min read
Alibaba Cloud Developer
Dec 13, 2016 · Operations

How Alibaba Evolved Its Application Operations: From Scripts to DevOps

Alibaba’s application operations journey, detailed by researcher Lin Hao, traces the shift from early script‑based practices through tool‑centric phases to a full DevOps transformation, highlighting challenges, automation efforts, and the emerging push toward intelligent, data‑driven operations.

Alibaba · devops · infrastructure
0 likes · 19 min read
Architects' Tech Alliance
Nov 16, 2016 · Cloud Computing

How OpenStack Ironic Enables Bare-Metal Provisioning in the Cloud

OpenStack Ironic is a dedicated bare‑metal service that replaces Nova’s original driver, using PXE and IPMI to automate physical server deployment, power management, and resource discovery, integrating with Keystone, Nova, Neutron, Glance, and Cinder to provide cloud‑like provisioning for real hardware.

Bare Metal · Cloud Computing · Ironic
0 likes · 6 min read
Architects' Tech Alliance
Nov 4, 2016 · Big Data

The Seven Camps of the Global Big Data Ecosystem

The article outlines how mobile Internet merges the data‑driven society with the physical world to create a new big‑data architecture and describes the seven distinct camps—Infrastructure, Analytics, Applications, Cross‑Domain Architecture, Open‑Source, Data Sources & APIs, and Incubator & Training—that together form a comprehensive end‑to‑end big‑data solution ecosystem.

API · Analytics · Applications
0 likes · 3 min read
Efficient Ops
Oct 31, 2016 · Operations

What Are DevOps’ Eight Honors and Shames? Insights from Heroku’s 12‑Factor Manifesto

This article presents a seasoned DevOps expert’s eight “honors and shames” principles, explains why configuration, redundancy, restartability, whole‑delivery, statelessness, standardization, automation, and unattended operation matter, and connects them to Heroku’s twelve‑factor app guidelines for building resilient cloud services.

devops · infrastructure
0 likes · 21 min read
360 Zhihui Cloud Developer
Oct 19, 2016 · Operations

Wonder Monitoring: Scaling Ops with Open‑Falcon‑Powered Automation

This article explains how the internally built Wonder monitoring system, based on Open‑Falcon, tackles large‑scale operational challenges by offering automated agent updates, customizable metrics, log and port monitoring, persistent alarm storage, enhanced alert content, and comprehensive dashboards for thousands of devices.

Alerting · Automation · Monitoring
0 likes · 7 min read
Qunar Tech Salon
Oct 11, 2016 · Operations

Design and Implementation of Qunar Network Device Operations Platform

Facing growing network device counts and limited netops staff, Qunar built a network device operations platform that integrates command automation, permission-controlled tasks, monitoring, and dynamic scaling using Docker, Marathon, and Celery, thereby improving efficiency, reducing risk, and enabling comprehensive auditability.

Network Operations · Permission control · infrastructure
0 likes · 8 min read
Architects' Tech Alliance
Sep 30, 2016 · Cloud Computing

Understanding Hyper‑Converged Infrastructure: Nutanix Overview and Market Landscape

The article provides a comprehensive overview of converged and hyper‑converged infrastructure, discusses typical use cases such as VDI and database acceleration, compares major vendor solutions, and details Nutanix’s product lines, architecture, performance considerations, cloud integration, and micro‑service capabilities.

Hyper-Converged · Nutanix · infrastructure
0 likes · 9 min read
Efficient Ops
Sep 19, 2016 · Operations

How Ctrip Revolutionized IDC Management with Visual Automation

Ctrip’s rapid internet growth forced a massive data‑center expansion, prompting the company to evolve from self‑built facilities to hybrid vendor‑leased IDC, and ultimately to a visual management platform that automates monitoring, space planning, device intake, and operational workflows, dramatically improving efficiency and reducing manual effort.

CMDB · infrastructure · visualization
0 likes · 13 min read
Qunar Tech Salon
Sep 14, 2016 · Cloud Computing

Design and Implementation of Ctrip's Virtual Cloud Desktop System Based on OpenStack

This article presents Ctrip's deployment of a virtual cloud desktop system for its call center, detailing the OpenStack‑based architecture, advantages over traditional PCs, challenges encountered, the evolution to a decoupled design, resource over‑commit strategies, networking issues, and the operational tools and automated testing that ensure stability.

Cloud Computing · OpenStack · infrastructure
0 likes · 13 min read
Efficient Ops
Sep 5, 2016 · Operations

Inside Google’s Data Centers: How SRE Manages Hardware, Borg, and Global Services

This article explains how Google’s Site Reliability Engineering team designs and operates uniform hardware in its data centers, uses the Borg cluster manager, implements storage layers, SDN networking, monitoring, and a sample Shakespeare search service to achieve high‑availability, scalable production services.

Borg · Google SRE · distributed systems
0 likes · 21 min read
High Availability Architecture
Aug 30, 2016 · Operations

Evolution of Meizu Flyme Operations Architecture and High‑Availability Practices

The article details Meizu's Flyme operations platform evolution—from a single‑cabinet setup in 2011 to a multi‑IDC, 6000‑server infrastructure—highlighting challenges, architectural upgrades, monitoring, cost control, automation, and future high‑availability directions for large‑scale internet services.

Cost Control · High Availability · Monitoring
0 likes · 13 min read
Ctrip Technology
Aug 26, 2016 · Information Security

Automated Firewall Operations and Management System at Ctrip

The article describes how Ctrip’s network security team built an automated, centralized firewall management platform that handles multi‑brand firewalls, streamlines policy queries, generation, and deployment, integrates with change‑ticket workflows, and dramatically improves operational efficiency while reducing human error.

Ctrip · Operations · firewall automation
0 likes · 14 min read
Efficient Ops
Aug 25, 2016 · Operations

How Tencent Scales Ops Automation for Hundreds of Thousands of Servers

This article explains how Tencent transformed massive operational pressure from billions of users and half‑million servers into an automated, standardized workflow by defining clear goals, building a layered CMDB, integrating Dev and Ops, and implementing a six‑step deployment pipeline that balances efficiency with safety.

CMDB · Operations Automation · Tencent
0 likes · 21 min read
21CTO
Apr 20, 2016 · Operations

How Spotify Scaled Machine Management: From Ops Chaos to Cloud Automation

This article chronicles Spotify's evolution in server operations—from a manual Ops team and ad‑hoc tools in the early years, through automated DNS, provisioning, and self‑service platforms, to a hybrid cloud strategy that reduced resource‑request turnaround from weeks to minutes.

Automation · Cloud Migration · Operations
0 likes · 14 min read
Huawei Cloud Developer Alliance
Apr 11, 2016 · Cloud Computing

Explore OpenStack's Core Services: From Nova to Ceilometer

This article introduces the key OpenStack services—Nova, Neutron, Keystone, Glance, Horizon, Cinder, Swift, Heat, and Ceilometer—explaining each component’s role, functionality, and how they collectively enable scalable compute, networking, identity, image, dashboard, block storage, object storage, orchestration, and telemetry in cloud environments.

Cloud Computing · OpenStack · Service Architecture
0 likes · 9 min read
Efficient Ops
Mar 21, 2016 · Operations

How to Build a High‑Performance Unified Monitoring & Alerting Platform

This article outlines a comprehensive design for a high‑performance, unified operations monitoring platform, detailing a six‑layer architecture, the roles of data collection (using Ganglia), data extraction, and alerting modules (with Centreon), and provides practical integration tips, deployment diagrams, and Q&A for large‑scale environments.

Alerting · Centreon · Ganglia
0 likes · 24 min read
Architect
Mar 18, 2016 · Backend Development

Sogou Business Platform Infrastructure Evolution: From Horizontal Scaling to Stream Computing

This article outlines Sogou's infrastructure evolution under rapid business iteration, detailing stages of compute and storage horizontal scaling, serviceization, and stream computing, while sharing the practices, principles, lessons learned, and reflections that guided the platform's architectural transformation.

Service Architecture · Sogou · infrastructure
0 likes · 4 min read
21CTO
Mar 11, 2016 · Operations

Scaling DevOps at Mogujie: How a Young Ops Team Tackled Massive Traffic and Double‑11

Facing explosive traffic and high‑concurrency demands, Mogujie's newly formed operations team adopted DevOps practices, built CMDB, CI/CD pipelines, and monitoring platforms, and successfully supported the massive Double‑11 and Double‑12 sales events, sharing key technologies and lessons learned in their rapid‑pace environment.

CMDB · Continuous Integration · devops
0 likes · 3 min read
Architecture Digest
Mar 5, 2016 · Operations

Dianping Operations Architecture Overview and Best Practices

This article presents a comprehensive overview of Dianping's operations architecture, detailing team organization, multi‑data‑center infrastructure, monitoring layers, automation tools, configuration management systems, incident analysis, lessons learned, and future directions such as Docker and PaaS adoption.

Automation · Docker · Monitoring
0 likes · 16 min read
Efficient Ops
Feb 24, 2016 · Operations

Is Operations Automation Overhyped? A Pragmatic Look at Real‑World Practices

The article critiques the hype around operations automation, arguing that many tasks can be handled with simple shell scripts, that automation should solve error‑prone manual work rather than replace thoughtful architecture, and that choosing the most convenient tool is more valuable than chasing trendy solutions.

Automation · Operations · Shell Scripting
0 likes · 13 min read
21CTO
Jan 25, 2016 · Cloud Native

Why Docker Still Dominates: 2016 Tech Awards Highlights & Key Container Projects

InfoWorld’s 2016 Technology of the Year Awards spotlight Docker’s dominance, listing top container‑related projects such as Docker, Kubernetes, CoreOS, Mesos and others, while also covering a broad range of languages, tools, cloud services and big‑data platforms that shaped the tech landscape.

Cloud Native · Containers · Docker
0 likes · 6 min read
Efficient Ops
Dec 14, 2015 · Operations

Top Ops Security Pitfalls and How to Safeguard Your Infrastructure

This article examines the most common operational security vulnerabilities—such as unpatched Struts, server‑status leaks, backup file exposure, SVN leaks, and weak default credentials—explains why they are critical, and offers practical recommendations for enterprises to improve their ops‑security posture.

Patch management · Vulnerability Management · infrastructure
0 likes · 15 min read
Qunar Tech Salon
Dec 14, 2015 · Cloud Native

Building Scalable Development Environments with Docker, Mesos, and Kubernetes: Lessons Learned

This article details a year‑long journey of designing, deploying, and operating container‑based development environments using Docker, Apache Mesos, and Kubernetes, covering the challenges of version consistency, rapid environment switching, resource isolation, and the practical solutions and lessons gathered from real‑world production use.

Container Orchestration · Docker · Mesos
0 likes · 16 min read
Architect
Nov 25, 2015 · Cloud Native

Kubernetes Architecture Overview and Practical Insights

This article introduces Kubernetes, explains why it is used, outlines its core goals, describes the main components and their functions, discusses the architectural improvements it enables, and shares practical deployment experiences and common issues encountered during real‑world usage.

Container Orchestration · devops · infrastructure
0 likes · 15 min read
Efficient Ops
Sep 20, 2015 · Operations

From Internet Ops to Banking: Lessons on Data Center Challenges

In a candid Q&A, industry veterans discuss the fundamental differences between internet and traditional banking operations, share experiences transitioning between sectors, and outline strategies to eliminate difficult data‑center maintenance, highlighting risk‑focused versus growth‑driven approaches.

Cloud Computing · Data Center · IT ops
0 likes · 14 min read
Architects' Tech Alliance
Sep 13, 2015 · Cloud Computing

Infrastructure Convergence: Hardware Fusion and Hyper‑Converged Systems Overview

The article explains the evolution of enterprise IT infrastructure toward both custom, small‑scale distributed designs driven by cloud computing and integrated fusion/hyper‑converged architectures, detailing their design principles, differences, major vendor solutions, and the role of software‑defined storage.

Hyper-Converged · Software-Defined Storage · infrastructure
0 likes · 12 min read
Efficient Ops
Aug 3, 2015 · Cloud Computing

How 1hao Store Uses Hybrid Cloud to Balance Cost and Performance

This article explains how an e‑commerce platform leverages a hybrid cloud architecture to handle massive traffic spikes from marketing events while optimizing costs, and outlines six key considerations for successful implementation.

Cloud Computing · E‑commerce · Operations
0 likes · 10 min read
Efficient Ops
Jul 27, 2015 · Operations

What Google SREs Do: Inside the Role that Powers Reliable Services

This article explains the responsibilities, requirements, and daily work of Google Site Reliability Engineers, contrasts them with Software Engineers, outlines key internal infrastructure components, and discusses the future direction of operations engineering in the cloud era.

Google · Operations · Reliability Engineering
0 likes · 11 min read
Efficient Ops
Jul 23, 2015 · Operations

How Project Scorpio Reshaped China’s Data Center Rack Standards

This article chronicles the birth and evolution of China’s Project Scorpio—from its 2011 launch through Scorpio 1.0 and 2.0 specifications—highlighting its collaboration with Intel, its technical trade‑offs with Open Rack, and its impact on data‑center operations and standards.

Data Center · Open Compute Project · Operations
0 likes · 17 min read
MaGe Linux Operations
Jun 16, 2015 · Operations

Inside Dianping’s Ops: Building Scalable Monitoring, Automation, and Self‑Service Platforms

This article details how Dianping’s sub‑40‑person operations team structures its groups, designs a dual‑datacenter architecture, and creates comprehensive monitoring, automation, configuration, and analysis systems—including Zabbix, Cat, workflow, Button, and a custom radar platform—to achieve high‑availability, self‑service, and continuous improvement.

Automation · Monitoring · Ops
0 likes · 18 min read

Stack Overflow Architecture and Operations: Scaling, Performance, and Infrastructure Overview

This article provides a comprehensive overview of Stack Overflow's infrastructure, detailing its vertically‑scaled hardware, use of Microsoft and Linux technologies, high‑availability design, caching layers, database strategies, deployment processes, monitoring, and the performance‑first philosophy that drives its efficient operation.

infrastructure · performance · scaling
0 likes · 17 min read
Efficient Ops
May 27, 2015 · Operations

How NoOps Transforms Operations: Automating Service Management

The article outlines the NoOps philosophy of automating routine operational tasks, describes how a tech‑learning team builds self‑service platforms, leverages open‑source tools, and invests in research to boost efficiency, stability, and innovation in modern internet services.

NoOps · Platform Engineering · infrastructure
0 likes · 11 min read
Efficient Ops
May 22, 2015 · Operations

Mastering Puppet: From Basics to Advanced Ops Automation and Docker Integration

This article summarizes a comprehensive talk on Puppet covering its evolution, core concepts, architecture, ecosystem, practical use cases such as building a CMDB, automated deployment pipelines, OpenStack deployment, and the interplay with Docker, highlighting how Puppet drives modern operations automation.

Operations · Puppet · configuration management
0 likes · 13 min read
MaGe Linux Operations
Mar 26, 2015 · Operations

Essential Open‑Source Tools for Backup, Cloud, DevOps, and IT Operations

This article compiles a comprehensive list of open‑source tools covering backup, cloning, cloud platforms, cloud workflows, distributed file systems, cloud storage, code review, collaboration suites, CMDB, configuration management, continuous integration/deployment, DNS, hosting control panels, IT asset management, and LDAP, providing a valuable resource for IT professionals.

Cloud · Operations · backup
0 likes · 11 min read
MaGe Linux Operations
Sep 13, 2014 · Operations

How to Build a Scalable Small Website: From Thousands to Millions of Daily Visits

This article systematically outlines the essential steps and considerations—ranging from language choice, version control, hardware and data center selection, to architecture, software, database, storage, and code optimization—to help a small website scale from a few thousand daily visits to millions while avoiding costly pitfalls.

Operations · backend · devops
0 likes · 13 min read