Tagged articles
3281 articles
21CTO
Apr 5, 2016 · Operations

How Tencent’s AMS Achieved Fault Tolerance at Billion‑Request Scale

This article shares Tencent’s experience building fault‑tolerant mechanisms for the AMS activity platform, covering retry strategies, automatic machine exclusion, timeout tuning, service isolation, asynchronous processing, anti‑replay safeguards, and operational best practices that transformed a million‑request service into an 800‑million‑request system.

Operations · Retry · System Design
24 min read

Efficient Ops
Mar 29, 2016 · Operations

From the Stirrup to AI: How Automation Transforms Operations

At the GOPS2016 conference, speaker Cui Xiaochun likens the invention of the horse stirrup to modern automation, tracing the evolution of operations from manual scripts to AI-driven intelligent systems, and argues that embracing AI is the next revolutionary step for ops teams.

AI · Automation · Historical analogy
7 min read

21CTO
Mar 22, 2016 · Operations

Build a Scalable Unified Monitoring & Alert Platform with Ganglia & Centreon

This article explains how to design and implement a unified operations monitoring and alerting platform by combining Ganglia for data collection with Centreon for alerting, covering architecture layers, module functions, integration steps, and practical Q&A for large‑scale deployments.

Alerting · Automation · Centreon
20 min read

21CTO
Mar 22, 2016 · Operations

Inside Facebook’s ‘Hotfix Bar’: Secrets of Massive Deployments

During an exclusive visit to Facebook’s Menlo Park campus, the author uncovers the company’s sophisticated release engineering practices—including the HipHop optimizer, a custom BitTorrent‑based deployment system, continuous testing, and a unique “Hotfix Bar” culture—revealing how billions of daily requests are reliably delivered at massive scale.

Deployment · Facebook · Operations
18 min read

Efficient Ops
Mar 21, 2016 · Operations

How to Build a High‑Performance Unified Monitoring & Alerting Platform

This article outlines a comprehensive design for a high‑performance, unified operations monitoring platform, detailing a six‑layer architecture, the roles of data collection (using Ganglia), data extraction, and alerting modules (with Centreon), and provides practical integration tips, deployment diagrams, and Q&A for large‑scale environments.

Alerting · Centreon · Ganglia
24 min read

21CTO
Mar 17, 2016 · Backend Development

Turn Java Enterprise Performance Tuning into a Scientific Process

This article explains a systematic, waiting‑point‑based approach to enterprise Java performance tuning, covering load‑test design, analysis of existing versus new applications, hierarchical and technical waiting points, pool and cache sizing, and a back‑tuning workflow to achieve measurable improvements.

Java · Load Testing · Operations
17 min read

21CTO
Mar 17, 2016 · Operations

How Vipshop’s Three‑Tier Monitoring System Keeps Services Running Smoothly

This article explains Vipshop’s multi‑layer monitoring architecture, detailing system‑level metrics, application‑level tracing with the Mercury platform, and business‑level KPI dashboards, while describing the data pipelines that collect, process, and alert on distributed logs to ensure reliable operations.

Distributed Systems · Operations · Vipshop
4 min read

Baidu Intelligent Testing
Mar 15, 2016 · Operations

Establishing an Operations Evaluation Model: Steps, Metrics, and Key Considerations

This article explains how to build an operations evaluation model within a quality competitiveness framework, detailing a three‑step process for defining metrics, evaluation methods, and quantification, and highlighting essential evaluation points, attention areas, and data collection practices for product operations.

Metrics · Operations · evaluation model
8 min read

Java High-Performance Architecture
Mar 15, 2016 · Operations

Building a 3-Dimensional Automated Visual Monitoring System for High-Availability

The article describes a three-dimensional, automated, visual monitoring approach for high-availability systems, detailing a five-layer monitoring model, automated log collection using Logstash-Redis-Elasticsearch, and visualization techniques that together reduce fault-locating time and improve operational efficiency.

Automation · Operations · System Design
5 min read

Architecture Digest
Mar 12, 2016 · Operations

Stack Overflow Architecture Overview: Hardware, Scaling, and Infrastructure (2015)

This article provides a detailed overview of Stack Overflow's 2015 architecture, covering daily traffic growth, hardware upgrades, redundancy principles, DNS and ISP routing, HAProxy load balancing, IIS/ASP.NET web layer, Redis caching, WebSocket services, Elasticsearch search, SQL Server databases, and the open‑source tools that support the platform.

Operations · SQL Server · Scalability
17 min read

Java High-Performance Architecture
Mar 11, 2016 · Operations

Ensuring High Availability: Functional Separation and Degradation Strategies

The article explains how functional separation and degradation techniques—distinguishing core from non‑core services, isolating them logically and physically, and implementing manual or automatic fallback mechanisms—help maintain high availability in distributed systems during traffic spikes or component failures.

Operations · System Design · degradation
3 min read

21CTO
Mar 8, 2016 · R&D Management

Surviving Startup Chaos: Key Strategies for Project, Code, and Team Management

This article examines the common pitfalls faced by engineers in fast‑growing startups—from poor project planning and rushed code refactoring to unclear product requirements, weak organizational processes, hasty technology choices, operations overload, and people‑related challenges—offering practical guidance to navigate each issue.

Operations · Product Development · startup
14 min read

21CTO
Mar 6, 2016 · Operations

Inside Stack Overflow’s 2016 Architecture: Handling 61 Million Daily Requests

The article details Stack Overflow’s 2016 infrastructure upgrades—including hardware, networking, load balancing, caching, database, and service layers—that enabled the site to process over 61 million daily requests while reducing processing time by hundreds of hours.

Operations · architecture · caching
12 min read

Architecture Digest
Mar 5, 2016 · Operations

Dianping Operations Architecture Overview and Best Practices

This article presents a comprehensive overview of Dianping's operations architecture, detailing team organization, multi‑data‑center infrastructure, monitoring layers, automation tools, configuration management systems, incident analysis, lessons learned, and future directions such as Docker and PaaS adoption.

Automation · DevOps · Docker
16 min read

21CTO
Mar 5, 2016 · Backend Development

How to Choose, Use, and Extend Open‑Source Projects Without Reinventing the Wheel

This article explores the DRY principle in software development, explains why many open‑source projects violate it, and provides practical guidance on selecting, using, and customizing open‑source solutions through real‑world case studies, focusing on business fit, maturity, operational capability, and safe integration.

Operations · Software Engineering · best practices
12 min read

Qunar Tech Salon
Mar 5, 2016 · Operations

Common Linux Commands for Java Developers

This article provides Java developers with a concise reference of essential Linux shell commands, covering process inspection, file manipulation, permission changes, compression, networking checks, remote access, and other common operations needed for interacting with Linux servers during development and deployment.

CommandLine · DevOps · Linux
7 min read

dbaplus Community
Mar 3, 2016 · Operations

Why Every Developer Must Master Core Ops Skills

The article explains why developers need to understand operations—covering resource usage, fault handling, platform basics, and essential ops tools—so they can write maintainable code, avoid common pitfalls, and collaborate effectively with ops teams for reliable, high‑performance services.

Operations · Software Engineering · coding standards
14 min read

DevOps
Mar 2, 2016 · Operations

Understanding DevOps: Principles, Practices, and Implementation

This article provides a comprehensive overview of DevOps, explaining its purpose, cultural challenges, core principles such as automation, standardization, and configuration, its relationship with cloud, lean and agile, practical steps, metrics, and how it transforms IT delivery into an end‑to‑end business value pipeline.

Automation · Continuous Delivery · DevOps
17 min read

Efficient Ops
Feb 24, 2016 · Operations

Is Operations Automation Overhyped? A Pragmatic Look at Real‑World Practices

The article critiques the hype around operations automation, arguing that many tasks can be handled with simple shell scripts, that automation should solve error‑prone manual work rather than replace thoughtful architecture, and that choosing the most convenient tool is more valuable than chasing trendy solutions.

Automation · Infrastructure · Operations
13 min read

Architecture Digest
Feb 24, 2016 · Backend Development

Lessons from 14 Years of Website Architecture Evolution

Drawing on fourteen years of hands‑on experience, the article chronicles how a website’s architecture matures from a simple personal homepage to a billion‑page‑view enterprise system, highlighting the essential principles, design patterns, operational practices, and scalability strategies that underpin successful large‑scale web platforms.

Backend Development · Operations · Performance Optimization
30 min read

ITPUB
Feb 18, 2016 · Operations

Building a Custom RPC Stress‑Testing Tool: Insights from Meituan

Meituan’s internal RPC services, largely built on Thrift, required a streamlined pressure‑testing solution, leading to the development of a custom tool that automates traffic capture, provides an intuitive UI, aggregates metrics via InfluxDB, and supports both Thrift and HTTP workloads, addressing the shortcomings of existing open‑source options.

Backend Tools · Operations · RPC
8 min read

Architects' Tech Alliance
Feb 17, 2016 · Cloud Computing

Overview of Hyper-V Features, Management, and Storage Capabilities

This article provides a comprehensive overview of Hyper-V, covering its extensive operating system support, virtual networking, management integration with System Center, dynamic memory, storage options, VM conversion tools, and key SMB 3.0 features for high‑availability and performance in virtualized environments.

Hyper-V · Operations · SMB
9 min read

Architecture Digest
Feb 17, 2016 · Backend Development

Evolution of VIP (Vipshop) Business Model and System Architecture

The article outlines VIP's transition from a simple outlet‑style e‑commerce platform to a multi‑brand flash‑sale service, detailing each architectural phase—from a monolithic LAMP stack through vertical silo and distributed service‑oriented designs—to a cloud‑native, platform‑plus‑application model that supports scalable, high‑availability operations.

Backend Development · Operations · Vipshop
11 min read

Efficient Ops
Feb 15, 2016 · Operations

Can Operations Survive the Cloud Revolution? Strategies for the Next Decade

As cloud computing reshapes IT, traditional operations roles face unprecedented disruption, but by embracing cloud‑focused responsibilities, niche industry needs, or even a complete career pivot, ops professionals can secure their future within the next five to ten years.

Career Development · IT infrastructure · Operations
9 min read

Efficient Ops
Feb 3, 2016 · Operations

Why Human Errors Still Plague Modern Ops and How to Prevent Them

This article examines recent high‑profile internet outages caused by human error, explores why operations teams are especially prone to mistakes despite automation and standards, and offers practical strategies—such as hiring the right people, fostering safety awareness, and turning professionalism into habit—to reduce future incidents.

Automation · Operations · best practices
14 min read

Efficient Ops
Feb 3, 2016 · Operations

Putting People First: Building a Human‑Centred Efficient Operations System

This article explores how a people‑centric mindset can transform operations by defining a three‑layer framework, clarifying why human factors matter, and offering concrete process, technology, and organizational practices such as streamlined approval flows, voice‑alert systems, and Docker‑based continuous deployment.

Automation · Operations · Service Management
12 min read

Efficient Ops
Feb 2, 2016 · Operations

How Ops Professionals Can Boost Happiness and Efficiency: 4 Common Pitfalls and Practical Solutions

This article examines why many operations engineers feel unhappy, identifies four personal‑management problems—over‑pursuing tech, mis‑prioritizing tasks, poor communication, and chronic complaining—and offers concrete, actionable suggestions to improve productivity, satisfaction, and team collaboration.

Operations · Personal Development · communication
16 min read

Efficient Ops
Feb 2, 2016 · Operations

Unlocking Efficient Operations: 7 Secrets to Happy SysAdmins

This article explores why efficient operations are hard to achieve, identifies common pitfalls such as unclear responsibilities, communication gaps, and resource mismatches, and presents a practical framework—including clear roles, professional processes, and a good service interface—to help operations teams become more effective and satisfied.

Automation · Operations · communication
16 min read

Efficient Ops
Feb 2, 2016 · Operations

Operations 2.0: The Final Opportunity to Transform IT Ops in the Cloud Era

The article argues that traditional IT operations are facing a crisis and proposes Operations 2.0—a service‑oriented, business‑aware model that leverages cloud, open‑source and automation to shift focus from technical output to reliable, value‑adding services, outlining why it is essential and how to implement it.

Automation · IT transformation · Operations
14 min read

21CTO
Jan 28, 2016 · Operations

How to Build High‑Availability Systems: Lessons from a Transaction Platform Evolution

This article shares practical insights on achieving high availability by understanding goals, decomposing requirements, designing resilient architectures, ensuring operability, testing rigorously, and reducing release risk, illustrated through the multi‑stage evolution of a transaction system.

Microservices · Operations · Scalability
14 min read

Architect
Jan 26, 2016 · Operations

Evolution of Image Server Architecture: From Single‑Node to Distributed File System and CDN

The article examines how large‑scale web sites handle massive image resources, tracing the progression from simple single‑machine storage to clustered virtual directories, shared UNC storage, and finally a FastDFS‑based distributed file system combined with CDN acceleration, highlighting the architectural trade‑offs and operational considerations.

CDN · FastDFS · Operations
13 min read

Efficient Ops
Jan 25, 2016 · Operations

Why You Still Need a Dedicated Deployment System Beyond Jenkins

While Jenkins offers powerful deployment plugins, this article explains why a standalone deployment system remains essential for continuous delivery, covering decoupling builds, managing complex environments, supporting varied deployment strategies, enforcing standards, gathering operational data, and enabling service-oriented deployment across teams.

Jenkins · Operations · ci/cd
9 min read

Node Underground
Jan 19, 2016 · Operations

Why Front‑End Developers Should Care About Docker: A Beginner’s Guide

This article explains how Docker’s build‑ship‑run model bridges front‑end development and containerization, covering Docker’s history, core concepts, a sample Dockerfile for a Node.js app, and practical scenarios where Docker improves environment consistency, resource efficiency, and scalability.

DevOps · Docker · Operations
11 min read

Efficient Ops
Jan 18, 2016 · Operations

How Tencent Migrated 200M QQ Users After a Tianjin Explosion

When a massive container explosion threatened Tencent's Tianjin data center, the operations team executed a 24‑hour, continent‑wide user migration that moved over 200 million QQ users to Shenzhen and Shanghai without service interruption, showcasing unprecedented disaster‑recovery capabilities.

Operations · Tencent · disaster recovery
10 min read

21CTO
Jan 18, 2016 · Operations

Why Immutable Infrastructure Is the Future of Reliable Deployments

Immutable Infrastructure treats every server or container as a read‑only unit that is replaced rather than modified, offering repeatable configuration, faster CI/CD, easier rollback, and reduced operational complexity, while requiring stateless applications and automated provisioning templates to succeed.

Automation · Deployment · Operations
9 min read

Qunar Tech Salon
Jan 16, 2016 · Backend Development

From Zero to One: The Evolution of WeChat’s Backend System Architecture

This article chronicles the two‑month development of WeChat’s backend from its inception, detailing the design of its message model, data‑sync protocol, three‑tier architecture, asynchronous queues, rapid scaling, platformization, multi‑data‑center deployment, disaster‑recovery strategies, performance optimizations, security hardening, and emerging resource‑scheduling challenges.

Distributed Systems · Operations · WeChat
28 min read

Efficient Ops
Jan 13, 2016 · Operations

Incremental vs Full Deployment: Which Strategy Wins for Modern Ops?

The article examines the trade‑offs between incremental and full deployment, outlining their workflows, advantages, and challenges, and concludes that full deployment is generally preferable for stateless units while incremental methods remain useful for stateful components like databases.

Deployment · Operations · full deployment
9 min read

Efficient Ops
Jan 6, 2016 · Operations

How Natural Cooling Can Cut Data Center Energy Costs by Over 20%

This article explains China's green data‑center policies, the importance of PUE, and demonstrates through calculations and real‑world Dalian case studies how natural cooling can halve cooling energy use, lower PUE from 2.5 to 2.0, and save millions in electricity bills.

Data center · Operations · PUE
11 min read

21CTO
Jan 6, 2016 · Backend Development

Essential Best Practices for Accurate HTTP Load Testing

This article outlines ten practical guidelines—ranging from test environment consistency and dedicated hardware to network capacity checks, OS tuning, realistic workloads, proper test duration, and comprehensive result reporting—to ensure reliable and reproducible HTTP server performance benchmarks.

Backend · Benchmark · Load Testing
13 min read

Java High-Performance Architecture
Jan 5, 2016 · Operations

How Service Degradation Keeps E‑commerce Platforms Stable During Traffic Surges

The article explains why service degradation is essential for large‑scale shopping events, outlines its different dimensions such as page, business module, and remote service downgrade, and describes both manual and automatic implementation methods to maintain system availability under heavy load.

Operations · e‑commerce · service degradation
3 min read

Efficient Ops
Dec 30, 2015 · Operations

E‑Commerce vs. General Internet Ops: Veteran Insights on Key Differences

A seasoned operations leader discusses how e‑commerce operational support differs from general internet applications, covering longer support chains, consistency models, seasonal traffic spikes, team role separation, mobile‑internet challenges, future planning, and the rise of enterprise‑level ops services.

Operations · e‑commerce · mobile operations
14 min read

Efficient Ops
Dec 28, 2015 · Operations

Why Jumpserver Became the Go-To Open‑Source Bastion Host for Ops Teams

This article explains the origins, core features, design principles, and deployment resources of Jumpserver, an open‑source Python‑based bastion host that simplifies batch account management, command auditing, and web‑based terminal access for operation engineers.

Bastion Host · JumpServer · Operations
6 min read

21CTO
Dec 18, 2015 · Operations

How JD’s Order Fulfillment Center Scales to Millions of Orders During Mega‑Sales

This article explains how JD.com’s Order Fulfillment Center (OFC) was built, re‑engineered, and continuously optimized to handle massive order volumes during major sales events, covering its architecture, migration from .Net to Java, distributed task queues, flow control, and operational practices that ensure reliability and scalability.

Operations · Scalability · e‑commerce
24 min read

Efficient Ops
Dec 17, 2015 · Operations

Tackling QQ’s Legacy Ops: Automation, Capacity Management & Fault Analysis

This article shares Tencent’s QQ operations team insights on handling legacy issues, standardizing package and configuration management, leveraging the ZhiYun automation platform, and applying capacity management and fault‑root analysis techniques to boost efficiency and reduce costs.

Automation · Operations · capacity-management
10 min read

Efficient Ops
Dec 15, 2015 · Operations

Ops Experts Share Insights on Private Cloud, Career Shifts, and Enterprise IT Future

In this interview, seasoned ops veteran Zhijin discusses the similarities and differences between traditional and internet operations, the challenges of building private clouds in finance, advice for ops professionals considering entrepreneurship or job changes, and predicts a future where private, public, and industry clouds coexist.

Career Development · IT Operations · Operations
13 min read

Efficient Ops
Dec 6, 2015 · Operations

How Six Pillars Transform Data Center Operations into Full Automation

This article summarizes a seasoned operations expert’s insights on data‑center management, covering the evolution from ad‑hoc automation to a closed‑loop CMDB‑driven system, the six key capabilities for future data centers, and practical definitions of operations, automation, and DevOps.

Automation · Data center · ITIL
10 min read

21CTO
Dec 5, 2015 · Backend Development

How LAMP Powers Rapid Iteration Across Development, Testing, and Operations

This article explains how a unified LAMP-based solution enables fast development, automated testing, and streamlined operations for large‑scale online services, detailing the architecture’s layers, tooling, and future directions for standardization and platformization.

Backend Development · LAMP · Operations
8 min read

MaGe Linux Operations
Dec 3, 2015 · Operations

How Email Works: From DNS MX Records to Secure Delivery

This guide explains how email systems work—from DNS MX record lookup and server roles like MUA, MTA, MDA, and MRA to the detailed steps of sending, receiving, authentication, encryption, and spam filtering—providing operations engineers with a comprehensive understanding of mail infrastructure.

DNS · Email · Operations
18 min read

DevOps
Dec 3, 2015 · Operations

The Pitfalls of DevOps Hype and the Full‑Stack Developer Expectation

The article critiques the growing DevOps and full‑stack developer hype, arguing that forcing engineers to juggle development, operations, QA, and DBA tasks devalues specialized work, creates unrealistic expectations, and ultimately harms both productivity and software quality.

DevOps · Operations · Role Overload
8 min read

DevOps
Dec 2, 2015 · Operations

The Pitfalls of DevOps and Full‑Stack Expectations in Start‑ups

The article argues that the growing DevOps culture and the demand for “full‑stack” developers force engineers, especially in startups, to juggle multiple specialized roles—development, QA, operations, DBA—leading to inefficiency, burnout, and a dilution of true software craftsmanship.

DevOps · Operations · Roles
8 min read

Java High-Performance Architecture
Dec 1, 2015 · Operations

What Is Nagios? Key Features, Components, and Limitations Explained

Nagios is an enterprise‑grade, open‑source monitoring framework that tracks server, service, and network metrics such as CPU usage, memory, disk space, and network throughput, alerts via email or SMS on anomalies, and consists of a core, plugins, and extensions, though it lacks built‑in reporting and has configuration limitations.

IT infrastructure · Nagios · Operations
3 min read

21CTO
Nov 17, 2015 · Operations

How JD Scaled Its Order Fulfillment Center to Handle Millions of Orders

JD’s Order Fulfillment Center (OFC) evolved from a small data‑transfer team into a highly scalable, distributed architecture that handles massive order volumes during events like 618 and Double‑11, employing Java migration, service decomposition, flow control, and robust operations to ensure data consistency and rapid delivery.

Distributed Systems · Java migration · Operations
23 min read

Efficient Ops
Nov 15, 2015 · Fundamentals

Understanding SSD Basics: Principles, Architecture, Risks, and Maintenance

This article explains SSD fundamentals, including flash memory principles, device composition, controller functions, common pitfalls, performance degradation causes, and best practices for monitoring and maintaining SSDs in enterprise environments, ensuring reliability and data integrity.

Hardware · Operations · SSD
15 min read

21CTO
Nov 12, 2015 · Cloud Computing

Inside JD.com's 11.11 Tech: Cloud, AI, and Ops Strategies

JD.com’s senior engineers detail how a combination of massive Docker‑based cloud migration, multi‑center transaction architecture, intensive 60‑second recovery drills, and AI‑driven personalization via the JD Brain enabled the platform to handle the unprecedented traffic and data demands of the 11.11 shopping festival.

Operations · artificial intelligence · cloud computing
19 min read

Efficient Ops
Nov 11, 2015 · Operations

Mastering Ops Team Leadership: Manager Roles & Performance Management

This article explores how operations managers can define their positioning, adopt multiple leadership roles, and implement effective performance management practices—including clear goal setting, task allocation, coaching, and systematic review—to boost team efficiency and stability.

IT ops · Leadership · Operations
12 min read

Efficient Ops
Nov 4, 2015 · Operations

From Idea to Published Book: My Journey Writing the Puppet Authority Guide

This article shares the author's personal journey of conceiving, planning, writing, and publishing a technical book on Puppet, detailing the motivations, step‑by‑step process, challenges, case studies, and the professional benefits gained from turning the writing effort into tangible value.

Configuration Management · Operations · Puppet
15 min read

Qunar Tech Salon
Nov 3, 2015 · Operations

Meituan's Supply Chain System: Architecture, Challenges, and Automation

The article explains Meituan's supply chain (SCP) process, detailing its role in converting merchant agreements into electronic contracts, the complex data structures, flexible sales models, dynamic auditing, and how automation, workflow, and product‑center modeling address these challenges to dramatically reduce costs and improve efficiency.

O2O, Operations, Product Modeling
0 likes · 14 min read
Efficient Ops
Oct 29, 2015 · Cloud Computing

Mastering Production KVM Virtualization: CPU, Memory, Network & Storage Best Practices

This article shares practical production‑level KVM virtualization techniques, covering CPU binding and host‑passthrough, memory management, network optimization with Open vSwitch, storage choices, VM time drift handling, and resource limiting via cgroups, offering actionable insights for reliable, high‑performance virtualized environments.

KVM, Operations
0 likes · 11 min read
Efficient Ops
Oct 27, 2015 · Operations

How to Build a Practical Monitoring System for Small and Medium Enterprises

An in‑depth guide walks readers through building a comprehensive monitoring system for small‑to‑medium enterprises, covering hardware, system, application, network, security, traffic analysis, business metrics, log aggregation, automation, visualization, and practical integration with tools like Zabbix, IPMI, ELK, and Smokeping.

Automation, Log Management, Operations
0 likes · 18 min read
Efficient Ops
Oct 22, 2015 · Operations

Unlock Hidden Savings: Optimizing Multi‑Data Center Bandwidth Costs

This article examines the characteristics and billing models of multi‑data‑center networks, analyzes external traffic patterns, identifies challenges in optimizing Internet‑facing bandwidth, and proposes practical scheduling strategies to better utilize idle bandwidth and reduce carrier costs.

Multi-Data Center, Operations, Traffic Scheduling
0 likes · 13 min read
Efficient Ops
Oct 21, 2015 · Operations

Putting People First: Building a Human‑Centric Operations System

This article explores why operations teams must adopt a people‑centric mindset, outlines a three‑layer model made up of framework, blood (processes and policies), and interface, and provides practical steps for improving processes, technology, and organization to boost efficiency and employee satisfaction.

Automation, ITIL, Operations
0 likes · 13 min read
MaGe Linux Operations
Oct 21, 2015 · Operations

How JobCenter Transforms Distributed Task Scheduling in E‑Commerce

JobCenter is a distributed task coordination platform that replaces crontab with a unified scheduling, monitoring, and alerting system, enabling e‑commerce teams to manage thousands of web‑service‑based jobs, ensure reliable execution, and gain clear visibility into task performance.

Automation, Distributed Systems, Operations
0 likes · 7 min read
Efficient Ops
Oct 19, 2015 · Operations

Step-by-Step Guide to Installing and Using Clip Server and SDK on Linux

This article provides a comprehensive tutorial on installing the Clip Server (Apache, PHP, MySQL), configuring its virtual host, setting up the Clip SDK with Python, and using various Clip commands to manage IP relationships, all illustrated with command examples and screenshots.

CLIP, Installation, Linux
0 likes · 12 min read
Qunar Tech Salon
Oct 18, 2015 · Operations

Nginx Plus TCP Load Balancing: Configuration, Principles, and Monitoring

This article explains how Nginx Plus’s commercial stream module enables TCP load balancing, detailing configuration steps, underlying routing algorithms, health monitoring, connection handling, and performance considerations, while comparing it to HTTP load balancing and other layer‑4 solutions.

Backend, Nginx, Operations
0 likes · 8 min read
Efficient Ops
Oct 15, 2015 · Operations

Is DevOps & Full‑Stack Hype Killing Developers? A Critical Analysis

The article critically examines how the DevOps and full‑stack trends, driven by startup culture, force developers to juggle multiple roles, leading to overwork, reduced focus, and higher costs, while also highlighting nine companies that successfully practice DevOps.

DevOps, Operations, Software Engineering
0 likes · 16 min read
Efficient Ops
Oct 12, 2015 · Operations

Redis Cluster Migration Lessons: Real‑World Failures and Practical Solutions

This article recounts a series of July Redis incidents, including network‑card saturation, connection‑limit exhaustion, suspected split‑brain, BGSAVE‑induced OOM, and master‑restart data loss, and details the migration to Redis Cluster with a Smart Proxy, the challenges faced, and actionable remediation strategies.

Cluster, Operations, Smart Proxy
0 likes · 13 min read
21CTO
Sep 28, 2015 · Operations

Mastering Log Management: 16 Rules to Boost System Reliability

This article presents a comprehensive set of logging best‑practice rules—from defining log levels and classifications to using RequestIDs, monitoring alerts, and managing log size—aimed at improving system reliability, troubleshooting speed, and operational efficiency.

Debugging, Log Management, Operations
0 likes · 23 min read
21CTO
Sep 26, 2015 · Operations

10 Proven Strategies to Boost Team Management Efficiency

This article outlines ten practical management principles—from clarifying structure and goals to visualizing work and demanding results—that help leaders improve team accountability, transparency, and continuous improvement in operational environments.

Continuous Improvement, Operations, team management
0 likes · 6 min read
Efficient Ops
Sep 23, 2015 · Operations

How Tencent Powers Millions with SET‑Based NoSQL Clusters

Tencent’s operations team explains how its SET‑based NoSQL clusters deliver ultra‑low latency, high availability, and seamless disaster recovery for billions of users, detailing deployment models, synchronization mechanisms, cost‑saving techniques, and the Data‑as‑Service approach that underpins its massive social platforms.

Cost Optimization, Data as a Service, Distributed Systems
0 likes · 12 min read
21CTO
Sep 23, 2015 · Operations

How Tencent Scaled from QQ to Cloud: Key Lessons in Tech Operations

The article chronicles Tencent's technological evolution across four stages—from massive user growth to flexible platforms—highlighting the strategic use of efficient architectures, dynamic operations, and cloud services that enabled it to support billions of users while maintaining performance and reliability.

Operations, Scalability, Technology Evolution
0 likes · 9 min read
Efficient Ops
Sep 20, 2015 · Operations

From Internet Ops to Banking: Lessons on Data Center Challenges

In a candid Q&A, industry veterans discuss the fundamental differences between internet and traditional banking operations, share experiences transitioning between sectors, and outline strategies to eliminate difficult data‑center maintenance, highlighting risk‑focused versus growth‑driven approaches.

Data center, IT ops, Infrastructure
0 likes · 14 min read
Efficient Ops
Sep 17, 2015 · Operations

How Google’s DevOps Culture Spreads Knowledge Across the Whole Company

The article explores Randy Shoup’s insights on Google’s DevOps model, focusing on Dr. Spear’s four capabilities—especially the ways high‑efficiency companies rapidly detect issues, swarm to resolve them, disseminate new knowledge company‑wide, and adopt a development‑led leadership approach.

DevOps, Google, Operations
0 likes · 9 min read
Architects' Tech Alliance
Sep 7, 2015 · Operations

Managed Data Remote Replication with DD Boost and NetWorker

DD Boost allows backup applications to manage and simplify file replication between multiple Data Domain systems, and using NetWorker as an example, the article details the step‑by‑step replication workflow, optional low‑bandwidth and encryption features, and how remote restores are performed.

DDBoost, DataDomain, LowBandwidth
0 likes · 4 min read
MaGe Linux Operations
Sep 7, 2015 · Operations

How to Install and Configure GitLab CE 7.9 on Ubuntu 14.04

This guide walks through downloading, installing, and configuring GitLab CE 7.9 on Ubuntu 14.04, covering nginx workarounds, whitelist setup for rack_attack, essential configuration changes, login credentials, and references to official documentation and update logs.

GitLab, Operations, Whitelist
0 likes · 5 min read
Java High-Performance Architecture
Sep 5, 2015 · Operations

How Does Nginx Detect Unhealthy Servers? Passive vs Active Health Checks Explained

NGINX determines server health through passive checks—stopping forwarding after failures—and active health checks, which periodically probe each backend using the health_check directive, allowing configuration of intervals, failure thresholds, custom URIs, and response matching criteria to ensure reliable load balancing.

Operations, active check, health check
0 likes · 4 min read