Tagged articles
Efficient Ops
Aug 21, 2017 · Operations

How AI-Driven Automation Transforms Tencent Game Operations

This article explains how Tencent Game operations moved from manual, threshold‑based monitoring to an AI‑powered, data‑driven workflow that automates scaling, improves online‑curve monitoring, enables full‑dimensional analysis, and reduces time, labor, and cost while enhancing player experience.

Gaming · Operations · automation
16 min read
DevOps
Aug 20, 2017 · Operations

DevOps Practices on the Telad Cloud Platform

This article explains the DevOps methodology, its goals of rapid high‑quality software delivery and cost reduction, and details how the Telad Cloud Platform implements end‑to‑end automation—including CI, automated testing, packaging, deployment, and continuous delivery using Microsoft TFS and custom tools.

DevOps · Operations · automation
9 min read
iQIYI Technical Product Team
Aug 18, 2017 · Operations

iQiyi Video Buffering Analysis and Handling Experience

iQiyi monitors video buffering across millions of users, classifies anomalies into internal, server, operator, and user causes, uses a buffer perception system with clustering and SVM predictions, automates multi‑dimensional alerts, and resolves over 93% of non‑operator incidents within 15 minutes.

Buffering · Operations · monitoring systems
18 min read
Efficient Ops
Aug 16, 2017 · Operations

How Qunar Built an Automated Hardware Operations Platform to Boost Efficiency

This article details Qunar's end‑to‑end hardware automation system, covering background challenges, lifecycle management, automated testing, data collection, fault detection, and visualized monitoring, and explains how the integrated platform reduces manual effort, improves reliability, and cuts operational costs.

CMDB · Operations · fault management
22 min read
Efficient Ops
Aug 13, 2017 · Operations

22 Essential Ops Manager Tips for Building Resilient Web Infrastructure

This article compiles 22 practical recommendations from an operations manager covering domain management, CDN usage, image servers, data center selection, monitoring, security, redundancy, high‑availability architecture, disaster‑recovery planning, and team coordination to help ensure stable and secure online services.

Infrastructure · Operations · disaster recovery
12 min read
MaGe Linux Operations
Aug 11, 2017 · Operations

Why Operations Matters: Beyond Automation to Real Business Value

In this reflective piece, Zhao Cheng (aka Qianyi) shares his experience managing the operations team at Mogujie, argues that operations value extends beyond automation to efficiency, stability, security, cost, and user experience, and offers practical guidance for shifting mindsets and aligning ops with business goals.

Cost Management · DevOps · Operations
12 min read
dbaplus Community
Aug 8, 2017 · Operations

Mastering Smooth and Gray Releases for Large‑Scale Internet Finance Platforms

This article details a step‑by‑step transformation of an internet finance platform's online release process, covering application architecture, public component selection, smooth deployment techniques using Dubbo weight adjustment, RocketMQ control, LTS task isolation, verification methods, and a comprehensive gray‑release strategy with practical pitfalls and future improvements.

Operations · RocketMQ · gray release
16 min read
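The summary mentions smooth deployment through Dubbo weight adjustment. A minimal sketch of why that works, assuming an instance is drained by setting its weight to 0 before it is restarted: a weighted random balancer (similar in spirit to Dubbo's default random strategy, but not Dubbo's API) then stops sending new calls to it. The functions `pick_provider` and `drain` and the provider addresses are hypothetical.

```python
import random
from typing import Dict, Optional


def pick_provider(weights: Dict[str, int], rng: random.Random) -> Optional[str]:
    """Weighted random choice; a provider whose weight is 0 is never chosen."""
    live = [(addr, w) for addr, w in weights.items() if w > 0]
    total = sum(w for _, w in live)
    if total == 0:
        return None  # every provider is drained; the caller must fail fast or retry later
    point = rng.randrange(total)  # uniform integer in [0, total)
    for addr, w in live:
        if point < w:
            return addr
        point -= w
    raise AssertionError("unreachable: point always falls inside some weight bucket")


def drain(weights: Dict[str, int], addr: str) -> Dict[str, int]:
    """Return a copy of the weight table with `addr` taken out of rotation before a restart."""
    updated = dict(weights)
    updated[addr] = 0
    return updated


providers = {"10.0.0.1:20880": 100, "10.0.0.2:20880": 100}  # hypothetical instances
during_restart = drain(providers, "10.0.0.1:20880")
print(pick_provider(during_restart, random.Random(7)))
```

Restoring the original weight after the restarted instance passes verification (or raising it gradually) brings it back into rotation; in-flight calls still need to finish before the process is stopped.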
DevOps
Aug 8, 2017 · Operations

A Decade of DevOps: History, Challenges, and the Road Ahead

Reflecting on ten years of DevOps, this article traces its origins, examines enduring obstacles such as reliability, coordination, and cultural resistance, highlights early success stories like Flickr, and argues that the future of DevOps depends on solid toolchains rather than abstract cultural shifts.

Culture · DevOps · Operations
11 min read
21CTO
Aug 8, 2017 · Backend Development

How Ctrip Evolved Its Architecture: Lessons from 5+ Iterations

This article chronicles Ctrip's multi‑year architectural evolution—covering operations, framework, application layers, publishing system, configuration management, SOA, and the large‑scale UserProfile project—highlighting the motivations, challenges, and solutions that shaped its high‑availability, high‑performance platform.

Ctrip · Operations · architecture
13 min read
Architecture Digest
Aug 7, 2017 · Operations

Website Availability and High‑Availability Architecture Overview

This article explains website availability metrics, fault‑weight scoring, layered high‑availability architecture, session management strategies, reusable service design, data redundancy, quality assurance processes, and monitoring practices essential for maintaining reliable large‑scale web systems.

Availability · Operations · Session Management
9 min read
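The availability metric and fault-weight scoring in the summary reduce to simple arithmetic. A sketch, assuming availability is measured over a 365-day year and that a fault score is outage minutes multiplied by a per-severity weight; the weights themselves are hypothetical and would be set by each team.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability(downtime_minutes: float, period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Fraction of the period during which the site was up."""
    return 1 - downtime_minutes / period_minutes


def allowed_downtime_minutes(nines: int, period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime budget for an availability target of `nines` nines (e.g. 4 means 99.99%)."""
    return period_minutes * 10 ** -nines


def fault_score(outage_minutes: float, weight: float) -> float:
    """Assumed scoring rule: outage duration multiplied by a per-severity weight."""
    return outage_minutes * weight


for n in (2, 3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} min/year")
```

With these definitions, four nines (99.99%) leave about 52.56 minutes of downtime per year, which is why each extra nine typically demands a different class of redundancy rather than just more careful operation.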
Efficient Ops
Aug 4, 2017 · Operations

How Tencent’s ZhiYun Platform Powered the “Military Photo” Campaign with 4,000 Servers

This article details how Tencent's SNG operations team leveraged the ZhiYun intelligent operations platform—through standardized processes, massive IaaS provisioning, CMDB management, automated workflows, and real‑time capacity monitoring—to support the high‑traffic “Military Photo” H5 campaign, scaling up to 4,000 servers and 24 Gbps of bandwidth.

CMDB · IaS · Operations
10 min read
DevOps
Aug 2, 2017 · Operations

Executive Insights on the State of DevOps: Findings from 16 Leaders Across 14 Companies

Based on interviews with 16 senior executives from 14 companies, this article highlights that DevOps success hinges on people, process, and technology, emphasizing cultural change, automation, faster releases, higher quality, and the growing impact of cloud and containers on future development practices.

Continuous Delivery · Culture · DevOps
9 min read
Efficient Ops
Aug 2, 2017 · Operations

Essential Ops Playbook: 6 Key Practices to Prevent Disasters

Drawing from a year‑and‑a‑half of ops experience, this guide outlines six practical categories—online operation standards, data handling, security, daily monitoring, performance tuning, and mindset—to help engineers avoid costly mistakes and maintain stable, secure systems.

Backup · Operations · System Administration
12 min read
21CTO
Jul 27, 2017 · Backend Development

How Sina’s News Comment System Scaled to Millions of Users: Lessons from 3.0 to 5.0

This article chronicles the evolution of Sina's news comment platform from its early Perl‑based prototype through versions 3.0, 4.0, and 5.0, detailing architectural choices, caching strategies, database sharding, asynchronous processing, and the eventual migration to cloud‑native Python services to handle massive traffic spikes.

Comment System · Operations · Scalability
19 min read
DevOps
Jul 24, 2017 · Operations

Understanding ChatOps: Concepts, Ecosystem, and Practical Guidance

This article introduces ChatOps, explains its relationship to DevOps, reviews popular open‑source implementations such as Hubot, Lita and Err, and provides practical advice on robot integration, command design, common pitfalls, and how to build a collaborative chat‑based operations workflow.

ChatOps · Chatbot · DevOps
10 min read
MaGe Linux Operations
Jul 23, 2017 · Operations

Master Linux: The Ultimate Book List for System Admins and Cloud Engineers

This article curates a comprehensive list of essential Linux and related technology books, ranging from beginner guides and system fundamentals to advanced topics like kernel development, cloud computing, security, and automation, helping aspiring sysadmins and engineers build a solid knowledge foundation.

Learning Resources · Linux · Operations
5 min read
Ctrip Technology
Jul 20, 2017 · Operations

Ctrip's Fourth‑Generation Architecture: Elastic Routing (SLB) and the TARS Release System

This article reviews Ctrip's two‑year architecture transformation, describing how the company replaced hardware load balancers with a software‑defined SLB, introduced application‑level grouping, multi‑update mechanisms, health‑check sharing, monitoring, and the TARS release platform to achieve faster, more reliable deployments.

Ctrip · Infrastructure · Operations
16 min read
Efficient Ops
Jul 18, 2017 · Operations

Boost NGINX Performance: Essential Linux and NGINX Tuning Tips

This guide explains how to fine‑tune Linux kernel parameters and NGINX directives—such as backlog queues, file descriptors, worker processes, keep‑alive settings, access‑log buffering, sendfile, and request limits—to achieve optimal web server performance for high‑traffic sites.

Linux · Nginx · Operations
11 min read
ITPUB
Jul 17, 2017 · Operations

Essential Linux Ops Tools Every Sysadmin Should Master

This guide outlines the core Linux system fundamentals, networking services, scripting languages, text‑processing utilities, database handling, firewall configuration, monitoring solutions, clustering, and backup techniques that form the essential toolkit for aspiring Linux operations engineers.

Linux · Operations · Sysadmin
7 min read
Architecture Digest
Jul 16, 2017 · Operations

Fault Governance in Distributed Systems: Dependency Failures, Strong/Weak Dependency, and Fault‑Injection Practices

This article presents a comprehensive overview of fault governance in large‑scale distributed systems, covering classic dependency failures, the concept of strong and weak dependencies, experimental observations, the evolution of fault‑injection techniques, and best practices for building reliable fault‑drill platforms.

Distributed Systems · Operations · chaos engineering
20 min read
MaGe Linux Operations
Jul 14, 2017 · Operations

Essential Python OS & File Operations for Automation

This guide presents a comprehensive collection of Python's os and shutil functions, file handling methods, and practical code examples to help operations engineers automate tasks, manage files and directories, and improve efficiency in modern IT environments.

OS module · Operations · Python
11 min read
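As a small illustration of the kind of os and shutil calls the summary refers to, here is a sketch that moves log files older than a given age into an archive directory. The directory layout, the `.log` suffix, and the function name `archive_old_logs` are assumptions for illustration.

```python
import os
import shutil
import time
from typing import List


def archive_old_logs(log_dir: str, archive_dir: str, max_age_days: float,
                     suffix: str = ".log") -> List[str]:
    """Move regular files ending in `suffix` and older than `max_age_days` into `archive_dir`.

    Returns the names of the files that were moved, in sorted order.
    """
    os.makedirs(archive_dir, exist_ok=True)          # create the archive dir if missing
    cutoff = time.time() - max_age_days * 86400      # modification-time threshold in epoch seconds
    moved = []
    for name in sorted(os.listdir(log_dir)):         # top level only; subdirectories are skipped
        src = os.path.join(log_dir, name)
        if not (os.path.isfile(src) and name.endswith(suffix)):
            continue
        if os.path.getmtime(src) < cutoff:
            shutil.move(src, os.path.join(archive_dir, name))
            moved.append(name)
    return moved
```

Listing only the top level (rather than `os.walk`) keeps the sketch from re-scanning an archive directory that lives inside the log directory.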
DevOps
Jul 13, 2017 · Operations

Reflections on the Phoenix Project DevOps Simulation Game

The author recounts attending a Phoenix Project‑based DevOps simulation in Beijing, describing role misunderstandings, the use of Kanban boards, bottleneck handling, one‑piece flow, and communication challenges, and concludes with recommendations on who should participate in such training.

DevOps · Kanban · Operations
9 min read
Architecture Digest
Jul 13, 2017 · Operations

Comprehensive Architecture and DevOps Tool Knowledge Map

This article compiles an extensive collection of architecture knowledge maps and a detailed overview of DevOps tools, categorizing them by development, deployment, and maintenance functions while also presenting related big‑data and cloud‑computing skill maps for engineers seeking a holistic view of modern software infrastructure.

Big Data · DevOps · Operations
9 min read
DevOps
Jul 10, 2017 · Operations

Measuring Business Value and ROI of DevOps Adoption

The article explains how enterprises can quantify the commercial benefits and return on investment of DevOps by examining adoption statistics, business outcomes, success criteria for CEOs, CIOs and team leads, and key performance metrics that demonstrate the impact on cost, speed, and reliability.

DevOps · Operations · ROI
5 min read
Efficient Ops
Jul 8, 2017 · Operations

What Can Martial Arts Teach Us About Modern Operations Engineering?

The article uses martial‑arts metaphors to explore how operations engineers should master fundamentals, combine inner knowledge with practical skills, leverage tools, embrace teamwork, and automate processes to deliver higher business availability with lower cost.

Operations · Tooling · automation
16 min read
MaGe Linux Operations
Jul 7, 2017 · Operations

Essential Linux Command Cheat Sheet for Operations Engineers

This guide compiles essential Linux command‑line techniques—from searching and editing with vi/vim, using pipelines, file finding, string replacement, redirection, permission changes, to monitoring system resources and network connections—helping operations engineers boost productivity and maintain high service availability.

Linux · Operations · Vim
12 min read
Efficient Ops
Jul 6, 2017 · Operations

36 Ops Strategies: Permissions, Documentation, and Capacity Management

The article shares practical operations lessons—from periodic permission audits and thorough documentation to capacity monitoring, log rotation, and automation—illustrating how systematic practices and tooling can standardize and streamline IT infrastructure management.

Documentation · IT Management · Operations
8 min read
Efficient Ops
Jul 5, 2017 · Operations

How Panda Live’s Rancho System Automates Secure, Scalable Deployments

Rancho is a unified release platform built for Panda Live. It streamlines project onboarding, enforces multi‑layer security through SSO plus user‑ and project‑level permissions, and provides a web‑based front end and back end for tag selection, environment mapping, automated deployment, audit logging, and rollback, dramatically shortening release cycles.

Deployment · Operations · automation
16 min read
360 Quality & Efficiency
Jul 5, 2017 · Operations

Understanding JMeter Distributed Modes and Optimization Strategies

This article explains how JMeter implements its various distributed testing modes, examines the SampleSender interface and its overloads for standard, batch, and asynchronous modes, and presents optimization techniques that reduce network overhead and improve overall load‑testing performance.

Distributed Testing · JMeter · Operations
3 min read
360 Zhihui Cloud Developer
Jul 4, 2017 · Operations

How to Monitor and Predict Disk Health with SMART and smartctl

This article explains why disk health monitoring is crucial for service stability, introduces SMART technology and the smartctl tool, details command usage, key SMART attributes, value interpretation, and outlines automated data collection and alerting strategies for reliable operations.

Disk Monitoring · Operations · SMART
15 min read
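To make attribute interpretation concrete, the sketch below parses the attribute table printed by `smartctl -A /dev/sdX` and flags a drive for follow-up. The alert rule (normalized VALUE at or below a nonzero THRESH, or a nonzero raw count for attributes 5, 197, and 198) is a common heuristic assumed here, not necessarily the article's policy, and the sample output is illustrative only.

```python
from typing import List, NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int    # normalized current value
    worst: int    # lowest normalized value seen
    thresh: int   # vendor failure threshold (0 means "never fails on this attribute")
    raw: str      # raw value, may contain extra text such as "(Min/Max 20/45)"


# Assumed: raw counts of reallocated, pending, and offline-uncorrectable sectors should stay at 0.
WATCHED_RAW = {5, 197, 198}


def parse_attributes(text: str) -> List[SmartAttr]:
    """Parse rows of the `smartctl -A` ATA attribute table; other lines are ignored."""
    attrs = []
    for line in text.splitlines():
        fields = line.split(maxsplit=9)  # the 10th field keeps the raw value with its spaces
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append(SmartAttr(int(fields[0]), fields[1], int(fields[3]),
                               int(fields[4]), int(fields[5]), fields[9]))
    return attrs


def needs_attention(attrs: List[SmartAttr]) -> List[str]:
    """Return names of attributes that trip the assumed alert rule."""
    flagged = []
    for a in attrs:
        failing = a.thresh > 0 and a.value <= a.thresh
        first = a.raw.split()[0]
        raw_count = int(first) if first.isdigit() else 0
        if failing or (a.attr_id in WATCHED_RAW and raw_count > 0):
            flagged.append(a.name)
    return flagged


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21043
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

print(needs_attention(parse_attributes(SAMPLE)))
```

In a collection pipeline the same parser would run on each host's `smartctl -A` output, with the flagged names fed into the alerting system rather than printed.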
Efficient Ops
Jul 2, 2017 · Operations

How to Build a Multi‑Layered Security Defense: Practical Ops Strategies

This article outlines a comprehensive, multi‑layered security framework for operations teams, covering policy design, dual‑account permission separation, grid‑based vulnerability management, topology and network safeguards, OS and database hardening, common misconceptions, and actionable principles for maintaining robust protection.

Operations · System Hardening · access control
31 min read
21CTO
Jul 1, 2017 · Operations

How Ctrip Scales Its Architecture: Ops, Release, and Big Data Insights

This article outlines Ctrip’s evolving architecture—covering its operational backbone, framework components, release system, configuration management, SOA evolution, and the massive UserProfile big‑data platform—offering practical insights from a senior developer on how the company achieves high availability and scalability.

Big Data · Operations · SOA
12 min read
DevOps
Jun 28, 2017 · Operations

The Importance of Continuous Testing in DevOps

This article explains why continuous testing is essential for effective DevOps, describing its role in feedback loops, key performance metrics, testing frameworks, and how automated testing drives quality and rapid delivery across development, QA, and operations.

DevOps · Operations · automation
5 min read
21CTO
Jun 26, 2017 · Operations

From Lab Chemist to KVM Guru: A Veteran Ops Leader’s Journey

This interview chronicles Xiao Li’s unconventional path from a petroleum chemistry graduate to a senior operations director, highlighting his hands‑on experience with system administration, virtualization, cloud computing, and team management across major Chinese tech firms.

KVM · Operations · Project Management
9 min read
Efficient Ops
Jun 25, 2017 · Operations

Comprehensive Guide to Modern IT Operations: Roles, Responsibilities, and Evolution

This article outlines the service‑centric principles of internet operations, details the various categories of work such as system, application, database, and security operations, and traces the evolution of operational practices from manual management to automated, platform‑driven workflows.

Operations · System Administration · security
19 min read
Efficient Ops
Jun 24, 2017 · Operations

How to Boost Your Ops Credibility: Certifications, Tools, and Culture Hacks

This guide outlines how operations engineers can elevate their professional image by pursuing key certifications, mastering deep technical topics, favoring niche tools over mainstream ones, writing scripts with awk/sed, embracing unconventional operating systems, and strategically networking within the industry.

Linux · Networking · Operations
7 min read
Continuous Delivery 2.0
Jun 22, 2017 · Operations

Implementing Periodic Releases and Operational Automation for Small Teams

The article describes how a small development team adopts a three‑week periodic release cadence, improves demand management, resolves operational concerns, and standardizes configuration, environment, deployment, and testing processes to achieve continuous delivery with higher quality and lower coordination cost.

Configuration Management · Continuous Delivery · Operations
13 min read
Efficient Ops
Jun 20, 2017 · Operations

Unlocking Ops Value: How Tencent’s Fine‑Grained Technical Operations Drive Massive Savings

This article explores how Tencent’s operations team redefines its value by applying fine‑grained technical management to mobile internet challenges, capacity planning, bandwidth optimization, and data‑driven product decisions, ultimately delivering huge cost savings and turning operations into a core competitive advantage.

Operations · Resource Optimization · bandwidth management
22 min read
DevOps
Jun 19, 2017 · Operations

Barclays' Agile and DevOps Transformation in the FinTech Era

The article examines how Barclays undertook an 18‑month enterprise‑wide agile and DevOps transformation, spanning IT, HR, security, and all business units, to meet FinTech competition, improve speed and reliability, and support a large‑scale digital overhaul.

Banking · DevOps · DigitalTransformation
7 min read
Efficient Ops
Jun 18, 2017 · Operations

Choosing the Right Bypass Monitoring Tool: Balancing Cost, Performance, and Value

The article examines the challenges of selecting hardware and monitoring tools, comparing Intel CPUs on performance versus price, and outlines criteria for evaluating bypass‑monitoring products, emphasizing cost, reliability, scalability, and data‑decoding capabilities to guide informed operational decisions.

Operations · Performance Monitoring · cost-benefit analysis
9 min read
Efficient Ops
Jun 15, 2017 · Operations

How Tencent Automated Operations for a Billion‑Red‑Packet Event

This article details Tencent's operation automation for the 2016 Chinese New Year QQ red‑packet activity, describing the massive traffic challenge, the architectural design, the shift from manual to CMDB‑driven one‑click scaling, load‑testing, flexible protection strategies, and on‑site monitoring that enabled rapid, reliable handling of billions of red‑packet transactions.

CMDB · Operations · Tencent
20 min read
DevOps
Jun 14, 2017 · Operations

A Historical Overview of DevOps Evolution

This article traces the evolution of DevOps from its early roots in agile development and the challenges faced by developers and operations teams, through community formation, industry adoption, and the rise of cloud‑native technologies that have shaped modern continuous delivery practices.

CloudNative · ContinuousDelivery · DevOps
7 min read
Efficient Ops
Jun 14, 2017 · Operations

Scaling Alibaba's Operations: Inside StarAgent, Qingteng & Normandy

This article details Alibaba's evolution of its operations platform, describing the design, features, and performance of StarAgent, the Qingteng P2P file distribution system, and the Normandy application‑deployment platform, highlighting how these tools enable high‑availability, automation, and massive scalability across global data centers.

Alibaba · DevOps · Operations
13 min read
Ctrip Technology
Jun 13, 2017 · Operations

Evolution and Architecture of Ctrip's System: Operations, Frameworks, and Big Data

This article presents a comprehensive overview of Ctrip's evolving system architecture, detailing its operational strategies, framework components such as SOA and release systems, and the large‑scale UserProfile big‑data platform, illustrating how each iteration addressed prior challenges while introducing new capabilities.

Big Data · Ctrip · Operations
13 min read
MaGe Linux Operations
Jun 11, 2017 · Operations

Essential Linux Ops Tools Every Sysadmin Should Master

This guide outlines the ten core toolsets—ranging from Linux basics and network services to scripting, firewalls, monitoring, clustering, and backup—that aspiring Linux operations engineers need to master for effective system administration.

Backup · Linux · Networking
7 min read
Efficient Ops
Jun 11, 2017 · Operations

How Bilibili Scaled Its Ops: From DIY Deployments to Prometheus Monitoring

From early manual deployments to a sophisticated, multi-layered monitoring stack—including ELK, Zabbix, Statsd, Grafana, and Prometheus—Bilibili’s ops team shares the evolution, challenges, and lessons learned in building scalable, automated infrastructure for massive internet traffic.

DevOps · ELK · Grafana
8 min read
ITPUB
Jun 9, 2017 · Operations

Mastering Effective Monitoring: From Basics to the USE Method

This article explains the fundamentals of monitoring, contrasts the traditional ops perspective with the SRE perspective, defines monitoring objects and metrics, introduces quantitative thinking with SLI/SLO, and presents the USE method with a MySQL example to help engineers detect and prevent failures efficiently.

Metrics · Operations · SLI
10 min read
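The USE method checks every resource for Utilization, Saturation, and Errors. A minimal sketch, assuming each resource can report busy versus total capacity, a count of queued work, and an error count; the 80% utilization limit, the `UseSample` type, and the connection-pool numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UseSample:
    resource: str
    busy: float      # capacity currently in use (e.g. busy cores, open connections)
    capacity: float  # total capacity of the resource
    queued: float    # work waiting for the resource (saturation signal)
    errors: int      # error events observed in the sampling window


def use_check(sample: UseSample, util_limit: float = 0.8) -> List[str]:
    """Apply the three USE questions to one resource and return human-readable findings."""
    findings = []
    util = sample.busy / sample.capacity
    if util >= util_limit:
        findings.append(f"{sample.resource}: utilization {util:.0%}")
    if sample.queued > 0:
        findings.append(f"{sample.resource}: saturated, {sample.queued:g} queued")
    if sample.errors > 0:
        findings.append(f"{sample.resource}: {sample.errors} errors")
    return findings


for s in (UseSample("cpu", busy=2, capacity=8, queued=0, errors=0),
          UseSample("db-conn-pool", busy=450, capacity=500, queued=12, errors=3)):
    print(s.resource, use_check(s))
```

Walking every resource through the same three checks is the point of the method: it turns "is anything wrong?" into a finite checklist, and an empty result for a resource rules it out quickly.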
Alibaba Cloud Developer
Jun 1, 2017 · Operations

How Alibaba Engineers Capacity Planning and Full‑Link Load Testing for Massive Sales Events

This article explains Alibaba's four‑step capacity‑planning methodology, the various single‑machine load‑testing techniques, the design of a full‑link load‑testing platform for Double‑11, and the dynamic flow‑control framework that together ensure system stability during extreme traffic spikes.

Alibaba · Load Testing · Operations
18 min read
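The core arithmetic of capacity planning (forecast peak traffic, measure single-machine capacity with a load test, then size the cluster) can be sketched as follows. The `headroom` factor and `spare` count are assumed planning parameters, not figures from the article.

```python
import math


def servers_needed(peak_qps: float, per_server_qps: float,
                   headroom: float = 0.7, spare: int = 0) -> int:
    """Machines required to serve `peak_qps`.

    per_server_qps: maximum throughput one machine sustained in a single-machine load test.
    headroom: fraction of that maximum we are willing to run at during the peak (assumed).
    spare: extra machines kept for failures or uneven load (assumed).
    """
    if per_server_qps <= 0 or not 0 < headroom <= 1:
        raise ValueError("per_server_qps must be positive and headroom in (0, 1]")
    return math.ceil(peak_qps / (per_server_qps * headroom)) + spare


# Hypothetical numbers: 120k QPS forecast peak, 1.5k QPS per machine, run at 50% of max.
print(servers_needed(120_000, 1_500, headroom=0.5, spare=2))
```

A full-link load test then checks whether the cluster sized this way actually holds up end to end, since single-machine numbers ignore shared dependencies such as databases and caches.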
ITPUB
May 29, 2017 · Operations

Why df and du Show Different Disk Usage on Linux and How to Fix It

This article explains why the Linux commands df and du often report different disk usage figures, detailing three main causes—reserved space, phantom (deleted) files, and data present before mounting—and provides concrete commands and steps to identify and resolve each discrepancy.

Filesystem · Operations · df
4 min read
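Two of the causes in the summary can be checked programmatically. A sketch, assuming a Linux system with procfs: `reserved_bytes` computes the blocks reserved for the superuser (the reason df's Used plus Avail is smaller than Size), and `deleted_open_files` lists files that were deleted while a process still holds them open, which df counts but du cannot see. Both function names are hypothetical.

```python
import os
from typing import List, Tuple


def reserved_bytes(mount_point: str) -> int:
    """Bytes that are free but unavailable to unprivileged users (root-reserved blocks)."""
    st = os.statvfs(mount_point)
    return (st.f_bfree - st.f_bavail) * st.f_frsize


def deleted_open_files(proc: str = "/proc") -> List[Tuple[int, int, str, int]]:
    """Return (pid, fd, target, size) for open file descriptors pointing at deleted files."""
    found = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or we lack permission to inspect it
            continue
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                if target.endswith(" (deleted)"):
                    # os.stat follows the fd link to the still-open (deleted) file
                    found.append((int(pid), int(fd), target, os.stat(link).st_size))
            except OSError:  # fd closed between listdir and readlink
                continue
    return found


if __name__ == "__main__":
    print("reserved on /:", reserved_bytes("/"))
    for pid, fd, target, size in deleted_open_files():
        print(pid, fd, target, size)
```

Space held by a deleted file is released only when the last process holding it closes the descriptor, so restarting the owner, or signaling it to reopen its logs, is the usual fix. Running as root is needed to see other users' processes.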
360 Zhihui Cloud Developer
May 25, 2017 · Operations

How HULK Turned Manual Ops into a Productized Cloud Platform

This article recounts how the 2013 HULK private‑cloud team evolved their operations from manual, repetitive tasks to a fully productized, automated platform, detailing two major upgrades—tooling automation and product‑oriented services—while sharing practical insights on monitoring, alarm management, and user‑centric design.

Operations · Tooling · cloud platform
12 min read
MaGe Linux Operations
May 16, 2017 · Operations

How Distributed Clusters Achieve Load Balancing: Principles and Practices

This article explains the concepts of distributed clusters and load balancing, contrasting clusters and distributed systems with real‑world analogies, describing various load‑balancing techniques such as DNS, LVS, and reverse proxies, and offers practical guidance on designing simple, reliable, and efficient load‑balancing solutions for distributed back‑ends.

Distributed Systems · Operations · clusters
11 min read
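One widely used way for a reverse proxy to spread requests across backends in proportion to their weights is smooth weighted round-robin, the scheme nginx uses for weighted upstream servers. A sketch with hypothetical backend names, offered as an example technique rather than the article's design:

```python
from typing import Dict


class SmoothWeightedRoundRobin:
    """Each pick: add every backend's weight to its running score, choose the highest
    score, then subtract the total weight from the chosen backend's score."""

    def __init__(self, weights: Dict[str, int]):
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("need at least one backend, all weights positive")
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        best = None
        for name, w in self.weights.items():
            self.current[name] += w
            # strict '>' keeps the earliest backend on ties, making the order deterministic
            if best is None or self.current[name] > self.current[best]:
                best = name
        self.current[best] -= self.total
        return best


rr = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
print([rr.pick() for _ in range(7)])
```

With weights a=5, b=1, c=1, seven consecutive picks yield a, a, b, a, c, a, a: the heavy backend receives five of every seven requests, but its picks are interleaved with the others instead of arriving in a burst.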
DevOps
May 16, 2017 · Operations

DevOps Evolution: Key Takeaways from Patrick Debois’ DevOpsDays Austin Slides

The article presents a visual recap of Patrick Debois' DevOpsDays Austin presentation, illustrating the history, culture, practices, challenges, and future directions of DevOps through a series of themed paintings and captions that highlight automation, measurement, feedback loops, and the human side of the movement.

Continuous Delivery · Culture · DevOps
9 min read
ITPUB
May 15, 2017 · Operations

Mastering Online Incident Management: From Detection to Prevention

This article outlines a comprehensive methodology for handling large‑scale online service incidents, covering goals, the "jump‑fill‑avoid" framework, step‑by‑step processes for detection, diagnosis, remediation, and post‑mortem analysis, as well as essential monitoring, logging, and escalation infrastructure.

Operations · SRE · incident management
18 min read
Alibaba Cloud Developer
May 12, 2017 · Operations

How Alibaba Engineers Fault Governance and Chaos Engineering for E‑commerce

This article recounts Alibaba's middleware team's QCon Beijing 2017 presentation on fault governance and fault‑drill practices, covering distributed‑system dependency failures, strong/weak dependency concepts, multi‑stage technical evolution, and the design of their chaos‑engineering platform for large‑scale e‑commerce.

Alibaba · Operations · chaos engineering
21 min read
DevOps
May 11, 2017 · Operations

Understanding DevOps: Principles, Differences from Traditional Models, Challenges, and Measurement

This article explains what DevOps is, contrasts it with traditional development‑operations workflows, discusses its benefits and drawbacks, outlines key challenges such as balancing efficiency with stability, responsibility allocation, and assessment, and presents four metrics for evaluating DevOps effectiveness.

DevOps · Engineering Efficiency · Operations
9 min read
ITFLY8 Architecture Home
May 10, 2017 · Operations

Mastering F5 Load Balancer: Quick Guide to Hardware Overview and Web Configuration

This article introduces the widely used F5 load‑balancing appliance, detailing its front‑panel indicators, network interfaces, status LEDs, and step‑by‑step procedures for initial web‑based configuration, including default IP, login credentials, and essential system settings such as hostname and root password policies.

F5 · Load Balancer · Operations
5 min read
DevOps
May 9, 2017 · Operations

A Clear and Concise DevOps Implementation Framework: 11 Core Service Capabilities

This article introduces a straightforward DevOps implementation framework that maps eleven essential service capabilities across the software development lifecycle, explains why adopting DevOps is a multi‑year journey, and uses a fitness analogy to illustrate how enterprises can progressively build these capabilities.

Continuous Delivery · DevOps · Operations
4 min read
JD Retail Technology
May 9, 2017 · Backend Development

Node.js Deployment with Tomcat: Architecture Options and Step‑by‑Step Implementation

This article outlines the rationale for adopting Node.js in the 京友邦 project, compares two deployment architectures—separate Node and Tomcat services versus co‑locating them in a single Docker container—and provides detailed step‑by‑step instructions for packaging, scripting, Nginx configuration, and monitoring to achieve a successful rollout.

Backend, Deployment, Docker
0 likes · 8 min read
DevOps
May 7, 2017 · Operations

Understanding Agile, Continuous Integration, DevOps, and Continuous Delivery: Concepts, Relationships, and Practical Guidance

The article explains Agile software development, Continuous Integration, DevOps, and Continuous Delivery, examines their inter‑relationships from both technical and human perspectives, and offers practical steps, maturity models, and real‑world case insights for teams seeking faster, reliable software delivery.

Continuous Delivery, DevOps, Operations
0 likes · 12 min read
DevOps
May 4, 2017 · Operations

What Security Teams Can Learn from DevOps to Build a Secure Architecture

This article explains how security professionals can adopt DevOps practices—such as cross‑functional collaboration, continuous delivery, and visualized security status—to build a resilient security architecture that aligns with agile development and reduces risk through frequent, small releases.

Continuous Delivery, DevOps, Operations
0 likes · 7 min read
Efficient Ops
May 3, 2017 · Operations

How Tencent Scales NBA Live Streams to Millions: Behind the Tech and Operations

This article details Tencent's large‑scale live streaming architecture for NBA games, covering the rapid growth of live video, key technical features, network transmission challenges, multi‑angle production, CDN deployment, monitoring, big‑data processing, and strategies for ensuring low latency and high reliability for millions of concurrent viewers.

Big Data, CDN, Operations
0 likes · 25 min read
Continuous Delivery 2.0
May 1, 2017 · Operations

Implementing Periodic Releases: Strategies, Challenges, and Automation in Software Development

The article describes how a development team transitioned to short‑cycle, periodic releases, outlining the goals, benefits, operational concerns, and a comprehensive set of improvements—including testing strategy, configuration and environment management, and automated deployment pipelines—to maintain quality while increasing release frequency.

Configuration Management, Continuous Delivery, Operations
0 likes · 14 min read
dbaplus Community
Apr 27, 2017 · Big Data

Why Kafka’s __consumer_offsets Topic Can Fill Your Disk and How to Fix It

The article explains Kafka’s default consumer offset storage mechanism, why the __consumer_offsets system topic can consume massive disk space due to frequent synchronous commits and misconfigured cleanup, and outlines practical steps to reduce offset data and enable proper log compaction.
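
The following minimal Python sketch shows why enabling compaction bounds the topic's size. It models the compaction rule itself: only the newest record per key survives, and a key whose newest record is a tombstone (a null value) disappears. In `__consumer_offsets` the key identifies a (group, topic, partition). A consumer that commits after every message therefore writes many records for one key, and compaction collapses them to one. The record layout and the `compact` function are illustrative assumptions. Kafka's real log cleaner is more involved: it works segment by segment, skips the active segment, and keeps tombstones for `delete.retention.ms`. It also only runs when the broker's `log.cleaner.enable` is on and the topic uses `cleanup.policy=compact`.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative record layout: (key, value). A value of None is a tombstone.
# In __consumer_offsets the key identifies (group, topic, partition).
OffsetKey = Tuple[str, str, int]
Record = Tuple[OffsetKey, Optional[int]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the newest record per key, preserving log order.

    Keys whose newest record is a tombstone are dropped entirely; real Kafka
    keeps the tombstone for a while first (delete.retention.ms).
    """
    newest: Dict[OffsetKey, int] = {}
    for position, (key, _value) in enumerate(log):
        newest[key] = position  # later records overwrite earlier positions
    return [log[p] for p in sorted(newest.values()) if log[p][1] is not None]


if __name__ == "__main__":
    # Synchronous commits after every message: one record per commit, same key.
    log: List[Record] = [(("g1", "orders", 0), n) for n in range(1, 10001)]
    log.append((("g2", "orders", 0), 42))
    log.append((("g2", "orders", 0), None))  # group g2 deleted: tombstone
    compacted = compact(log)
    print(len(log), "->", len(compacted))
    print(compacted)
```

Running it prints `10002 -> 1`. Ten thousand commits for group `g1` collapse to the latest offset, and group `g2` vanishes behind its tombstone.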

Consumer Offset, Offset Management, Operations
0 likes · 6 min read
Huawei Cloud Developer Alliance
Apr 27, 2017 · Operations

How Shared Thinking Is Reshaping Data Center Infrastructure and Business Models

The article examines how the shared‑economy mindset is transforming data‑center infrastructure—from modular designs and integrated solutions to new business models—driving lower costs, higher efficiency, and a shift from construction‑focused to operation‑focused competition across the entire ecosystem.

Industry Analysis, Operations, data center
0 likes · 12 min read
Efficient Ops
Apr 26, 2017 · Operations

Unlock Nginx: Reverse Proxy, Load Balancing & Static Serving Without Add‑ons

This article explains how Nginx can function as a reverse proxy, load balancer, HTTP server with static‑file handling, and forward proxy without relying on third‑party modules. It provides configuration examples and discusses load‑balancing strategies: the built‑in round‑robin, weight, and ip_hash, plus the third‑party fair and url_hash strategies.
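
The Python sketch below models the `weight` and `ip_hash` strategies outside Nginx. Weighted selection uses smooth weighted round‑robin, the scheme implemented in open‑source Nginx's upstream round‑robin module. It interleaves heavy and light backends instead of sending a burst to one. The `ip_hash`‑style picker keys on the first three IPv4 octets, as the Nginx documentation describes for `ip_hash`. However, CRC32 here is a stand‑in for Nginx's own hash. Backend names and weights are hypothetical.

```python
import zlib
from typing import Dict, List


class SmoothWeightedRoundRobin:
    """Smooth weighted round-robin over a fixed set of backends."""

    def __init__(self, weights: Dict[str, int]) -> None:
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        # Every backend gains its weight; the largest wins and pays back the total.
        for name, weight in self.weights.items():
            self.current[name] += weight
        best = max(self.current, key=lambda name: self.current[name])
        self.current[best] -= self.total
        return best


def ip_hash_pick(client_ip: str, backends: List[str]) -> str:
    """Map a client to a backend using the first three IPv4 octets (stand-in hash)."""
    key = ".".join(client_ip.split(".")[:3]).encode()
    return backends[zlib.crc32(key) % len(backends)]


if __name__ == "__main__":
    rr = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
    print([rr.pick() for _ in range(7)])
    backends = ["tomcat1", "tomcat2"]
    # Clients in the same /24 network always reach the same backend.
    print(ip_hash_pick("10.0.0.7", backends) == ip_hash_pick("10.0.0.99", backends))
```

With weights 5:1:1 the first seven picks come out `a a b a c a a`. Backend `a` thus gets five of every seven requests without receiving them all back to back.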

Configuration, HTTP server, Nginx
0 likes · 11 min read
DevOps
Apr 25, 2017 · Operations

Analyzing and Visualizing Docker Logs with the ELK Stack (Part Two)

This article explains how to analyze and visualize Docker container logs using the ELK stack, covering preparation, parsing tips, Kibana query techniques, and example visualizations to help monitor Dockerized environments effectively in production.
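
Here is a minimal Python sketch of the parsing step. It assumes containers use Docker's default `json-file` logging driver, which writes one JSON object per line with `log`, `stream`, and `time` fields. The sketch flattens each line into a document that a shipper could send to Elasticsearch. The output field names and the `parse_docker_json_log` helper are illustrative choices, not the article's pipeline. In a typical ELK setup, Logstash or Filebeat performs this step instead.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_docker_json_log(lines: Iterable[str], container_id: str) -> Iterator[Dict[str, str]]:
    """Flatten json-file driver lines into documents ready for indexing."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. a line cut off mid-write; skip rather than crash
        if not isinstance(entry, dict):
            continue  # valid JSON but not a log object
        yield {
            "@timestamp": entry.get("time", ""),  # RFC 3339 timestamp written by Docker
            "stream": entry.get("stream", ""),  # "stdout" or "stderr"
            "message": entry.get("log", "").rstrip("\n"),
            "container_id": container_id,
        }


if __name__ == "__main__":
    # By default Docker writes these to /var/lib/docker/containers/<id>/<id>-json.log
    sample = [
        '{"log":"GET /health 200\\n","stream":"stdout","time":"2017-04-25T08:00:00.000000000Z"}',
        '{"log":"WARN slow query\\n","stream":"stderr","time":"2017-04-25T08:00:01.000000000Z"}',
    ]
    for doc in parse_docker_json_log(sample, container_id="abc123"):
        print(doc)
```

Keeping `stream` as its own field lets a Kibana query such as `stream:stderr` isolate error output per container.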

Docker, ELK, Kibana
0 likes · 7 min read
dbaplus Community
Apr 23, 2017 · Operations

From Legacy to Scalable: How TianpiaoChe Revamped Its Ops Architecture

Li Qiang, Operations Director at TianpiaoChe, shares the step‑by‑step transformation of a legacy e‑commerce infrastructure, covering network latency fixes, hardware re‑allocation, OS tuning, open‑source component upgrades, virtualization changes, and future plans, providing practical insights for large‑scale site operations.

DevOps, Operations, Virtualization
0 likes · 28 min read
Meituan Technology Team
Apr 21, 2017 · Operations

Meituan-Dianping DevOps Automation Practices and Philosophy

The Meituan‑Dianping technical salon showcases its DevOps automation philosophy by presenting three core tools—DB automation platform, service tree, and Puppet web management—while also featuring Shanghai Zhaogang Network’s CMDB experience, illustrating how rapid O2O growth drives the need for fast, reliable, and scalable operational automation.

CMDB, DevOps, Meituan-Dianping
0 likes · 5 min read
Baidu Waimai Technology Team
Apr 20, 2017 · Databases

Greenplum (GPDB) Architecture, Features, and Operational Tools Overview

This article explains Greenplum's MPP architecture, master‑segment design, high‑availability, interconnect network, rich management tools, parallel query planning, data loading techniques, and additional capabilities such as LDAP authentication and resource queues, demonstrating why it is a strong next‑generation big‑data query engine.
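
The core of an MPP master‑segment design is that each table is spread across segments by a distribution key (`DISTRIBUTED BY` in Greenplum DDL). Each segment then scans and aggregates only its own rows, and the master combines the partial results. The Python sketch below models that with four segments and a hypothetical orders table. CRC32 modulo the segment count is a stand‑in, not Greenplum's actual hash.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]  # hypothetical (order_id, city, amount)


def segment_for(key: int, num_segments: int) -> int:
    """Choose a segment from the distribution key (stand-in hash, not Greenplum's)."""
    return zlib.crc32(str(key).encode()) % num_segments


def distribute(rows: List[Row], num_segments: int) -> Dict[int, List[Row]]:
    """Place each row on a segment according to its distribution key (order_id)."""
    segments: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        segments[segment_for(row[0], num_segments)].append(row)
    return segments


def parallel_sum(segments: Dict[int, List[Row]]) -> float:
    """Each segment aggregates locally; the master only adds the partial sums."""
    partials = [sum(r[2] for r in seg_rows) for seg_rows in segments.values()]
    return sum(partials)


if __name__ == "__main__":
    rows = [(i, "city%d" % (i % 3), float(i)) for i in range(1, 101)]
    segs = distribute(rows, 4)
    print({seg: len(r) for seg, r in sorted(segs.items())})
    print(parallel_sum(segs))  # same as summing on one node: 5050.0
```

A distribution key with many distinct values keeps segments evenly loaded. A skewed key would leave one segment doing most of the work.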

Big Data, Greenplum, MPP
0 likes · 15 min read
Qunar Tech Salon
Apr 19, 2017 · Backend Development

Rate Limiting Strategies for API Services: Design, Implementation, and Load Shedding

This article explains why availability and reliability are critical for web APIs. It outlines the four rate‑limiting and load‑shedding techniques used at Stripe (a request rate limiter, a concurrent requests limiter, a fleet usage load shedder, and a worker utilization load shedder) and gives practical guidance on choosing, implementing, and safely deploying them in production.
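
A request rate limiter is commonly implemented as a token bucket, the technique this entry is tagged with. The Python sketch below is a single‑process, in‑memory version with an injectable clock for deterministic testing. `TokenBucketLimiter` and its parameters are hypothetical names. A limiter shared by many API servers would keep bucket state in a shared store such as Redis rather than in a dict.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: refills `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        # client id -> (tokens available, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client: str, cost: float = 1.0) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(client, (self.capacity, now))
        # Refill in proportion to elapsed time, never beyond capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= cost:
            self.buckets[client] = (tokens - cost, now)
            return True
        self.buckets[client] = (tokens, now)
        return False  # an API would typically answer HTTP 429 Too Many Requests


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = TokenBucketLimiter(rate=2.0, capacity=3.0, clock=lambda: fake_now[0])
    print([limiter.allow("user-1") for _ in range(4)])  # [True, True, True, False]
    fake_now[0] = 1.0  # one second later: two tokens refilled
    print([limiter.allow("user-1") for _ in range(3)])  # [True, True, False]
```

The capacity sets how large a burst a client may send, and the rate sets the sustained throughput. Tuning the two separately is what makes the token bucket a good fit for API traffic.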

API, Operations, Token Bucket
0 likes · 11 min read
Baidu Waimai Technology Team
Apr 18, 2017 · Industry Insights

Baidu Waimai’s Cloud Migration, AI Logistics, and Architecture – QCon 2017

At QCon Beijing 2017, three senior Baidu Waimai engineers detailed the company’s year‑long migration from IDC to cloud using custom operation platforms, described the AI‑driven, data‑rich logistics scheduling system that outperforms manual dispatch, and shared architectural evolutions that enabled rapid, zero‑downtime scaling of the fast‑growing delivery business.

AI logistics, Big Data, Operations
0 likes · 5 min read
Hulu Beijing
Apr 18, 2017 · Operations

How Hulu Scales Live Streaming: Challenges and Key Technologies

The article details Hulu's evolution from a simple web video service to a multi‑device platform, highlighting the scalability, micro‑service architecture, DASH streaming, and comprehensive quality monitoring that enable consistent live streaming experiences across diverse US devices.

DASH, Hulu, Microservices
0 likes · 6 min read
Efficient Ops
Apr 18, 2017 · Operations

Boost Mobile Game Performance: Ops, Download & Real‑Time Network Hacks

This article outlines a comprehensive solution for mobile game operations. It covers the value of modern ops; user‑experience metrics across download, login, gameplay, payment, and sentiment; download‑service optimizations such as domain and resource hijack protection and incremental updates; and real‑time battle network enhancements including access‑network, backbone, and QoS techniques.

Download Optimization, Mobile Gaming, Operations
0 likes · 23 min read
Continuous Delivery 2.0
Apr 16, 2017 · Operations

Baidu's Traditional Application Operations and Branch Management Process

The article explains Baidu's traditional project branch management approach and the reasons behind its mainline release queues. It then summarizes the team's continuous delivery transformation, highlighting clear goals, transparent planning, self‑defined processes, story‑driven development, six‑step CI, and automated testing practices.

Baidu, Branch Management, Continuous Delivery
0 likes · 6 min read
ITPUB
Apr 15, 2017 · Operations

How to Configure Nginx Load Balancing with Multiple Tomcat Instances on Windows

This step‑by‑step guide shows how to prepare two Tomcat servers, create a simple web project, configure Nginx as a reverse‑proxy load balancer with various strategies, start the services on Windows, and verify that requests are distributed across the Tomcat instances.

Backend, Nginx, Operations
0 likes · 6 min read
21CTO
Apr 13, 2017 · Operations

Mastering Internet Performance Engineering and Capacity Planning

This article presents a comprehensive methodology for internet performance engineering, covering non‑functional quality goals, detailed metrics for application servers, databases, caches and message queues, a practical technical review outline, and a real‑world capacity‑planning case study with both maximal and minimal resource solutions.

Backend Architecture, Non-functional Requirements, Operations
0 likes · 24 min read
Architecture Digest
Apr 13, 2017 · Operations

Methodology for Internet Architecture Technical Review and Capacity/Performance Evaluation

This article presents a comprehensive methodology for reviewing internet‑scale system architectures, focusing on non‑functional quality attributes such as performance, availability, scalability, security, and maintainability, and provides detailed guidelines, metrics tables, and a classic case study for capacity and performance planning.
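
One common back‑of‑the‑envelope step in capacity planning is sizing a server pool from peak traffic. The Python sketch below applies a widely used rule of thumb, not necessarily the article's method:

1. Estimate peak QPS from daily volume and a peak‑to‑average factor.
2. Divide by what one server sustains at a target utilization and round up.
3. Add spare nodes.

All inputs and default values are hypothetical.

```python
import math


def peak_qps_from_daily(daily_requests: float, peak_factor: float = 3.0) -> float:
    """Estimate peak QPS: average QPS over a day times a peak-to-average factor."""
    return daily_requests / 86_400 * peak_factor


def servers_needed(peak_qps: float, per_server_qps: float,
                   target_utilization: float = 0.7, redundancy: int = 1) -> int:
    """Size a stateless pool with utilization headroom plus `redundancy` spare nodes."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_qps = per_server_qps * target_utilization
    return math.ceil(peak_qps / usable_qps) + redundancy


if __name__ == "__main__":
    # Hypothetical: 86.4 million requests/day, one server handles 500 QPS.
    peak = peak_qps_from_daily(daily_requests=86_400_000, peak_factor=3.0)
    print(peak, servers_needed(peak, per_server_qps=500))  # 3000.0 10
```

Real per‑server capacity should come from load tests rather than estimates, and stateful tiers such as databases and caches need their own sizing logic.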

Backend, Non-functional Requirements, Operations
0 likes · 27 min read
Efficient Ops
Apr 12, 2017 · Operations

Mastering Enterprise Monitoring: From Basics to Advanced Toolchains

This comprehensive guide explains why monitoring is vital for operations, outlines clear objectives and methods, compares popular open‑source and commercial tools, details a Zabbix‑based workflow, and covers hardware, system, application, network, security, API, performance, and business metrics with practical alerting strategies.

Alerting, Operations, Zabbix
0 likes · 21 min read