Tagged articles
222 articles
MaGe Linux Operations
Feb 26, 2017 · Information Security

How We Traced and Stopped a UDP Flood Attack on an Oracle‑Tomcat Server

During Chinese New Year, a client’s Oracle‑Tomcat server was overwhelmed by massive UDP traffic. The forensic investigation uncovered a hidden Trojan and traced the root cause to a weak SSH password left in place after a hardware upgrade; the article walks through the command‑line analysis and the iptables hardening that followed.

Linux forensics · SSH Security · incident response
5 min read
Efficient Ops
Feb 20, 2017 · Information Security

Inside YY's Security Ops: Real-World Incident Stories and Architecture

This article shares YY's security operations journey, detailing real incident response scenarios, the evolution of their security infrastructure from 2012 onward, and the key factors considered when building a robust security ops system, including DDoS protection, WAF, vulnerability scanning, intrusion detection, and data‑driven automation.

DDoS protection · Security Operations · big data analytics
24 min read
Efficient Ops
Feb 16, 2017 · Operations

Why a Missed DNS Renewal Shut Down Our Site—and How We Fixed It

A detailed post‑mortem recounts how a forgotten domain renewal caused a DNS outage, the frantic troubleshooting steps across teams, temporary work‑arounds like switching to Google DNS, and the lessons learned for future incident management.

DNS · domain management · incident response
13 min read
21CTO
Feb 2, 2017 · Operations

What GitLab’s 300 GB Data Loss Teaches About Backup and Ops Discipline

The GitLab production database was mistakenly deleted during a manual fix, exposing gaps in backup strategies, PostgreSQL configuration, and operational practices, and prompting a detailed post‑mortem that highlights the need for automated recovery, proper tooling, and transparent incident handling.

Data loss · Database Backup · Operations
15 min read
dbaplus Community
Jan 25, 2017 · Information Security

Effective Server Security Incident Response: Step‑by‑Step Guide

When a production server is compromised, abrupt actions like pulling the plug can disrupt services, so this guide outlines an eight‑stage, evidence‑driven response process—including verification, on‑site preservation, containment, impact assessment, online analysis, backup, deep forensics, and reporting—plus real‑world case studies and concrete command examples.

Case Study · Forensics · Linux
14 min read
ITPUB
Jan 17, 2017 · Information Security

How to Diagnose and Eradicate a Linux Trojan That Spikes Outbound Traffic

This article recounts a real‑world incident on an Ubuntu 12.04 server where massive outbound traffic was traced to a hidden trojan, detailing step‑by‑step investigation, identification of malicious processes, removal techniques, and preventive hardening measures.

Network Monitoring · Rootkit · incident response
9 min read
Nightwalker Tech
Nov 9, 2016 · Operations

Best Practices for Service Monitoring and Alerting in E‑commerce Systems

The discussion outlines essential service‑monitoring techniques for e‑commerce environments: health checks, JVM metrics, period‑over‑period (ring‑ratio) analysis of traffic and payments, client‑side exception tracking, third‑party CDN monitoring, alert thresholds, and instrumentation via AOP or SDKs. It also covers tooling such as Datadog, Zabbix, and the Elastic stack for reliably detecting and responding to incidents.

Alerting · e‑commerce · incident response
10 min read
360 Zhihui Cloud Developer
Oct 14, 2016 · Operations

Mastering Rapid Incident Response: An Ops Engineer’s 4‑Step Method

This guide teaches operations engineers a four‑step “Wang‑Wen‑Wen‑Qie” methodology—observing system health, listening via alerts, questioning changes, and pulse‑checking with profiling tools—to rapidly diagnose and resolve high‑impact incidents while maintaining clear communication and post‑mortem learning.

Operations · incident response · monitoring
5 min read
ITPUB
Sep 21, 2016 · Operations

How Tencent Cloud Recovered Lost Data in a 2‑Day Storage Crisis

In a two‑day incident, Tencent Cloud's CBS team diagnosed cell failures, implemented directed reads and a dual‑cell merge strategy, and restored three‑replica data integrity, while uncovering monitoring gaps and tool limitations that inform future storage operations.

Data Recovery · cloud storage · distributed storage
9 min read
Efficient Ops
Sep 13, 2016 · Operations

How Google SRE Principles Compare Across Industries

This article, excerpted from the upcoming Chinese edition of “SRE: Google Site Reliability Engineering”, examines how Google’s SRE guiding philosophies—disaster planning, post‑mortem culture, automation, and data‑driven decision‑making—are adopted, adapted, or contrasted in sectors such as manufacturing, aerospace, nuclear, telecommunications, healthcare, and finance, highlighting key similarities, differences, and lessons for Google and the broader tech industry.

Automation · Operations · SRE
21 min read
Efficient Ops
May 30, 2016 · Information Security

Why Weak Passwords and Unpatched Redis Threaten Operational Security

The article explains how weak passwords, misconfigured services like Redis, careless port changes, and leaked data enable attackers to compromise servers and internal networks, illustrating each risk with real‑world case studies and offering practical mitigation advice for robust ops security.

Redis vulnerability · data breach · incident response
11 min read
dbaplus Community
May 27, 2016 · Databases

How to Keep DBA Operations Error‑Free: 5 Essential Practices

This article shares practical DBA advice—pre‑operation preparation, thorough fault analysis, effective communication, mandatory backups, and post‑incident reviews—to help database administrators maintain stability and avoid costly mistakes during online operations.

DBA · best practices · incident response
7 min read
MaGe Linux Operations
Apr 29, 2016 · Information Security

How to Analyze and Recover from a Linux Rootkit Intrusion

This article walks through a real-world Linux server compromise, detailing the attack symptoms, forensic analysis steps, rootkit discovery, exploitation of an Awstats script vulnerability, and practical remediation measures to restore and harden the affected system.

Awstats · Forensics · Linux
14 min read
Efficient Ops
Dec 16, 2015 · Operations

Mastering Ops Team Communication and Process Standards for Effective Management

This article outlines practical communication techniques, environment choices, active listening, and emotional control for operations teams, then details how to define standards, build robust processes, and ensure reliable business continuity through clear responsibilities, monitoring, automation, and continuous improvement.

Process Standards · incident response · operations best practices
11 min read
ITPUB
Nov 16, 2015 · Information Security

5 Hidden Signs Your Web Application Is Compromised and How to Respond

The article outlines five subtle indicators of a web application breach—abnormal behavior, irregular logs, unexpected processes or users, file modifications, and warning messages—while offering practical monitoring and remediation steps to help security teams detect and mitigate attacks early.

Web Security · application monitoring · incident response
7 min read
Efficient Ops
Jul 6, 2015 · Operations

How to Tame “Thorny” Employees Without Undermining Your Ops Team

This article explores the characteristics, causes, and practical strategies for managing difficult or “thorny” team members in operations, offering case studies and step‑by‑step recommendations to mitigate risks while maintaining team performance.

Leadership · employee performance · incident response
9 min read
MaGe Linux Operations
Apr 24, 2015 · Operations

10 Proven Fault Management Practices Every Ops Team Should Master

This guide shares ten practical fault‑management techniques—ranging from proactive attitude and prioritizing incidents to continuous follow‑up and team collaboration—to help operations teams reduce damage, maintain service reliability, and keep users engaged during outages.

Operations · best practices · fault management
8 min read
MaGe Linux Operations
Aug 19, 2014 · Operations

Essential Linux Security Checklist: 11 Steps to Detect Compromise

This guide provides a comprehensive 11‑step Linux security inspection checklist, covering account verification, log analysis, process and file checks, package integrity, network monitoring, scheduled tasks, backdoor detection, kernel modules, services, and rootkit scanning to help identify system compromises.

Operations · incident response · linux-commands
5 min read