Tagged articles

Ops

131 articles · Page 2 of 2
Programmer DD
Feb 15, 2020 · Operations

Understanding Prometheus: Architecture, Data Model, and Alerting Explained

This article provides a comprehensive overview of Prometheus, covering its open‑source monitoring architecture, multi‑dimensional data model, query language, storage mechanisms, service discovery, alerting workflow with Alertmanager, and visualization using Grafana, all illustrated with key diagrams and configuration examples.

Alerting · Metrics · Ops
0 likes · 9 min read
dbaplus Community
Sep 4, 2019 · Operations

Running Kafka on Kubernetes: Practical Tips, Pitfalls, and Best Practices

This guide explains how to run Kafka on Kubernetes, covering runtime resource needs, storage considerations, network requirements, configuration with Pods, StatefulSets, Helm charts and Operators, performance testing, monitoring, logging, health checks, rolling updates, scaling, and backup strategies.

Ops · helm · kafka
0 likes · 12 min read
MaGe Linux Operations
Dec 30, 2018 · Operations

Step‑by‑Step Guide to Building an ELK Stack on CentOS 6.7

This tutorial walks you through setting up Java, Elasticsearch 2.1.0, Logstash 2.1.1, Kibana 4.3.1, and NGINX on a CentOS 6.7 server, configuring each component, linking them together, and troubleshooting common time‑zone issues so you can visualize logs with Kibana.

CentOS · ELK · Elasticsearch
0 likes · 8 min read
dbaplus Community
Dec 11, 2018 · Databases

How We Fixed MongoDB Outages and Boosted Performance in Production

This article outlines MongoDB's key features, describes a real‑world outage caused by misconfigured connection limits, details the root‑cause analysis and temporary remediation, and presents a comprehensive set of configuration, sharding, and hardware optimizations that dramatically improved the system's reliability and throughput.

Configuration · MongoDB · Ops
0 likes · 14 min read
dbaplus Community
Dec 10, 2018 · Databases

How to Run Percona MongoDB HotBackup with a Simple PHP Script

This guide explains why the community edition of MongoDB lacks native hot backup, how Percona MongoDB adds online backup support, the underlying backup and restore principles, and provides a step‑by‑step PHP script with environment setup, configuration, execution, and scheduling instructions.

HotBackup · MongoDB · Ops
0 likes · 7 min read
37 Interactive Technology Team
May 25, 2018 · Operations

Optimization and Redesign of Open-Falcon Monitoring System for the 37 Monitoring Platform

The project redesigns the Open‑Falcon monitoring system for the 37 platform by integrating it with the existing CMDB, adding distributed‑lock high‑availability for judge and alarm modules, optimizing cross‑region agent data transmission, fixing timezone inconsistencies, and enabling redundant query/graph services, thereby unifying disparate monitoring tools into a scalable, reliable solution.

CMDB · Open-Falcon · Ops
0 likes · 11 min read
dbaplus Community
Jan 25, 2018 · Cloud Native

How to Build a Lightweight Private Cloud with Docker and Ansible

This article explains the challenges of lightweight private‑cloud deployment, classifies distributed‑system types, and presents a practical solution that combines a standard OS layer, Docker containerization, and Ansible automation, illustrated with a real‑world RabbitMQ cluster example and supporting GitHub resources.

Ansible · Docker · Ops
0 likes · 18 min read
Efficient Ops
Nov 28, 2017 · Fundamentals

Master Python Fast: A Practical Guide for Ops Engineers to Automate Tasks

This comprehensive tutorial walks operations engineers through Python fundamentals, from installation and basic syntax to data structures, functions, modules, and debugging, illustrating each concept with clear examples and diagrams to enable rapid automation development in real‑world DevOps environments.

Automation · Ops
0 likes · 31 min read
MaGe Linux Operations
Oct 24, 2017 · Operations

Top 20 Python Libraries Every Sysadmin Should Know

This article lists and briefly describes twenty essential Python libraries, from psutil and Ansible to SaltStack and Scapy, that help system administrators monitor resources, automate tasks, manage configurations, and build robust DevOps workflows.

Ops · devops · libraries
0 likes · 5 min read
Efficient Ops
Sep 23, 2017 · Operations

Why Ops Teams Feel Stuck: 6 Common Pitfalls and How to Fix Them

The article explores why operations professionals often feel exhausted, unrecognized, and demoralized. It identifies six systemic shortcomings (the lack of a holistic ops framework, unclear positioning, a closed mindset, insufficient authority, stagnant improvement, and missing cultural integration) and offers practical guidance for turning these weaknesses into strengths.

Ops · team culture
0 likes · 8 min read
Efficient Ops
Sep 10, 2017 · Operations

How We Built a Scalable, High‑Availability Monitoring Platform with Service Trees

This article details the challenges of traditional monitoring systems and the design and implementation of a custom high‑availability monitoring platform built on a Golang‑based service tree, Raft‑backed storage, and InfluxDB for time‑series data, with a modular architecture that supports Windows agents, third‑party reporting, and future AI‑driven enhancements.

AIOps · InfluxDB · Monitoring
0 likes · 13 min read
DevOps
Jul 12, 2017 · Cloud Native

Container Monitoring: Challenges, Metrics Collection, and Best Practices

This article examines the unique challenges of monitoring containers, outlines three categories of metrics to collect, compares host‑centric and layered monitoring architectures, provides detailed methods for gathering CPU, memory, I/O and network data via cgroup files and Docker commands, and shares practical insights, tooling recommendations, and a Q&A session for effective container observability.

Docker · Monitoring · Ops
0 likes · 18 min read
MaGe Linux Operations
May 10, 2017 · Operations

Step‑by‑Step: Monitor Nginx and PHP‑FPM Status with Zabbix

This guide walks through configuring Zabbix to monitor Nginx and PHP‑FPM status, covering software installation paths, enabling the status modules, creating extraction scripts, defining Zabbix agent UserParameter entries, restarting services, testing data retrieval, and adding server‑side templates for items, triggers, and graphs.

Linux · Monitoring · NGINX
0 likes · 9 min read
MaGe Linux Operations
Jan 5, 2017 · Operations

Mastering Puppet: Automate Server Deployment and Configuration

This article explains how Puppet automates large‑scale server provisioning by describing its architecture, workflow, manifest examples, class inheritance, and module structure, helping operations teams reduce manual effort and avoid errors in configuration management.

Automation · Ops · Puppet
0 likes · 8 min read
MaGe Linux Operations
Nov 14, 2016 · Operations

Master Ansible: From Basics to Advanced Automation without Agents

This comprehensive guide introduces Ansible, explains its agentless architecture, core components, installation, SSH key setup, inventory configuration, essential commands, and common modules, providing a practical roadmap for automating system administration and deployment tasks.

Ansible · Automation · Ops
0 likes · 17 min read
Qunar Tech Salon
Nov 8, 2016 · Operations

Building a Scalable Elasticsearch-as-a-Service Platform on Mesos, Marathon, and Docker at Qunar

This article describes how Qunar's operations team designed and implemented a cloud‑native Elasticsearch‑as‑a‑Service platform using Mesos, Marathon, and Docker, covering requirements analysis, technology selection, resource quota management, cluster isolation, service discovery, data reliability, monitoring, automated deployment, and future improvements.

Docker · Elasticsearch · Marathon
0 likes · 17 min read
21CTO
Mar 16, 2016 · Backend Development

How Badoo Saved $1M by Migrating Hundreds of Servers to PHP 7

Badoo migrated its massive PHP codebase to PHP 7 across hundreds of servers, overcoming engine bugs, HHVM limitations, and extension incompatibilities, while revamping testing infrastructure and deployment processes, ultimately achieving up to 40% faster response times, eight‑fold memory reduction, and roughly one million dollars in cost savings.

Ops · backend · migration
0 likes · 22 min read
dbaplus Community
Jan 25, 2016 · Operations

Mastering Application Performance Diagnosis: Layered & Segment Approaches

This article outlines a comprehensive performance‑testing workflow, introduces layered and segment diagnostic methods, presents a detailed Apache/Tomcat/Linux/Oracle case study using LoadRunner and nmon, and discusses monitoring metrics, analysis results, and practical recommendations for optimizing system performance.

LoadRunner · Ops · application monitoring
0 likes · 14 min read
Efficient Ops
Dec 20, 2015 · Operations

What Makes a Truly Effective Ops Engineer and Architect?

This article outlines the essential skills, mindset, and learning ability required of a qualified operations engineer and details the four key competencies (communication, emergency response, continuous reflection, and strong learning ability) that define an outstanding ops architect.

ITIL · Ops · engineering
0 likes · 9 min read
MaGe Linux Operations
Jun 16, 2015 · Operations

Inside Dianping’s Ops: Building Scalable Monitoring, Automation, and Self‑Service Platforms

This article details how Dianping’s operations team, with fewer than 40 people, structures its groups, designs a dual‑datacenter architecture, and builds comprehensive monitoring, automation, configuration, and analysis systems (including Zabbix, Cat, workflow, Button, and a custom radar platform) to achieve high availability, self‑service, and continuous improvement.

Automation · Monitoring · Ops
0 likes · 18 min read