
How to Build a Practical Monitoring System for Small and Medium Enterprises

An in-depth guide to building a comprehensive monitoring system for small and medium-sized enterprises. It covers hardware, system, application, network, security, traffic-analysis, business-metric, and log monitoring, plus automation and visualization, with practical integration of tools such as Zabbix, IPMI, ELK, and Smokeping.

Efficient Ops

This article was compiled from a talk shared in the "Efficient Operations" WeChat group. It explains how to build a reasonably complete monitoring system for a small or medium-sized enterprise.

It Starts with an Interview

During interviews, candidates are often asked how their previous companies handled monitoring. The author uses a fictional newcomer, Xiao Wang, to illustrate the process.

1. Define Goals and Align Mindset

The ultimate goal of monitoring is to keep the business running continuously and stably. Before adopting any tool, you must know what you are monitoring, which metrics matter for each object, and what alarm thresholds to set.

2. The Story Begins

Xiao Wang, a fresh graduate, is tasked with setting up monitoring for an e‑commerce startup.

2.1 Hardware Monitoring

Basic hardware monitoring includes regular physical inspections of the racks. It also uses IPMI to collect sensor data such as temperatures and disk health. Xiao Wang wrote a simple script that queries IPMI and sends an email whenever a temperature exceeds 50°C.
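The temperature check described above could be sketched in Python as follows. This is a minimal illustration, not Xiao Wang's actual script. It assumes `ipmitool` is installed and that `ipmitool sdr type Temperature` prints pipe-separated rows ending in a reading such as `23 degrees C`; the exact layout varies by BMC vendor. The SMTP host and email addresses are placeholders.

```python
import smtplib
import subprocess
from email.message import EmailMessage

THRESHOLD_C = 50.0  # alert threshold taken from the article


def parse_temperatures(sdr_output: str) -> dict:
    """Map sensor name -> °C from `ipmitool sdr type Temperature` output (assumed layout)."""
    readings = {}
    for line in sdr_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Expected: name | sensor id | status | entity | "NN degrees C"
        if len(fields) < 5 or "degrees C" not in fields[-1]:
            continue  # skip "No Reading", disabled sensors, blank lines
        readings[fields[0]] = float(fields[-1].split()[0])
    return readings


def over_threshold(readings: dict, limit: float = THRESHOLD_C) -> dict:
    """Return only the sensors whose temperature is above the limit."""
    return {name: temp for name, temp in readings.items() if temp > limit}


def send_alert(hot: dict, smtp_host: str = "localhost",
               sender: str = "ipmi@example.com", to: str = "ops@example.com") -> None:
    """Email the list of hot sensors (placeholder host and addresses)."""
    msg = EmailMessage()
    msg["Subject"] = f"[IPMI] {len(hot)} sensor(s) above {THRESHOLD_C}°C"
    msg["From"] = sender
    msg["To"] = to
    msg.set_content("\n".join(f"{n}: {t:.1f}°C" for n, t in hot.items()))
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


def main() -> None:
    """Query the local BMC once and email if any sensor is too hot."""
    out = subprocess.run(["ipmitool", "sdr", "type", "Temperature"],
                         capture_output=True, text=True, check=True).stdout
    hot = over_threshold(parse_temperatures(out))
    if hot:
        send_alert(hot)
```

Calling `main()` from cron every few minutes reproduces the email-on-overheat behaviour. Keeping the parsing separate from the I/O means the threshold logic can be tested against captured output.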

2.2 System Monitoring

The key system metrics cover CPU, memory, and I/O.

CPU: monitor utilization, context switches, and run-queue length. Common rules of thumb are no more than about 3 runnable threads per core and busy time split roughly 70/30 between user and system. Common tools: top, vmstat, mpstat.

Memory: monitor usage and swap activity, and watch for memory leaks.

I/O: monitor disk usage, iowait, and network traffic with iostat, iotop, and iftop.
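The sketch below shows roughly where these numbers come from on Linux. It computes the CPU-time split from two samples of the aggregate `cpu` line in `/proc/stat`. It also derives runnable tasks per core from `/proc/loadavg`. The 3-per-core and 70/30 figures are the rules of thumb quoted above, not hard limits, and the helper names are hypothetical.

```python
import os
import time

# Field order of the aggregate "cpu" line in /proc/stat (Linux); later fields are ignored.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # first line is the aggregate "cpu ..." line


def cpu_percentages(before: str, after: str) -> dict:
    """Percent of CPU time spent in each state between two samples."""
    b = [int(x) for x in before.split()[1:len(FIELDS) + 1]]
    a = [int(x) for x in after.split()[1:len(FIELDS) + 1]]
    delta = [y - x for x, y in zip(b, a)]
    total = sum(delta) or 1  # avoid division by zero on identical samples
    return {name: 100.0 * d / total for name, d in zip(FIELDS, delta)}


def runnable_per_core(loadavg_line: str, cores: int) -> float:
    """The 4th field of /proc/loadavg is 'runnable/total' scheduling entities."""
    runnable = int(loadavg_line.split()[3].split("/")[0])
    return runnable / cores


def main() -> None:
    first = read_cpu_line()
    time.sleep(1)
    pct = cpu_percentages(first, read_cpu_line())
    busy = pct["user"] + pct["system"]
    user_share = 100.0 * pct["user"] / busy if busy else 0.0
    with open("/proc/loadavg") as f:
        rq = runnable_per_core(f.read(), os.cpu_count() or 1)
    print(f"user/system split of busy time: {user_share:.0f}/{100 - user_share:.0f}")
    print(f"iowait: {pct['iowait']:.1f}%")
    print(f"runnable per core: {rq:.2f} ({'OK' if rq <= 3 else 'HIGH'})")
```

The runnable count in `/proc/loadavg` is an instantaneous snapshot, similar to vmstat's `r` column. Sample it repeatedly rather than acting on a single reading.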

2.3 Application Service Monitoring

Monitor services such as Apache (mod_status), Nginx (stub_status), Memcached (the stats command), Redis (INFO), and the JVM (JMX). Small scripts built on curl, grep, awk, or netcat pull the status data. API endpoints are also checked with curl.
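A status script for Nginx might look like the sketch below. It assumes `stub_status` is exposed at a hypothetical `http://127.0.0.1/nginx_status` location. It relies on the module's fixed text layout: active connections, then accepts/handled/requests, then reading/writing/waiting.

```python
import re
import sys
import urllib.request

# Order in which the numbers appear on the stub_status page.
KEYS = ("active", "accepts", "handled", "requests", "reading", "writing", "waiting")


def parse_stub_status(text: str) -> dict:
    """Parse the plain-text page produced by ngx_http_stub_status_module."""
    nums = [int(n) for n in re.findall(r"\d+", text)]
    if len(nums) != len(KEYS):
        raise ValueError("unexpected stub_status format")
    return dict(zip(KEYS, nums))


def fetch_stub_status(url: str = "http://127.0.0.1/nginx_status",
                      timeout: float = 3.0) -> dict:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_stub_status(resp.read().decode())


def main(argv=None) -> None:
    """Print one metric, e.g. `nginx_status.py active`, for a monitoring agent to collect."""
    argv = sys.argv[1:] if argv is None else argv
    print(fetch_stub_status()[argv[0] if argv else "active"])
```

Printing one value per call makes such a script easy to plug into an agent. For example, a Zabbix UserParameter could pass the metric name as an argument.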

2.4 Introducing Zabbix

To avoid a proliferation of ad‑hoc scripts, Xiao Wang adopts Zabbix, consolidating all monitoring domains:

Hardware: Zabbix IPMI interface

System: Zabbix agent

Java: Zabbix JMX

Network devices: Zabbix SNMP

Application services: Zabbix UserParameter

MySQL: Percona Monitoring Plugins

URL: Zabbix web monitoring

Zabbix also provides auto‑discovery, proxy‑based distributed monitoring, and flexible alarm routing (email, WeChat, SMS, DingTalk).
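Alert routing to chat tools is usually handled by a custom alert script. The sketch below forwards an alert to a DingTalk custom-robot webhook. It rests on two assumptions. First, the Zabbix media type passes `{ALERT.SENDTO}` (used here as the webhook URL), `{ALERT.SUBJECT}` and `{ALERT.MESSAGE}` as the first three script arguments. Second, the robot accepts a `{"msgtype": "text", ...}` JSON body. Check both against your Zabbix and DingTalk versions.

```python
import json
import sys
import urllib.request


def build_dingtalk_payload(subject: str, message: str) -> bytes:
    """Text-message body for a DingTalk custom robot (assumed format)."""
    body = {"msgtype": "text", "text": {"content": f"{subject}\n{message}"}}
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def post(webhook_url: str, payload: bytes, timeout: float = 5.0) -> str:
    """POST the JSON payload and return the raw response body."""
    req = urllib.request.Request(webhook_url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()


def main(argv=None) -> None:
    # Assumed media-type parameters: {ALERT.SENDTO} {ALERT.SUBJECT} {ALERT.MESSAGE}
    argv = sys.argv[1:] if argv is None else argv
    webhook, subject, message = argv[:3]
    print(post(webhook, build_dingtalk_payload(subject, message)))
```

The same pattern applies to WeChat or SMS gateways. Only the payload builder and URL change, so Zabbix can route each severity or user group to a different media type.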

2.5 Traffic Analysis

Beyond basic log analysis, Xiao Wang evaluates traffic with Google Analytics, Baidu Tongji, and the open-source Piwik (now Matomo). These tools provide detailed visitor and conversion data.
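A self-hosted Matomo exposes its data through an HTTP Reporting API, so visitor counts can be pulled in alongside other metrics. The following is a minimal sketch assuming a Matomo instance at a placeholder URL and a valid `token_auth`. The token goes in the POST body rather than the query string, because recent Matomo releases may refuse tokens passed in the URL.

```python
import json
import urllib.parse
import urllib.request


def matomo_request(base_url: str, site_id: int, token: str,
                   method: str = "VisitsSummary.get",
                   period: str = "day", date: str = "yesterday") -> urllib.request.Request:
    """Build a POST request against Matomo's Reporting API endpoint (index.php)."""
    params = {
        "module": "API",
        "method": method,
        "idSite": site_id,
        "period": period,
        "date": date,
        "format": "JSON",
        "token_auth": token,  # kept out of the URL so it does not end up in access logs
    }
    data = urllib.parse.urlencode(params).encode()
    return urllib.request.Request(base_url.rstrip("/") + "/index.php", data=data)


def fetch(req: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For `VisitsSummary.get` with a single day, the reply is a JSON object with fields such as `nb_visits`. Other report methods can be requested the same way by changing `method`.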

2.6 Network Monitoring

Because the e-commerce service has users nationwide, network quality is tracked with Smokeping, which is written in Perl and stores data with RRDtool. Commercial monitoring services are used to check CDN status.
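Smokeping's core idea is to send a burst of pings to each target on a schedule and record packet loss and latency, plotting the median round-trip time. The Python sketch below computes those two numbers from one run of the system `ping` command. It only illustrates the measurement and does not replace Smokeping's RRD storage and graphs. It assumes Linux iputils-style output containing `time=12.3 ms`.

```python
import re
import statistics
import subprocess

RTT_RE = re.compile(r"time[=<]([\d.]+) ms")  # matches per-reply lines only


def probe(host: str, count: int = 20) -> str:
    """Run one burst of pings and return the raw output (non-zero exit is tolerated)."""
    return subprocess.run(["ping", "-c", str(count), host],
                          capture_output=True, text=True).stdout


def summarize(ping_output: str, sent: int) -> dict:
    """Packet loss in percent and median RTT in ms for one burst."""
    rtts = [float(v) for v in RTT_RE.findall(ping_output)]
    loss = 100.0 * (sent - len(rtts)) / sent
    return {"loss_pct": loss, "median_ms": statistics.median(rtts) if rtts else None}
```

For example, `summarize(probe("example.com", 20), 20)` gives one data point per target. Probing from several regions is what reveals nationwide network problems.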

2.7 Security Monitoring

Layer-7 protection is added with an Nginx + Lua WAF whose logs are shipped to Elasticsearch and visualized in Kibana. A Python crawler also periodically searches GitHub for sensitive keywords to catch leaked code or credentials.
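The GitHub scan could be built on GitHub's REST code-search endpoint. The sketch below assumes one way such a crawler might work; it is not the author's tool. It needs a personal access token, because code search requires authentication and is rate-limited. The keywords are placeholders for things like internal domain names or configuration strings.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com/search/code"


def search_request(keyword: str, token: str) -> urllib.request.Request:
    """Build an authenticated code-search request for one keyword."""
    url = API + "?" + urllib.parse.urlencode({"q": keyword, "per_page": 50})
    return urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    })


def extract_hits(response: dict) -> list:
    """(repository, path, url) for each search result."""
    return [(item["repository"]["full_name"], item["path"], item["html_url"])
            for item in response.get("items", [])]


def new_hits(keywords, token: str, seen: set) -> list:
    """Return only results not reported before; `seen` is updated in place."""
    fresh = []
    for kw in keywords:
        with urllib.request.urlopen(search_request(kw, token), timeout=15) as resp:
            for hit in extract_hits(json.load(resp)):
                if hit[2] not in seen:
                    seen.add(hit[2])
                    fresh.append((kw,) + hit)
    return fresh
```

Persisting `seen` between runs, for example in a small file or database, keeps the periodic job from re-alerting on the same leak.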

2.8 Business Monitoring

Business‑level KPIs such as orders per minute, registrations, DAU, and SMS usage are added to Zabbix, with appropriate thresholds and alerts.
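Business KPIs can be sent to Zabbix by pushing them into items of type "Zabbix trapper". This is what the `zabbix_sender` utility does, e.g. `zabbix_sender -z <server> -s <host> -k <key> -o <value>`. The sketch below builds the same request in Python. It assumes the classic sender protocol: a `ZBXD\x01` header, an 8-byte little-endian length, then JSON. The host name and item key in the example are hypothetical.

```python
import json
import socket
import struct


def sender_packet(host: str, key: str, value) -> bytes:
    """Frame one value for a Zabbix trapper item: header + length + JSON body."""
    body = json.dumps({"request": "sender data",
                       "data": [{"host": host, "key": key, "value": str(value)}]}).encode()
    return b"ZBXD\x01" + struct.pack("<Q", len(body)) + body


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def send(server: str, packet: bytes, port: int = 10051, timeout: float = 5.0) -> dict:
    """Send one packet to the Zabbix server (trapper port) and decode its JSON reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(packet)
        header = _recv_exact(sock, 13)  # "ZBXD" + flags byte + 8-byte length
        (length,) = struct.unpack("<Q", header[5:])
        return json.loads(_recv_exact(sock, length))
```

A cron job could, for example, count the last minute's orders in the order database and call `send(server, sender_packet("shop-web01", "biz.orders_per_min", n))`. The trigger thresholds are then defined in Zabbix on the trapper item.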

2.9 Log Monitoring

System, application, and service logs are centralized using the ELK stack (Logstash → Elasticsearch → Kibana). Errors trigger Zabbix alerts for rapid response.
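The link from ELK to Zabbix can be as simple as counting recent error entries and exposing the count as an item. The sketch below assumes Elasticsearch is reachable at a placeholder URL and that logs live in Logstash's default `logstash-*` indices. It also assumes each event has a `level` field and an `@timestamp`. Adjust the field names to your Logstash pipeline.

```python
import json
import urllib.request


def error_count_query(minutes: int = 5, level_field: str = "level",
                      level: str = "ERROR") -> dict:
    """Count query: events at the given level within the last `minutes` minutes."""
    return {"query": {"bool": {"filter": [
        {"match": {level_field: level}},
        {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
    ]}}}


def count_errors(es_url: str = "http://127.0.0.1:9200", index: str = "logstash-*",
                 minutes: int = 5) -> int:
    """POST the query to the _count API and return the number of matching events."""
    req = urllib.request.Request(
        f"{es_url.rstrip('/')}/{index}/_count",
        data=json.dumps(error_count_query(minutes)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["count"]
```

Exposed through a UserParameter, this gives Zabbix a number to trigger on, such as an alert when the five-minute error count is above zero.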

2.10 Automation

Automation is achieved via Zabbix auto‑discovery (active) and Zabbix API calls (passive) tied to a CMDB, enabling automatic template assignment when new services appear.
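The "passive" path, where the CMDB drives Zabbix, can be sketched with the `host.create` method of the Zabbix JSON-RPC API. The role-to-template mapping, template IDs, and CMDB record shape below are hypothetical. The request uses the body-level `auth` field of older Zabbix releases. Zabbix 6.4 and later prefer an `Authorization: Bearer` header instead.

```python
import json
import urllib.request

# Hypothetical mapping from a CMDB "role" to Zabbix template IDs.
ROLE_TEMPLATES = {"nginx": ["10001"], "mysql": ["10002"], "redis": ["10003"]}


def host_create_payload(record: dict, group_id: str, auth: str, req_id: int = 1) -> dict:
    """Build a host.create call that links templates based on the host's CMDB roles."""
    templates = [{"templateid": t}
                 for role in record["roles"] for t in ROLE_TEMPLATES.get(role, [])]
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": record["hostname"],
            # type 1 = Zabbix agent interface on the default agent port
            "interfaces": [{"type": 1, "main": 1, "useip": 1,
                            "ip": record["ip"], "dns": "", "port": "10050"}],
            "groups": [{"groupid": group_id}],
            "templates": templates,
        },
        "auth": auth,
        "id": req_id,
    }


def call(api_url: str, payload: dict):
    """POST to api_jsonrpc.php and return `result`, raising on an API error."""
    req = urllib.request.Request(api_url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json-rpc"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]
```

When the CMDB registers a new server or service, a hook would call `call("http://zabbix.example.com/api_jsonrpc.php", host_create_payload(...))`. The matching templates are then linked without manual work.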

2.11 Visualization

Effective dashboards combine traditional monitoring data with business analytics to quickly pinpoint the root cause of anomalies such as sudden drops in order volume.
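The sketch below illustrates one way to pair business and system data. It flags a minute whose order count drops well below its recent average. It then lists which infrastructure metrics were over their thresholds in that same minute. The 30-minute window, the 50% ratio, and the metric names are arbitrary assumptions. A real dashboard would show these series side by side rather than reduce them to a list.

```python
from collections import deque
from statistics import mean


class DropDetector:
    """Flag a minute whose order count falls below `ratio` x the recent average."""

    def __init__(self, window: int = 30, ratio: float = 0.5):
        self.history = deque(maxlen=window)
        self.ratio = ratio

    def observe(self, orders: int) -> bool:
        # Only judge once a full window of history is available.
        full = len(self.history) == self.history.maxlen
        dropped = full and orders < self.ratio * mean(self.history)
        self.history.append(orders)
        return dropped


def suspects(metrics: dict, thresholds: dict) -> list:
    """Names of infrastructure metrics above their threshold in the same minute."""
    return sorted(k for k, v in metrics.items() if k in thresholds and v > thresholds[k])
```

When `observe()` returns True, the output of `suspects()` for that minute points to where to look first, for example a spike in Nginx 5xx responses or payment-API latency.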

The Interview Ends

The author acknowledges that monitoring is an ongoing effort, with many additional topics like front‑end performance, code monitoring, and even public opinion monitoring.

FAQ

Is automatic fault remediation possible? Approaches vary; the author cites Tencent BlueKing as an example.

Should operations understand business? Yes, business awareness helps ops deliver value and troubleshoot user‑facing issues.

Is a CMDB necessary? A CMDB provides essential asset information for automated, reliable operations.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on the transformation of operations and aim to accompany you throughout your operations career as we grow together.
