Mastering OpenStack Monitoring: Key Metrics and Best Practices
This article explains what OpenStack is, outlines its core modules, and details the most important monitoring metrics for Nova, Neutron, Keystone, hypervisors, tenants, and RabbitMQ, helping engineers build a robust, scalable OpenStack monitoring solution.
What is OpenStack
OpenStack is open-source IaaS (Infrastructure as a Service) software, originally developed jointly by NASA and Rackspace, that lets anyone build and offer cloud computing services, including private clouds that enterprises run behind their own firewalls.
OpenStack Module Composition
OpenStack consists of six core modules:
Nova – Compute service
Keystone – Identity (authentication) service
Glance – Image service
Neutron – Virtual networking service
Cinder – Block storage service
Horizon – UI component
Nova
Nova provides instance lifecycle management, compute resource management, network and authorization management, a RESTful API, and asynchronous communication. It supports various hypervisors, such as Xen, KVM, VMware vSphere, and Hyper‑V.
Key Nova metrics for monitoring include:
openstack.nova.current_workload – current workload (build, snapshot, migration, resize, etc.)
openstack.nova.running_vms – number of running VMs
openstack.nova.hypervisor_load.1 – hypervisor load over the past minute; related hypervisor metrics cover disk, RAM, and CPU
openstack.nova.limits.max_personality – project‑related limits
Nova communicates via AMQP using RabbitMQ, enabling asynchronous callbacks that keep API calls non‑blocking.
Neutron & Keystone
Neutron provides virtual network management, simplifying network configuration. Keystone offers authentication and access‑policy services for all OpenStack components, using a REST‑based Identity API.
Important Monitoring Metrics
Monitoring should focus on four categories:
Hypervisor metrics – VM count, hypervisor load, etc.
Nova Server metrics – disk I/O, RAM usage, etc.
Tenant metrics – resource usage per tenant, CPU cores, instance count
Message Queue metrics – RabbitMQ queue size and performance
Hypervisor Metrics
Key hypervisor metrics include:
hypervisor_load – system load over the past minute, similar to OS load average
current_workload – number of active tasks (build, snapshot, migrate, resize)
running_vms – total running VMs
vcpus_available – available CPU cores (useful for capacity planning)
free_disk_gb – free disk space, affecting VM creation
free_ram_mb – free RAM, a critical resource metric
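The hypervisor metrics above lend themselves to simple threshold checks. The following is a minimal sketch, assuming each sample arrives as a dictionary keyed by the metric names listed above; the function name check_hypervisor and all threshold values are hypothetical and should be tuned for a real environment.

```python
from typing import Dict, List

# Hypothetical thresholds, chosen only for illustration.
MAX_LOAD = 8.0
MIN_FREE_RAM_MB = 4096
MIN_FREE_DISK_GB = 50


def check_hypervisor(sample: Dict[str, float]) -> List[str]:
    """Return warnings for one hypervisor sample keyed by metric name."""
    warnings: List[str] = []
    if sample["hypervisor_load"] > MAX_LOAD:
        warnings.append("load above threshold")
    if sample["vcpus_available"] <= 0:
        warnings.append("no vCPUs available")
    if sample["free_ram_mb"] < MIN_FREE_RAM_MB:
        warnings.append("free RAM below threshold")
    if sample["free_disk_gb"] < MIN_FREE_DISK_GB:
        warnings.append("free disk below threshold")
    return warnings


# Made-up sample values for two hosts.
healthy = {"hypervisor_load": 1.2, "vcpus_available": 12,
           "free_ram_mb": 32768, "free_disk_gb": 400}
crowded = {"hypervisor_load": 9.5, "vcpus_available": 0,
           "free_ram_mb": 2048, "free_disk_gb": 400}

print(check_hypervisor(healthy))
print(check_hypervisor(crowded))
```

Returning a list of warnings rather than a single boolean keeps each resource visible separately, which matters when deciding whether a host is short on RAM, disk, or CPU.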
Nova Server Metrics
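Monitoring Nova Server metrics helps detect issues such as the “Noisy Neighbor” problem, where one VM's heavy resource use degrades the performance of other VMs on the same host. A metric such as hdd_read_req reflects a VM's disk read activity, and a sudden spike in it is a signal to investigate.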
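One way to flag spikes in a per-VM series such as hdd_read_req is to compare each sample with a rolling baseline. This is a minimal sketch, not a method prescribed by OpenStack: the function find_spikes, the window size, and the factor are all assumptions, and the sample values are made up.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def find_spikes(samples: Sequence[float], window: int = 5,
                factor: float = 3.0) -> List[int]:
    """Return indexes of samples far above the preceding window's mean."""
    spikes: List[int] = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        # Use a floor of 1.0 on the spread so a perfectly flat baseline
        # does not turn every tiny change into a spike.
        spread = max(pstdev(baseline), 1.0)
        if samples[i] > mean(baseline) + factor * spread:
            spikes.append(i)
    return spikes


# Made-up hdd_read_req samples for one VM; index 6 is the burst.
reads = [100, 102, 98, 101, 99, 100, 900, 101]
print(find_spikes(reads))
```

A flagged index only says that a VM's read rate jumped relative to its own recent history; confirming a noisy neighbor still requires checking the other VMs on the same hypervisor.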
Tenant Metrics
Tenant metrics reflect business‑related resource consumption. Monitoring total_cores_used, max_total_cores, total_instances_used, and max_total_instances helps allocate resources efficiently across different user groups.
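The used and max values pair up naturally into utilisation ratios. The sketch below assumes the four metrics are available in a dictionary under the names used above; the helper names usage_ratio and tenant_report are hypothetical. It treats a non-positive limit as "no ratio available", since Nova represents an unlimited quota as -1.

```python
from typing import Dict, Optional


def usage_ratio(used: int, limit: int) -> Optional[float]:
    """Return used/limit, or None when the limit is unlimited (-1) or zero."""
    if limit <= 0:
        return None
    return used / limit


def tenant_report(limits: Dict[str, int]) -> Dict[str, Optional[float]]:
    """Compute core and instance utilisation for one tenant."""
    return {
        "cores": usage_ratio(limits["total_cores_used"],
                             limits["max_total_cores"]),
        "instances": usage_ratio(limits["total_instances_used"],
                                 limits["max_total_instances"]),
    }


# Made-up values for a single tenant.
tenant = {
    "total_cores_used": 18, "max_total_cores": 20,
    "total_instances_used": 5, "max_total_instances": 10,
}
print(tenant_report(tenant))
```

A tenant whose core ratio approaches 1.0 while its instance ratio stays low is running a few large instances, which is useful context when deciding which quota to raise.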
RabbitMQ Metrics
RabbitMQ, the message queue used by OpenStack, provides several important metrics:
consumer_utilisation – ideal value is 100%; lower values indicate processing delays
memory – high memory usage can trigger disk paging and throttling
count – number of queues; a count of 0 should raise an alarm
consumers – number of active consumers; zero indicates a serious issue
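The four checks above can be combined into one pass over a list of queue records. This is a minimal sketch under stated assumptions: each queue is a dictionary with name, consumers, consumer_utilisation, and memory keys; consumer_utilisation is a fraction between 0 and 1 (so 1.0 corresponds to the ideal 100%) and may be missing; memory is in bytes. The function check_queues and both thresholds are hypothetical.

```python
from typing import Any, Dict, List


def check_queues(queues: List[Dict[str, Any]],
                 min_utilisation: float = 0.9,
                 max_memory_bytes: int = 500_000_000) -> List[str]:
    """Return alert strings for a list of queue records."""
    if not queues:
        # A queue count of 0 is itself an alarm condition.
        return ["no queues found"]
    alerts: List[str] = []
    for queue in queues:
        name = queue.get("name", "<unnamed>")
        if queue.get("consumers", 0) == 0:
            alerts.append(f"{name}: no consumers")
        utilisation = queue.get("consumer_utilisation")
        if utilisation is not None and utilisation < min_utilisation:
            alerts.append(f"{name}: consumer_utilisation {utilisation:.2f}")
        if queue.get("memory", 0) > max_memory_bytes:
            alerts.append(f"{name}: memory above threshold")
    return alerts


# Made-up queue records.
queues = [
    {"name": "compute", "consumers": 2,
     "consumer_utilisation": 1.0, "memory": 50_000},
    {"name": "scheduler", "consumers": 0,
     "consumer_utilisation": None, "memory": 50_000},
    {"name": "conductor", "consumers": 1,
     "consumer_utilisation": 0.4, "memory": 600_000_000},
]
for alert in check_queues(queues):
    print(alert)
```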