
How I Crashed OpenStack Five Times and Created a Lifesaving Deployment Guide

This comprehensive guide walks you through OpenStack deployment from a single‑node DevStack test to a production‑grade HA cluster with Kolla‑Ansible, covering hardware planning, component configuration, performance tuning, network setup, troubleshooting, monitoring, backup strategies, and useful operational scripts.

AI Agent Super App

OpenStack Overview

Core services:

Nova – compute service for VM lifecycle.

Neutron – networking service for virtual networks, IP allocation and security groups.

Cinder – block‑storage service.

Glance – image service.

Keystone – identity service.

Horizon – web UI.

Single‑Node DevStack Installation

Prerequisites: Ubuntu 22.04 LTS, ≥4 CPU, 8 GB RAM, 50 GB disk, nested virtualization enabled if running inside a VM.
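
The prerequisites above can be checked before cloning anything. The following is a minimal sketch, assuming a Linux host with GNU coreutils; the function names `meets_minimum` and `preflight` are illustrative and not part of DevStack, and the RAM threshold sits a little below 8 GB because the kernel never reports the full installed amount.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check for the DevStack prerequisites (names are illustrative).

# Succeeds when the integer <actual> ($1) is at least <required> ($2).
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Prints one line per failed check and returns non-zero if a hard requirement is missed.
preflight() {
  local rc=0 cpus mem_mb disk_gb virt
  cpus=$(nproc)
  mem_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
  disk_gb=$(df -BG --output=avail / | tail -n 1 | tr -dc '0-9')
  virt=$(grep -Ec '(vmx|svm)' /proc/cpuinfo)

  meets_minimum "$cpus" 4      || { echo "FAIL: $cpus CPUs (need 4)"; rc=1; }
  meets_minimum "$mem_mb" 7500 || { echo "FAIL: ${mem_mb} MB RAM (need about 8 GB)"; rc=1; }
  meets_minimum "$disk_gb" 50  || { echo "FAIL: ${disk_gb} GB free on / (need 50)"; rc=1; }
  # Missing virtualization flags are only a warning: VMs still boot, but under slow emulation.
  meets_minimum "$virt" 1      || echo "WARN: no vmx/svm CPU flag found"
  [ "$rc" -eq 0 ] && echo "Preflight passed"
  return "$rc"
}

# Example: preflight || exit 1
```

Running `preflight` on the target machine before `./stack.sh` avoids finding out about a resource shortfall halfway through a 30 to 60 minute install.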

# Update system
sudo apt update && sudo apt upgrade -y
# Install dependencies
sudo apt install -y git python3-pip
# Create non‑root stack user
sudo useradd -s /bin/bash -d /opt/stack -m stack
# Make the home directory traversable (needed on recent Ubuntu releases)
sudo chmod +x /opt/stack
echo "stack ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/stack

# Switch to stack user
sudo su - stack
# Clone DevStack (the checked-out branch selects the OpenStack release)
git clone https://opendev.org/openstack/devstack -b stable/2024.1
cd devstack
# Create local.conf
cat > local.conf << 'EOF'
[[local|localrc]]
ADMIN_PASSWORD=supersecret
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
enable_service tempest
EOF

# Run installation (30–60 min)
./stack.sh

On completion, Horizon is reachable at http://<host_ip>/dashboard with user admin and the password set in ADMIN_PASSWORD.

Production HA Deployment with Kolla‑Ansible

Hardware Planning

Control nodes (3 ×): 8 CPU, 16 GB RAM each.

Compute nodes (N ×): 16 CPU, 32 GB RAM minimum.

Network nodes: ≥2 NICs.

Optional storage nodes for Ceph.

Install Kolla‑Ansible

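# Install Kolla‑Ansible and its Ansible Galaxy dependencies
pip3 install kolla-ansible
kolla-ansible install-deps
# Copy example configuration (path assumes a system-wide pip install;
# in a virtualenv, use <venv>/share/kolla-ansible instead)
cp -r /usr/local/share/kolla-ansible/etc_examples/kolla /etc/
cp /usr/local/share/kolla-ansible/ansible/inventory/multinode .
# Generate random service passwords in /etc/kolla/passwords.yml
kolla-genpwd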
# globals.yml (excerpt)
cat > /etc/kolla/globals.yml << 'EOF'
---
kolla_base_distro: "ubuntu"
openstack_release: "2024.1"
network_interface: "eth0"
neutron_external_interface: "eth1"
kolla_internal_vip_address: "192.168.1.100"
enable_haproxy: "yes"
enable_keepalived: "yes"
enable_mariadb: "yes"
enable_rabbitmq: "yes"
enable_cinder: "yes"
enable_neutron: "yes"
enable_heat: "no"
EOF
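
Because globals.yml drives the whole deployment, it can help to read a setting back before running any playbook. The helper below is a minimal sketch, assuming the flat `key: "value"` layout used in the excerpt above; it is not a general YAML parser, and `get_global` is a hypothetical name rather than a Kolla‑Ansible command.

```shell
# Print the value of a top-level key from a flat "key: value" YAML file.
# Assumption: one key per line and no nesting, as in the globals.yml excerpt.
get_global() {
  local key=$1 file=${2:-/etc/kolla/globals.yml}
  awk -v k="$key" -F': +' '
    $1 == k {
      v = $2
      sub(/[ \t]+#.*$/, "", v)   # drop a trailing comment
      gsub(/"/, "", v)           # drop surrounding quotes
      print v
    }' "$file" | tail -n 1       # if a key is repeated, the last one wins
}

# Example (run on the deployment host):
#   get_global kolla_internal_vip_address
#   get_global network_interface
```

Commented-out lines are skipped because their first field starts with `#` and so never equals the requested key.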

Host Inventory

# multinode inventory: top-level groups only; keep the child-group sections from the example file below these
[control]
ctrl1 ansible_host=192.168.1.10
ctrl2 ansible_host=192.168.1.11
ctrl3 ansible_host=192.168.1.12
[network]
ctrl1
ctrl2
ctrl3
[compute]
compute1 ansible_host=192.168.1.20
compute2 ansible_host=192.168.1.21
[storage]
compute1
compute2
[monitoring]
ctrl1
[deployment]
localhost ansible_connection=local
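
The inventory lists three control hosts because clustered services such as Galera and RabbitMQ need a majority of members to stay available. The sketch below counts the hosts in a group and shows how many failures that count tolerates; the function names are illustrative, and the parser assumes the simple INI layout shown above.

```shell
# Count the hosts listed directly under an INI-style inventory group.
count_group_hosts() {
  local group=$1 file=$2
  awk -v g="[$group]" '
    /^\[/ { in_group = ($1 == g); next }      # a new section header starts
    in_group && NF && $1 !~ /^#/ { n++ }      # count non-empty, non-comment lines
    END { print n + 0 }' "$file"
}

# Number of members that can fail while a majority, floor(n/2)+1, still survives.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

# Example:
#   n=$(count_group_hosts control multinode)
#   echo "control nodes: $n, tolerated failures: $(tolerated_failures "$n")"
```

Three and four controllers both tolerate a single failure, which is why odd cluster sizes are the usual choice.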

Deploy

# Check Ansible connectivity to every host
ansible -i multinode all -m ping
# Prepare hosts (installs Docker and other dependencies)
kolla-ansible -i multinode bootstrap-servers
# Run pre‑checks
kolla-ansible -i multinode prechecks
# Pull Docker images (may be slow)
kolla-ansible -i multinode pull
# Deploy (30–60 min)
kolla-ansible -i multinode deploy
# Post‑deployment steps (writes /etc/kolla/admin-openrc.sh)
kolla-ansible -i multinode post-deploy
source /etc/kolla/admin-openrc.sh
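
Since each stage depends on the previous one, the sequence above can be wrapped so that it stops at the first failure. This is a sketch rather than an official workflow: `deploy_all` and the `KOLLA_BIN` override variable are hypothetical names introduced here for illustration.

```shell
# Run the Kolla-Ansible stages in order and stop at the first failure.
# KOLLA_BIN is a hypothetical override hook, useful for dry runs and testing.
KOLLA_BIN=${KOLLA_BIN:-kolla-ansible}

deploy_all() {
  local inventory=$1 stage
  for stage in bootstrap-servers prechecks pull deploy post-deploy; do
    echo "==> $stage"
    "$KOLLA_BIN" -i "$inventory" "$stage" || {
      echo "Stage '$stage' failed; fix it before continuing." >&2
      return 1
    }
  done
}

# Example: deploy_all multinode
```

Stopping early matters most for `prechecks`: continuing to `deploy` after a failed precheck tends to produce a half-configured cluster that is harder to clean up than to avoid.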

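Verify the deployment with openstack compute service list and openstack network agent list; every compute service should report Status enabled and State up, and every network agent should be alive.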
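
For a quick pass/fail answer, the compute service listing can be filtered down to the rows that are not up. A minimal sketch, assuming the four columns are requested as in the usage comment; `report_down_services` is an illustrative name.

```shell
# Read "Binary Host Status State" rows on stdin and print those whose State is not "up".
# Returns 0 when every service is up, 1 otherwise.
report_down_services() {
  awk '$4 != "up" { print; bad = 1 } END { exit bad + 0 }'
}

# Example (requires admin-openrc.sh to be sourced):
#   openstack compute service list -f value -c Binary -c Host -c Status -c State \
#     | report_down_services && echo "all compute services up"
```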

Post‑Installation Tuning

MariaDB Optimization

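# On the deployment host, create /etc/kolla/config/galera.cnf
# (Kolla‑Ansible merges this override into the generated MariaDB config)
[mysqld]
# Size the buffer pool to the RAM actually available; 8G is too large for a 16 GB control node
innodb_buffer_pool_size = 8G
innodb_log_file_size = 1G
innodb_flush_log_at_trx_commit = 2
max_connections = 1000
thread_cache_size = 128
# Apply the override (docker restart alone does not pick it up)
kolla-ansible -i multinode reconfigure --tags mariadb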

RabbitMQ Optimization

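# Edit /etc/kolla/config/rabbitmq.conf on the deployment host
# (confirm the override path and merge behavior for your Kolla‑Ansible release)
vm_memory_high_watermark.relative = 0.7
disk_free_limit.absolute = 5GB
cluster_partition_handling = autoheal
# Apply the override (docker restart alone does not pick it up)
kolla-ansible -i multinode reconfigure --tags rabbitmq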

Neutron Network Configuration (OVN Backend)

# External network ("physnet1" is Kolla‑Ansible's default physical network name)
openstack network create --external \
  --provider-network-type flat \
  --provider-physical-network physnet1 ext-net
# External subnet (the pool starts above the 192.168.1.100 VIP to avoid an address clash)
openstack subnet create --network ext-net \
  --subnet-range 192.168.1.0/24 \
  --gateway 192.168.1.1 \
  --allocation-pool start=192.168.1.101,end=192.168.1.200 \
  --no-dhcp ext-subnet
# Tenant (internal) network
openstack network create internal-net
openstack subnet create --network internal-net \
  --subnet-range 10.0.0.0/24 \
  --dns-nameserver 8.8.8.8 internal-subnet
# Router linking networks
openstack router create myrouter
openstack router set --external-gateway ext-net myrouter
openstack router add subnet myrouter internal-subnet

When launching VMs, select internal-net and attach a floating IP for external access.
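
Attaching a floating IP takes two calls: allocate an address from the external network, then bind it to the server. The wrapper below is a small sketch of that sequence; `attach_floating_ip` is a hypothetical helper name, and `vm-01` and `ext-net` are the example names used in this guide.

```shell
# Allocate a floating IP from an external network and attach it to a server.
# Prints the allocated address. Assumes OpenStack credentials are already sourced.
attach_floating_ip() {
  local server=$1 ext_net=${2:-ext-net} fip
  fip=$(openstack floating ip create -f value -c floating_ip_address "$ext_net") || return 1
  openstack server add floating ip "$server" "$fip" || return 1
  echo "$fip"
}

# Example: attach_floating_ip vm-01 ext-net
```

The VM is reachable on the printed address only if its security group also allows the traffic in question.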

Fault Diagnosis and Log Monitoring

Log Locations

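Kolla‑Ansible services write their logs to /var/log/kolla/<service>/ inside each container; the same files are available on the host under /var/log/kolla through the kolla_logs Docker volume.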

# Nova logs
docker exec -it nova_api tail -f /var/log/kolla/nova/nova-api.log
# Neutron logs
docker exec -it neutron_server tail -f /var/log/kolla/neutron/neutron-server.log
# Container stdout/stderr (mostly startup messages; service logs go to the files above)
docker logs -f nova_api
# Search for errors
grep -r "ERROR" /var/log/kolla/nova/ | tail -50
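
When it is unclear which service is misbehaving, counting ERROR lines per log file narrows the search quickly. A minimal sketch, assuming the standard OpenStack log format where the level appears as a space-delimited word; `errors_by_file` is an illustrative name.

```shell
# Print "<file>:<count>" for every log file under a directory that contains
# ERROR-level lines, busiest file first.
errors_by_file() {
  local dir=${1:-/var/log/kolla}
  grep -rc ' ERROR ' "$dir" | awk -F: '$NF > 0' | sort -t: -k2,2nr
}

# Example: errors_by_file /var/log/kolla | head -5
```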

Common Issues

VM launch failure – usually caused by insufficient compute resources or incompatible image format.

# Show hypervisor stats
openstack hypervisor stats show
# Inspect a compute node
openstack hypervisor show compute1
# Inspect VM details
openstack server show vm-01
# If status is ERROR, view console log
openstack console log show vm-01

Network connectivity problems – check security‑group rules, router status, and DHCP agent.

# Security group rules
openstack security group rule list default
# Router details
openstack router show myrouter
# Network agents
openstack network agent list
# Restart the DHCP agent if needed (ML2/OVS deployments only; with OVN, DHCP is served by OVN itself)
docker restart neutron_dhcp_agent
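
A frequent finding from the rule list is that the group has no ingress rules at all, so ping and SSH are silently dropped. The sketch below adds both rules; `allow_ping_and_ssh` is a hypothetical helper, and opening these ports to any source address is an assumption that suits a test project rather than production.

```shell
# Allow inbound ping and SSH in a security group (defaults to "default").
# Assumption: no --remote-ip is given, so the rules apply to any IPv4 source.
allow_ping_and_ssh() {
  local group=${1:-default}
  openstack security group rule create --ingress --protocol icmp "$group" &&
  openstack security group rule create --ingress --protocol tcp --dst-port 22 "$group"
}

# Example: allow_ping_and_ssh default
```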

Monitoring and Alerts

Enable Prometheus and Grafana via globals.yml and redeploy.

# globals.yml additions
enable_prometheus: "yes"
enable_grafana: "yes"
# Redeploy monitoring services
kolla-ansible -i multinode deploy
# Grafana UI
# Grafana UI: http://192.168.1.100:3000 (user admin; password is grafana_admin_password in /etc/kolla/passwords.yml)

Production Configuration Example

Hardware Specification

Control nodes: 3 × Dell R740, 2 × Intel Xeon Gold 6248, 128 GB RAM, 2 × 480 GB SSD in RAID 1.

Compute nodes: 10 × Dell R740, 2 × Intel Xeon Gold 6242, 256 GB RAM, 2 × 960 GB SSD in RAID 1.

Storage nodes: 3 × Dell R740 with 12 × 8 TB SAS disks for Ceph.

Network: 10 GbE NICs bonded in mode 4 (802.3ad LACP).

Key globals.yml Parameters

---
kolla_base_distro: "ubuntu"
openstack_release: "2024.1"
kolla_internal_vip_address: "192.168.1.100"
kolla_external_vip_address: "10.0.0.100"
network_interface: "bond0"
neutron_external_interface: "bond1"
neutron_plugin_agent: "ovn"
enable_cinder: "yes"
cinder_backend_ceph: "yes"
ceph_cinder_pool_name: "volumes"
enable_prometheus: "yes"
enable_grafana: "yes"
enable_prometheus_alertmanager: "yes"
openstack_logging_debug: "False"
enable_fluentd: "yes"
enable_central_logging: "yes"

Backup Strategy

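# Full logical dump of all databases (schedule daily from cron).
# For hourly incrementals use: kolla-ansible -i multinode mariadb_backup --incremental
# (requires enable_mariabackup: "yes" in globals.yml)
DB_PASS=$(awk '/^database_password:/ {print $2}' /etc/kolla/passwords.yml)
docker exec mariadb mysqldump -uroot -p"$DB_PASS" --all-databases \
  --single-transaction --routines --triggers \
  > /backup/mysql_$(date +%Y%m%d_%H%M%S).sql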
# Backup configuration files
tar czf /backup/kolla_config_$(date +%Y%m%d).tar.gz /etc/kolla/
# Export Ceph volume
rbd export --pool volumes vol-001 /backup/vol-001_$(date +%Y%m%d).img
# Delete backups older than 30 days
find /backup -type f -mtime +30 -delete
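
Because the retention step deletes files unconditionally, it is worth being able to preview what it would remove. The following sketch wraps the same `find` expression with an optional dry-run mode; `prune_backups` and its `dry-run` argument are illustrative names, not part of any existing tool.

```shell
# Delete (or, with "dry-run", only list) regular files older than <days> days.
# Both modes print the affected paths.
prune_backups() {
  local dir=$1 days=$2 mode=${3:-delete}
  if [ "$mode" = "dry-run" ]; then
    find "$dir" -type f -mtime +"$days" -print
  else
    find "$dir" -type f -mtime +"$days" -print -delete
  fi
}

# Example:
#   prune_backups /backup 30 dry-run   # preview
#   prune_backups /backup 30           # actually delete
```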

Daily Operations Script

#!/bin/bash
source /etc/kolla/admin-openrc.sh
echo "===== OpenStack Inspection Report $(date) ====="
echo "1. Compute services"
openstack compute service list -f value -c Status | sort | uniq -c
echo "2. Network agents"
openstack network agent list -f value -c Alive | sort | uniq -c
echo "3. Resource usage"
openstack hypervisor stats show -f json | jq '{vcpus_used, memory_mb_used, local_gb_used}'
echo "4. VMs in ERROR"
openstack server list --status ERROR -f value -c Name
echo "5. Disk usage"
df -h /var/lib/docker | tail -1
echo "===== Inspection Completed ====="
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
