How I Crashed OpenStack Five Times and Created a Lifesaving Deployment Guide
This comprehensive guide walks you through OpenStack deployment from a single‑node DevStack test to a production‑grade HA cluster with Kolla‑Ansible, covering hardware planning, component configuration, performance tuning, network setup, troubleshooting, monitoring, backup strategies, and useful operational scripts.
OpenStack Overview
Core services:
Nova – compute service for VM lifecycle.
Neutron – networking service for virtual networks, IP allocation and security groups.
Cinder – block‑storage service.
Glance – image service.
Keystone – identity service.
Horizon – web UI.
Single‑Node DevStack Installation
Prerequisites: Ubuntu 22.04 LTS, ≥4 CPU, 8 GB RAM, 50 GB disk, nested virtualization enabled if running inside a VM.
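Before starting, it can help to confirm that the host actually meets these minimums. The following is a minimal preflight sketch, not part of DevStack: the function names `meets_minimum` and `report` are illustrative, the thresholds are taken from the prerequisites line, and it assumes a Linux host with GNU coreutils.

```shell
#!/usr/bin/env bash
# Preflight sketch: compare this host against the DevStack minimums listed above.
# Function names are illustrative; thresholds come from the prerequisites line.

# meets_minimum ACTUAL REQUIRED -> succeeds when ACTUAL >= REQUIRED (integers)
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# report LABEL ACTUAL REQUIRED -> prints OK or WARN, never aborts the script
report() {
  if meets_minimum "$2" "$3"; then
    echo "OK   $1: $2 (need >= $3)"
  else
    echo "WARN $1: $2 (need >= $3)"
  fi
}

cpus=$(nproc 2>/dev/null || echo 0)
# MemTotal is in kB; round up to whole GiB because firmware reserves a little RAM
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_gb=$(( (${mem_kb:-0} + 1048575) / 1048576 ))
# Free space in GiB on the root filesystem
disk_gb=$(df -BG --output=avail / 2>/dev/null | tail -n 1 | tr -dc '0-9')

report "CPU cores" "${cpus:-0}" 4
report "RAM (GiB)" "$mem_gb" 8
report "Free disk (GiB)" "${disk_gb:-0}" 50

# vmx (Intel) or svm (AMD) CPU flags indicate hardware virtualization is exposed
if grep -Eq '(vmx|svm)' /proc/cpuinfo 2>/dev/null; then
  echo "OK   hardware virtualization flags present"
else
  echo "WARN no vmx/svm flag found; nested virtualization may be disabled"
fi
```

The script only warns and never exits non-zero, so it can be run safely on any machine before committing to a 30–60 minute install.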
# Update system
sudo apt update && sudo apt upgrade -y
# Install dependencies
sudo apt install -y git python3-pip
# Create non‑root stack user
sudo useradd -s /bin/bash -d /opt/stack -m stack
echo "stack ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/stack
# Switch to stack user
sudo su - stack
# Clone DevStack
git clone https://opendev.org/openstack/devstack
cd devstack
# Create local.conf
cat > local.conf << 'EOF'
[[local|localrc]]
ADMIN_PASSWORD=supersecret
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
GIT_BRANCH=stable/2024.1
enable_service tempest
EOF
# Run installation (30–60 min)
./stack.sh
On completion, Horizon is reachable at http://<host_ip>/dashboard with user admin and the password set in ADMIN_PASSWORD.
Production HA Deployment with Kolla‑Ansible
Hardware Planning
Control nodes (3 ×): 8 CPU, 16 GB RAM each.
Compute nodes (N ×): 16 CPU, 32 GB RAM minimum.
Network nodes: ≥2 NICs.
Optional storage nodes for Ceph.
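To turn these node sizes into a rough guest count, a back-of-the-envelope calculation is usually enough. The sketch below is illustrative only: the function name `vm_capacity`, the 4:1 vCPU overcommit, the 4 GB host reservation and the 2 vCPU / 4 GB guest size are all assumptions, not values from this guide or from Nova defaults in your release.

```shell
# vm_capacity CORES RAM_GB CPU_RATIO RAM_RESERVED_GB VM_VCPUS VM_RAM_GB
# Prints how many identical VMs fit on one compute node: the smaller of the
# CPU-bound and RAM-bound counts. Integer arithmetic only.
vm_capacity() {
  local cores=$1 ram_gb=$2 cpu_ratio=$3 reserved_gb=$4 vm_vcpus=$5 vm_ram_gb=$6
  local by_cpu=$(( cores * cpu_ratio / vm_vcpus ))
  local by_ram=$(( (ram_gb - reserved_gb) / vm_ram_gb ))
  if [ "$by_cpu" -lt "$by_ram" ]; then echo "$by_cpu"; else echo "$by_ram"; fi
}

# Minimum compute node from the list above (16 CPU, 32 GB), with the assumed
# 4:1 overcommit, 4 GB held back for the host, and 2 vCPU / 4 GB guests.
vm_capacity 16 32 4 4 2 4   # prints 7: RAM, not CPU, is the limit
```

With these assumptions the minimum compute node is memory-bound long before it is CPU-bound, which is why the production example later in this guide uses far more RAM per node.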
Install Kolla‑Ansible
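# Install Kolla‑Ansible and its Ansible Galaxy dependencies
pip3 install kolla-ansible
kolla-ansible install-deps
# Copy example configuration (a system-wide pip install places these under /usr/local/share; adjust the prefix for a virtualenv or distro package)
cp -r /usr/local/share/kolla-ansible/etc_examples/kolla /etc/
cp /usr/local/share/kolla-ansible/ansible/inventory/* .
# Generate service passwords into /etc/kolla/passwords.yml
kolla-genpwd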
# globals.yml (excerpt)
cat > /etc/kolla/globals.yml << 'EOF'
---
kolla_base_distro: "ubuntu"
openstack_release: "2024.1"
network_interface: "eth0"
neutron_external_interface: "eth1"
kolla_internal_vip_address: "192.168.1.100"
enable_haproxy: "yes"
enable_keepalived: "yes"
enable_mariadb: "yes"
enable_rabbitmq: "yes"
enable_cinder: "yes"
enable_neutron: "yes"
enable_heat: "no"
EOF
Host Inventory
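# multinode inventory: edit the host groups at the top of the copied file and leave the remaining [*:children] groups unchanged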
[control]
ctrl1 ansible_host=192.168.1.10
ctrl2 ansible_host=192.168.1.11
ctrl3 ansible_host=192.168.1.12
[network]
ctrl1
ctrl2
ctrl3
[compute]
compute1 ansible_host=192.168.1.20
compute2 ansible_host=192.168.1.21
[storage]
compute1
compute2
[monitoring]
ctrl1
[deployment]
localhost ansible_connection=local
Deploy
# Bootstrap the hosts with Kolla's deployment dependencies (Docker and related packages)
kolla-ansible -i multinode bootstrap-servers
# Run pre‑checks
kolla-ansible -i multinode prechecks
# Pull Docker images (may be slow)
kolla-ansible -i multinode pull
# Deploy (30–60 min)
kolla-ansible -i multinode deploy
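# Post‑deployment steps (generates /etc/kolla/admin-openrc.sh)
kolla-ansible -i multinode post-deploy
source /etc/kolla/admin-openrc.sh
Verify services with openstack compute service list and openstack network agent list; every compute service should show enabled and up, and every network agent should be alive.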
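Scanning the service table by eye gets tedious on larger clusters. The following sketch filters the list down to entries that are not both enabled and up. The helper name `unhealthy_services` is hypothetical, and it assumes the four selected columns arrive in the order Binary, Host, Status, State.

```shell
# unhealthy_services: reads "Binary Host Status State" rows on stdin, prints the
# rows that are not "enabled up", and returns non-zero when any such row exists.
unhealthy_services() {
  awk '$3 != "enabled" || $4 != "up" { print; bad = 1 } END { exit bad + 0 }'
}

# Only query the API when the client is installed and credentials are loaded.
if command -v openstack >/dev/null 2>&1 && [ -n "${OS_AUTH_URL:-}" ]; then
  if openstack compute service list -f value -c Binary -c Host -c Status -c State \
      | unhealthy_services; then
    echo "all compute services enabled and up"
  else
    echo "some compute services need attention" >&2
  fi
fi
```

Because the function only reads stdin, it can also be fed saved output when debugging a cluster whose API is unreachable.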
Post‑Installation Tuning
MariaDB Optimization
# Edit /etc/kolla/config/galera.cnf on the deployment host (Kolla merges this override into the MariaDB configuration)
[mysqld]
innodb_buffer_pool_size = 8G
innodb_log_file_size = 1G
innodb_flush_log_at_trx_commit = 2
max_connections = 1000
thread_cache_size = 128
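# Apply the change (a plain container restart does not re-render files from /etc/kolla/config)
kolla-ansible -i multinode reconfigure --tags mariadb
RabbitMQ Optimization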
# Edit /etc/kolla/config/rabbitmq.conf
vm_memory_high_watermark.relative = 0.7
disk_free_limit.absolute = 5GB
cluster_partition_handling = autoheal
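# Apply the change (a plain container restart does not re-render files from /etc/kolla/config)
kolla-ansible -i multinode reconfigure --tags rabbitmq
Neutron Network Configuration (OVN Backend)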
# External network (physnet1 is the physical network name Kolla maps to neutron_external_interface by default)
openstack network create --external \
--provider-network-type flat \
--provider-physical-network physnet1 ext-net
# External subnet (the pool starts at .101 so it never hands out the internal VIP 192.168.1.100)
openstack subnet create --network ext-net \
--subnet-range 192.168.1.0/24 \
--gateway 192.168.1.1 \
--allocation-pool start=192.168.1.101,end=192.168.1.200 \
--no-dhcp ext-subnet
# Tenant (internal) network
openstack network create internal-net
openstack subnet create --network internal-net \
--subnet-range 10.0.0.0/24 \
--dns-nameserver 8.8.8.8 internal-subnet
# Router linking networks
openstack router create myrouter
openstack router set --external-gateway ext-net myrouter
openstack router add subnet myrouter internal-subnet
When launching VMs, select internal-net and attach a floating IP for external access.
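Floating IPs are drawn from the ext-subnet allocation pool. In this guide that subnet shares 192.168.1.0/24 with the control plane, so it is worth checking that the pool does not contain the internal VIP (192.168.1.100). The sketch below performs that check in plain shell arithmetic. The helper names `ip_to_int` and `ip_in_range` are hypothetical, and the two pool ranges are only there to show both outcomes.

```shell
# ip_to_int A.B.C.D -> prints the IPv4 address as a 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# ip_in_range IP START END -> succeeds when START <= IP <= END
ip_in_range() {
  local ip start end
  ip=$(ip_to_int "$1"); start=$(ip_to_int "$2"); end=$(ip_to_int "$3")
  [ "$ip" -ge "$start" ] && [ "$ip" -le "$end" ]
}

# check_pool VIP START END -> prints whether the VIP collides with the pool
check_pool() {
  if ip_in_range "$1" "$2" "$3"; then
    echo "conflict: $1 is inside pool $2-$3"
  else
    echo "ok: $1 is outside pool $2-$3"
  fi
}

vip="192.168.1.100"
check_pool "$vip" 192.168.1.100 192.168.1.200   # pool that includes the VIP
check_pool "$vip" 192.168.1.101 192.168.1.200   # pool that starts one address later
```

The same helpers can be pointed at the control and compute node addresses from the inventory to confirm none of them fall inside the pool either.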
Fault Diagnosis and Log Monitoring
Log Locations
All Kolla‑Ansible logs reside under /var/log/kolla inside the respective containers.
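Since each service logs to its own subdirectory, a per-file error count is often a quicker first look than tailing individual logs. The following is a small sketch; the function name `count_errors` is hypothetical and it assumes the log tree is readable at the path you pass in.

```shell
# count_errors DIR -> prints "COUNT FILE" for every *.log under DIR that has at
# least one line containing ERROR, highest count first.
count_errors() {
  # /dev/null forces grep to prefix each count with its file name
  find "$1" -type f -name '*.log' -exec grep -c 'ERROR' /dev/null {} + 2>/dev/null \
    | awk -F: '$NF > 0 { n = $NF; sub(/:[0-9]+$/, ""); print n, $0 }' \
    | sort -rn
}

# Run against the Kolla log tree only when it exists on this host.
if [ -d /var/log/kolla ]; then
  count_errors /var/log/kolla
fi
```

The file at the top of the list is usually the right place to start reading with the tail commands that follow.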
# Nova logs
docker exec -it nova_api tail -f /var/log/kolla/nova/nova-api.log
# Neutron logs
docker exec -it neutron_server tail -f /var/log/kolla/neutron/neutron-server.log
# Container stdout/stderr (startup failures show up here)
docker logs --tail 100 -f nova_api
# Search for errors
grep -r "ERROR" /var/log/kolla/nova/ | tail -50
Common Issues
VM launch failure – usually caused by insufficient compute resources or incompatible image format.
# Show hypervisor stats
openstack hypervisor stats show
# Inspect a compute node
openstack hypervisor show compute1
# Inspect VM details
openstack server show vm-01
# If status is ERROR, view console log
openstack console log show vm-01
Network connectivity problems – check security‑group rules, router status, and DHCP agent.
# Security group rules
openstack security group rule list default
# Router details
openstack router show myrouter
# Network agents
openstack network agent list
# Restart DHCP agent if needed (ML2/OVS deployments only; OVN serves DHCP itself and runs no DHCP agent)
docker restart neutron_dhcp_agent
Monitoring and Alerts
Enable Prometheus and Grafana via globals.yml and redeploy.
# globals.yml additions
enable_prometheus: "yes"
enable_grafana: "yes"
# Redeploy monitoring services
kolla-ansible -i multinode deploy
# Grafana UI
http://192.168.1.100:3000 (user admin; the password is grafana_admin_password in /etc/kolla/passwords.yml)
Production Configuration Example
Hardware Specification
Control nodes: 3 × Dell R740, 2 × Intel Gold 6248, 128 GB RAM, 2 × 480 GB SSD RAID1.
Compute nodes: 10 × Dell R740, 2 × Intel Gold 6242, 256 GB RAM, 2 × 960 GB SSD RAID1.
Storage nodes: 3 × Dell R740 with 12 × 8 TB SAS for Ceph.
Network: 10 GbE NICs with bonding mode 4 (802.3ad LACP).
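The raw disk count overstates what Ceph can actually store. As a rough worked example, the arithmetic below assumes 3-way replication. That is Ceph's usual default pool size but an assumption here, since the guide does not state the replication factor. It also ignores the free-space headroom Ceph needs in practice. The function name `ceph_usable_tb` is illustrative.

```shell
# ceph_usable_tb NODES DISKS_PER_NODE DISK_TB REPLICAS -> usable TB, rounded down
ceph_usable_tb() {
  echo $(( $1 * $2 * $3 / $4 ))
}

# 3 storage nodes x 12 disks x 8 TB, from the specification above
raw_tb=$(( 3 * 12 * 8 ))
echo "raw: ${raw_tb} TB, usable at 3 replicas: $(ceph_usable_tb 3 12 8 3) TB"
# prints: raw: 288 TB, usable at 3 replicas: 96 TB
```

So the 288 TB of raw SAS capacity yields roughly 96 TB for volumes under these assumptions, and less once near-full thresholds are taken into account.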
Key globals.yml Parameters
---
kolla_base_distro: "ubuntu"
openstack_release: "2024.1"
kolla_internal_vip_address: "192.168.1.100"
kolla_external_vip_address: "10.0.0.100"
network_interface: "bond0"
neutron_external_interface: "bond1"
neutron_plugin_agent: "ovn"
enable_cinder: "yes"
cinder_backend_ceph: "yes"
ceph_cinder_pool_name: "volumes"
enable_prometheus: "yes"
enable_grafana: "yes"
enable_prometheus_alertmanager: "yes"
openstack_logging_debug: "False"
enable_fluentd: "yes"
fluentd_output_type: "elasticsearch"
Backup Strategy
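# Full logical dump; schedule it daily via cron. mysqldump cannot take incrementals,
# so use Kolla's Mariabackup integration (kolla-ansible mariadb_backup --incremental) for those.
# DB_ROOT_PASSWORD holds the database_password value from /etc/kolla/passwords.yml
docker exec mariadb mysqldump -u root -p"${DB_ROOT_PASSWORD}" --all-databases \
--single-transaction --routines --triggers \
> /backup/mysql_$(date +%Y%m%d_%H%M%S).sql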
# Backup configuration files
tar czf /backup/kolla_config_$(date +%Y%m%d).tar.gz /etc/kolla/
# Export Ceph volume
rbd export --pool volumes vol-001 /backup/vol-001_$(date +%Y%m%d).img
# Delete backups older than 30 days
find /backup -type f -mtime +30 -delete
Daily Operations Script
#!/bin/bash
source /etc/kolla/admin-openrc.sh
echo "===== OpenStack Inspection Report $(date) ====="
echo "1. Compute services"
openstack compute service list -f value -c Status | sort | uniq -c
echo "2. Network agents"
openstack network agent list -f value -c Alive | sort | uniq -c
echo "3. Resource usage"
openstack hypervisor stats show -f json | jq '{vcpus_used, memory_mb_used, local_gb_used}'
echo "4. VMs in ERROR"
openstack server list --status ERROR -f value -c Name
echo "5. Disk usage"
df -h /var/lib/docker | tail -1
echo "===== Inspection Completed ====="
