How to Build a Scalable Automated Deployment System for Multi‑Node Clusters
This article walks through the shortcomings of manual code releases, designs a multi‑environment automated deployment workflow, and details a step‑by‑step implementation (code fetching, configuration handling, logging, parallel execution, and rollback). It also shares practical scripts and common pitfalls for large‑scale clusters.
Background and Problems with Manual Deployment
Early releases relied on fully manual actions such as copying code with scp, logging into each node to run git pull or svn update, using xftp, and sending compressed packages for manual extraction. The drawbacks were heavy ops involvement, slow rollout on many nodes, frequent human errors, chaotic directory management, and delayed or difficult rollbacks.
Designing an Automated Deployment Solution
The design starts by defining five environments:
Development – shared services like MySQL, Redis, Memcached.
Testing – functional and performance testing.
Pre‑production – a single production node for integration testing without affecting live data.
Gray – a gray (canary) release: a region‑based slice of the production environment that receives the new version first.
Production – the live service for end users.
Pre‑production is introduced to avoid database inconsistencies and to test integration with production‑only interfaces such as payment APIs.
Key planning points include:
Code must already reside in a Git repository.
Enable one‑click deployment to ten cluster nodes, with rollback that completes within seconds.
All web services should run under a normal (non‑root) user.
Only the load balancer may listen on port 80.
Design a complete production‑grade automation system.
Implementation Steps
Place the source code in Git (or SVN) and ensure the repository is the single source of truth.
Fetch the latest code: determine the target branch, the version number, and any tagged release package.
Resolve differences:
Node‑specific variations.
Configuration files that may live outside the code repo (e.g., config.example).
Sensitive data such as SMS or payment credentials should be hidden from developers.
Cluster‑wide differences (e.g., crontab.xml per node).
Adopt a naming convention for builds, e.g., project_env_version_branch_timestamp_author. Examples:
rainbow_test_v1.1.1_dev_2016-08-11_12:12_xuliangwei
rainbow_pro_v1.1.1_master_2016-08-11_11:11_xuliangwei
Update procedure varies by stack:
PHP – restart the service or clear opcache after deployment.
Java/Tomcat – restart and clean the work and temp directories.
Testing focuses on critical pages, APIs, and backend components, first in the pre‑production environment; if tests pass, continue, otherwise abort.
Log deployment statistics: successful runs, failures, and rollbacks.
Use a lock file to prevent concurrent executions that could cause duplicate deployments.
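One way to implement such a lock is sketched below. It relies on the shell's noclobber option, which makes a redirection fail when the target file already exists, so creating the lock and checking for it happen in one step. The lock path and the function names acquire_lock and release_lock are assumptions made for this illustration, not part of the original script.

```shell
#!/bin/sh
# Hypothetical lock location; override LOCK_FILE as needed.
LOCK_FILE="${LOCK_FILE:-/tmp/deploy.lock}"

# Try to take the lock. Returns 0 on success, 1 if another run holds it.
acquire_lock() {
    # noclobber is set only inside the subshell, so it does not leak
    # into the rest of the script. The PID is stored for inspection.
    if ( set -o noclobber; echo "$$" > "$LOCK_FILE" ) 2>/dev/null; then
        return 0
    fi
    echo "deployment already running (lock: $LOCK_FILE)" >&2
    return 1
}

# Remove the lock once the deployment has finished.
release_lock() {
    rm -f "$LOCK_FILE"
}
```

A calling script would typically run `acquire_lock || exit 1` near the top and then `trap release_lock EXIT`, so the lock is removed even when a later step fails. A run killed with SIGKILL would still leave a stale lock that has to be removed by hand.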
Choose serial execution for a few machines; switch to parallel (or grouped) execution when the node count grows.
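The difference between the two modes can be sketched with plain shell job control. Here deploy_node is a hypothetical placeholder for whatever is done per node (copy, unpack, restart); the function names are assumptions for this example only.

```shell
#!/bin/sh
# Hypothetical per-node action; replace with the real deployment logic.
deploy_node() {
    echo "deploying to $1"
}

# Serial: one node at a time, stopping at the first failure.
deploy_serial() {
    for node in "$@"; do
        deploy_node "$node" || return 1
    done
}

# Parallel: start every node in the background, then collect exit codes.
deploy_parallel() {
    pids=""
    for node in "$@"; do
        deploy_node "$node" &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        # wait <pid> returns that job's exit status
        wait "$pid" || failed=$((failed + 1))
    done
    [ "$failed" -eq 0 ]
}
```

Grouped execution can be built on the same idea by calling deploy_parallel once per batch of nodes, so that only part of the cluster is out of service at any moment.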
Keep a standby deployment server so that releases remain possible if the primary deployment host is lost.
Execution methods:
Shell scripts run directly.
Web UI or Jenkins jobs trigger the same scripts.
The rollback strategy relies on soft links (symbolic links) for instant version switches; an emergency rollback removes the current link and recreates it pointing to the previous version.
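A minimal sketch of the soft‑link switch follows. The directory layout (a current link next to a releases directory holding one sub‑directory per build) and the function name switch_release are assumptions for illustration; the original script may organize its directories differently.

```shell
#!/bin/sh
# Point <root>/current at <root>/releases/<release>.
# Assumed layout: <root>/current -> releases/<build name>
switch_release() {
    root=$1 release=$2
    if [ ! -d "$root/releases/$release" ]; then
        echo "no such release: $release" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: do not follow the existing link into its target directory
    ln -sfn "releases/$release" "$root/current"
}
```

Deploying and rolling back then become the same operation: `switch_release /path/to/webroot <build name>` with either the new or the older build. This variant replaces the link in one command instead of a separate delete and create; the web server's document root is assumed to point at the current link.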
Practical Deployment Workflow
Fetch the latest code.
Compile (optional).
Copy or link configuration files.
Package with tar for fast transfer.
Distribute files using scp, rsync, or salt without password prompts.
Temporarily remove the target server from the cluster (comment out its entry in the load balancer configuration).
Unpack the package.
Protect the webroot directory.
Transfer differential files if node‑specific configs differ.
Restart web services.
Run post‑deployment tests.
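The naming and packaging steps of this workflow might look like the sketch below. It follows the project_env_version_branch_timestamp_author convention described earlier, with one deliberate deviation: the time is written with a dash instead of a colon, an assumption made here because tar and scp can interpret a colon in a file name as a remote host separator. The function names build_name and package_build are hypothetical.

```shell
#!/bin/sh
# Compose a build name: project_env_version_branch_timestamp_author.
build_name() {
    project=$1 stage=$2 version=$3 branch=$4 author=$5
    # Dash instead of colon in the time (see the note above).
    stamp=$(date '+%Y-%m-%d_%H-%M')
    printf '%s_%s_%s_%s_%s_%s\n' \
        "$project" "$stage" "$version" "$branch" "$stamp" "$author"
}

# Package the contents of a source directory as <out>/<name>.tar.gz.
package_build() {
    src=$1 out=$2 name=$3
    # -C changes into the source directory so paths in the archive are relative.
    tar -czf "$out/$name.tar.gz" -C "$src" .
}
```

For example, `name=$(build_name rainbow test v1.1.1 dev xuliangwei)` followed by `package_build ./src /tmp "$name"` would produce a single archive ready to be distributed with scp or rsync.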
Rollback Practices
Normal rollback:
List available rollback versions.
Remove the target server from the cluster.
Execute the rollback (switch the soft link).
Restart services and verify.
Re‑add the server to the cluster.
Emergency rollback:
Identify the previous version (e.g., via ls -l or find).
Delete the current soft link and recreate it pointing to the older version.
Restart the affected service.
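Identifying the previous version can be scripted rather than read off an ls -l listing by eye. The sketch below assumes the same hypothetical layout as a current soft link beside a releases directory, and it assumes each release directory's modification time reflects when it was deployed; if files inside a release are changed later, that ordering would no longer hold.

```shell
#!/bin/sh
# List release directories, newest first (by modification time).
list_releases() {
    ls -1t "$1/releases"
}

# Print the release deployed just before the one that "current" points to.
# Prints nothing if the current release is the oldest one.
previous_release() {
    root=$1
    current=$(basename "$(readlink "$root/current")")
    list_releases "$root" |
        awk -v cur="$current" 'found { print; exit } $0 == cur { found = 1 }'
}
```

The name printed by previous_release is the one to which the soft link would be recreated before restarting the affected service.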
Common Pitfalls
Applying the automation to production without proper validation.
Inability to revert to the immediate previous version.
Issues with soft‑link based rollbacks.
For PHP, remember to restart or clear opcache when enabled.
For Java/Tomcat, always clean the work and temp directories before restart.
System Construction Example
The example environment uses two Linux nodes (192.168.90.201 and 192.168.90.202). Preparation consists of creating a normal user, configuring SSH keys, creating the required directories, and setting up Nginx.
Commands:
useradd xuliangwei
passwd xuliangwei
[Screenshots of the key configuration and of the deployment script's core logic are not reproduced here.]
Extending the Solution
The script can be enhanced to support Git branches, tag‑based releases, or a custom web UI that invokes the same automation. Integration with open‑source CI tools such as Jenkins, along with quality gates (e.g., SonarQube), completes a full DevOps pipeline.
Future Updates
Planned improvements include tighter GitLab‑Jenkins integration, automated code quality checks before deployment, and expanded monitoring of deployment metrics.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.