
How to Build a Scalable Automated Deployment System for Multi‑Node Clusters

This article reviews the shortcomings of manual code releases, designs a multi‑environment automated deployment workflow, and walks through the implementation step by step: code fetching, configuration handling, logging, parallel execution, and rollback. It also shares practical scripts and common pitfalls for large‑scale clusters.

dbaplus Community

Background and Problems with Manual Deployment

Early releases relied on fully manual actions such as copying code with scp, logging into each node to run git pull or svn update, using xftp, and sending compressed packages for manual extraction. The drawbacks were heavy ops involvement, slow rollout on many nodes, frequent human errors, chaotic directory management, and delayed or difficult rollbacks.

Designing an Automated Deployment Solution

The design starts by defining five environments:

Development – shared services like MySQL, Redis, Memcached.

Testing – functional and performance testing.

Pre‑production – a single production node for integration testing without affecting live data.

Gray – a gray (canary) release stage that splits the production environment by region.

Production – the live service for end users.

Pre‑production is introduced to avoid database inconsistencies and to test integration with production‑only interfaces such as payment APIs.

Key planning points include:

Code must already reside in a Git repository.

Enable one‑click deployment to ten cluster nodes, with rollback that completes within seconds.

All web services should run under a normal (non‑root) user.

Only the load balancer may listen on port 80.

Design a complete production‑grade automation system.

Implementation Steps

Place the source code in Git (or SVN) and ensure the repository is the single source of truth.

Fetch the latest code: determine the target branch, the version number, and the release tag if one is used.
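The fetch step could be sketched as a small shell function. This is a minimal illustration that assumes Git is the repository; the function name fetch_code and its argument order are made up for this example.

```shell
# Sketch only: fetch_code is a hypothetical helper, not part of the original script.
# Usage: fetch_code REPO_URL BRANCH_OR_TAG DEST_DIR
# Clones a single branch (or tag) without history and prints the short commit id,
# which can then be used as the version part of the package name.
fetch_code() {
    git clone --quiet --depth 1 --branch "$2" "$1" "$3" &&
        git -C "$3" rev-parse --short HEAD
}
```

A shallow clone keeps the transfer small. A deployment host that keeps a long-lived checkout might prefer git fetch followed by git checkout instead.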

Resolve differences:

Node‑specific variations.

Configuration files that may live outside the code repo (e.g., config.example).

Sensitive data such as SMS or payment credentials should be hidden from developers.

Cluster‑wide differences (e.g., crontab.xml per node).
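One way to handle these differences is to keep configuration in a private directory outside the code repository and overlay it on the build before packaging. The sketch below assumes a hypothetical layout of CONFIG_ROOT/ENV/common for shared files and CONFIG_ROOT/ENV/NODE for per-node files; neither the layout nor the function name comes from the original script.

```shell
# Sketch only: apply_config and the directory layout are assumptions.
# Usage: apply_config CONFIG_ROOT ENV NODE BUILD_DIR
# Copies environment-wide files first, then node-specific files,
# so a node-specific file overrides a shared file with the same name.
apply_config() (
    config_root="$1"; env="$2"; node="$3"; build_dir="$4"
    for layer in "$config_root/$env/common" "$config_root/$env/$node"; do
        if [ -d "$layer" ]; then
            cp -R "$layer/." "$build_dir/" || exit 1
        fi
    done
)
```

Because the private directory lives only on the deployment host, credentials such as payment keys never need to enter the repository that developers can read.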

Adopt a naming convention for builds, e.g., project_env_version_branch_timestamp_author. Examples:

rainbow_test_v1.1.1_dev_2016-08-11_12:12_xuliangwei
rainbow_pro_v1.1.1_master_2016-08-11_11:11_xuliangwei
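A name in this format can be assembled by a short helper. The function name build_name and the optional sixth argument are assumptions made for this sketch.

```shell
# Sketch only: build_name is a hypothetical helper.
# Usage: build_name PROJECT ENV VERSION BRANCH AUTHOR [TIMESTAMP]
# TIMESTAMP defaults to the current time in the same format as the examples.
build_name() {
    stamp="${6:-$(date '+%Y-%m-%d_%H:%M')}"
    printf '%s_%s_%s_%s_%s_%s\n' "$1" "$2" "$3" "$4" "$stamp" "$5"
}
```

For example, build_name rainbow test v1.1.1 dev xuliangwei 2016-08-11_12:12 prints rainbow_test_v1.1.1_dev_2016-08-11_12:12_xuliangwei.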

Update procedure varies by stack:

PHP – restart the service or clear opcache after deployment.

Java/Tomcat – restart and clean work and tmp directories.

Testing focuses on critical pages, APIs, and backend components, first in the pre‑production environment; if tests pass, continue, otherwise abort.
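A smoke test of this kind might look like the sketch below, which requests each critical URL and stops at the first one that does not answer with HTTP 200. The function names are hypothetical, and the five-second timeout is an arbitrary choice.

```shell
# Sketch only: http_status and smoke_test are hypothetical helpers.
# Print the HTTP status code for a URL (curl prints 000 if the request fails).
http_status() {
    curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# Usage: smoke_test URL...
# Returns non-zero at the first URL that does not return 200,
# so the caller can abort the deployment.
smoke_test() {
    for url in "$@"; do
        code="$(http_status "$url")"
        if [ "$code" != "200" ]; then
            echo "FAIL $url (status $code)" >&2
            return 1
        fi
        echo "OK   $url"
    done
}
```

A deployment script could then run smoke_test against the pre‑production node and continue to the remaining nodes only if it returns zero.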

Log deployment statistics: successful runs, failures, and rollbacks.
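The statistics can be as simple as one line per event in a text file. In this sketch the log path, the line format, and both function names are assumptions.

```shell
# Sketch only: the log location and format are assumptions.
DEPLOY_LOG="${DEPLOY_LOG:-/tmp/deploy.log}"

# Usage: log_event OUTCOME VERSION   (OUTCOME: success, failure, or rollback)
# Appends "DATE TIME OUTCOME VERSION" to the log.
log_event() {
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" >> "$DEPLOY_LOG"
}

# Usage: count_events OUTCOME
# Prints how many times the outcome has been recorded (third field of each line).
count_events() {
    awk -v outcome="$1" '$3 == outcome { n++ } END { print n + 0 }' "$DEPLOY_LOG"
}
```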

Use a lock file to prevent concurrent executions that could cause duplicate deployments.
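A lock file only helps if creating it is atomic. The sketch below relies on the shell's noclobber option (set -C), under which a redirection fails when the target file already exists. The lock path and function names are assumptions.

```shell
# Sketch only: the lock path and helper names are assumptions.
LOCK_FILE="${LOCK_FILE:-/tmp/deploy.lock}"

# Succeeds only if the lock file did not exist yet.
acquire_lock() {
    ( set -C; echo "$$" > "$LOCK_FILE" ) 2>/dev/null
}

release_lock() {
    rm -f "$LOCK_FILE"
}

# Usage: with_lock COMMAND [ARGS...]
# Runs the command under the lock and always releases the lock afterwards.
with_lock() {
    acquire_lock || { echo "another deployment is running" >&2; return 1; }
    "$@"
    status=$?
    release_lock
    return "$status"
}
```

If the script is killed while it holds the lock, the file remains and must be removed by hand; the process id written into it shows which run created it.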

Choose serial execution for a few machines; switch to parallel (or grouped) execution when the node count grows.
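Grouped parallel execution can be sketched with background jobs and wait. Here deploy_one stands in for whatever the real per-node steps are; both function names are hypothetical.

```shell
# Sketch only: deploy_one is a placeholder for the real per-node steps.
deploy_one() {
    echo "deploying to $1"
}

# Usage: deploy_group NODE...
# Starts deploy_one for every node in the group at once, waits for all of them,
# and returns non-zero if any node failed.
deploy_group() (
    failed=0
    pids=""
    for node in "$@"; do
        deploy_one "$node" &
        pids="$pids $!"
    done
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    [ "$failed" -eq 0 ]
)
```

Calling deploy_group once per batch, for example first with 192.168.90.201 and then with 192.168.90.202, keeps part of the cluster serving traffic while the other part is updated.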

Deploy a secondary deployment server to provide high availability and protect against loss of the primary deployment host.

Execution methods:

Shell scripts run directly.

Web UI or Jenkins jobs trigger the same scripts.
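Both execution methods can share one entry point if the script dispatches on its first argument. The subcommand names and the placeholder functions below are assumptions for illustration; the real functions would contain the steps described in this article.

```shell
# Sketch only: subcommand names and do_* placeholders are assumptions.
usage() {
    echo "Usage: deploy.sh {deploy|rollback-list|rollback VERSION}"
}

do_deploy()        { echo "deploy requested"; }
do_rollback_list() { echo "rollback list requested"; }
do_rollback()      { echo "rollback to $1 requested"; }

main() {
    case "${1:-}" in
        deploy)        do_deploy ;;
        rollback-list) do_rollback_list ;;
        rollback)
            [ -n "${2:-}" ] || { usage >&2; return 1; }
            do_rollback "$2" ;;
        *)             usage >&2; return 1 ;;
    esac
}
```

A script built this way would end with main "$@", so that an operator on the command line and a Jenkins job both invoke exactly the same code path.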

Rollback strategy relies on soft links for instant version switches; emergency rollback removes the current link and recreates it to the previous version.

Practical Deployment Workflow

Fetch the latest code.

Compile (optional).

Copy or link configuration files.

Package with tar for fast transfer.

Distribute the package with scp, rsync, or Salt, configured so that no password prompt interrupts the run.

Temporarily remove the target server from the cluster (comment out its config).

Unpack the package.

Protect the webroot directory.

Transfer differential files if node‑specific configs differ.

Restart web services.

Run post‑deployment tests.
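The packaging, transfer, and unpacking steps of this workflow can be sketched as three small functions. The names and argument order are assumptions, and push_release assumes key-based SSH login is already set up. The sketch passes full paths: GNU tar and scp can read a colon in a bare file name as a host separator, which matters when the package name contains a time such as 12:12.

```shell
# Sketch only: function names and argument order are assumptions.

# Usage: package_release BUILD_ROOT NAME
# Creates BUILD_ROOT/NAME.tar.gz from the directory BUILD_ROOT/NAME.
package_release() {
    tar -czf "$1/$2.tar.gz" -C "$1" "$2"
}

# Usage: push_release PACKAGE USER@HOST REMOTE_DIR
# Copies the package to one node over SSH.
push_release() {
    scp -q "$1" "$2:$3/"
}

# Usage: unpack_release PACKAGE RELEASES_DIR
# Unpacks the package into the releases directory on the target node.
unpack_release() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

Each release unpacks into its own directory under the releases directory, which is what makes a soft-link switch between versions possible.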

Rollback Practices

Normal rollback:

List available rollback versions.

Remove the target server from the cluster.

Execute the rollback (switch the soft link).

Restart services and verify.

Re‑add the server to the cluster.
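Removing a server from the cluster and re-adding it can be done by commenting out its line in the load balancer configuration. The sketch below assumes an Nginx upstream block with one "server IP:PORT;" line per node; the function names are hypothetical.

```shell
# Sketch only: assumes Nginx upstream lines of the form "server IP:PORT;".

# Escape dots so an IP address is matched literally in a regular expression.
escape_dots() {
    printf '%s' "$1" | sed 's/\./\\./g'
}

# Usage: disable_node CONF_FILE IP
# Comments out the server line for the given IP.
disable_node() {
    re="$(escape_dots "$2")"
    sed "s/^\([[:space:]]*\)\(server[[:space:]][[:space:]]*${re}[;:]\)/\1#\2/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Usage: enable_node CONF_FILE IP
# Removes the comment marker again.
enable_node() {
    re="$(escape_dots "$2")"
    sed "s/^\([[:space:]]*\)#\(server[[:space:]][[:space:]]*${re}[;:]\)/\1\2/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}
```

After either change the load balancer has to reload its configuration, for example with nginx -s reload, before the change takes effect.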

Emergency rollback:

Identify the previous version (e.g., via ls -l or find).

Delete the current soft link and recreate it pointing to the older version.

Restart the affected service.
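The emergency procedure can be sketched as follows, assuming each release is a directory under one releases directory and that modification time reflects deployment order. Both function names are hypothetical, and the service restart is left to the caller.

```shell
# Sketch only: assumes one directory per release and mtime = deployment order.

# Usage: previous_version RELEASES_DIR LINK
# Prints the release that comes right after the current one in newest-first order.
previous_version() (
    current="$(basename "$(readlink "$2")")"
    ls -1t "$1" | awk -v cur="$current" 'found { print; exit } $0 == cur { found = 1 }'
)

# Usage: emergency_rollback RELEASES_DIR LINK
# Deletes the current soft link and recreates it pointing at the previous release.
emergency_rollback() (
    prev="$(previous_version "$1" "$2")"
    [ -n "$prev" ] || { echo "no previous version found" >&2; exit 1; }
    rm -f "$2" && ln -s "$1/$prev" "$2"
)
```

Between the rm and the ln there is a brief moment with no link at all, which is one reason the normal rollback removes the server from the cluster first.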

Common Pitfalls

Applying the automation to production without proper validation.

Inability to revert to the immediate previous version.

Issues with soft‑link based rollbacks.

For PHP, remember to restart or clear opcache when enabled.

For Java/Tomcat, always clean work and tmp directories before restart.

System Construction Example

The example environment uses two Linux nodes (192.168.90.201 and 192.168.90.202). Preparation consists of creating a normal user, configuring SSH keys, creating the required directories, and setting up Nginx.

Commands:

useradd xuliangwei
passwd xuliangwei

The original article illustrates the key configuration and the core logic of the deployment script with screenshots, which are omitted here.

Extending the Solution

The script can be enhanced to support Git branches, tag‑based releases, or a custom web UI that invokes the same automation. Integration with open‑source CI tools such as Jenkins, along with quality gates (e.g., SonarQube), completes a full DevOps pipeline.

Future Updates

Planned improvements include tighter GitLab‑Jenkins integration, automated code quality checks before deployment, and expanded monitoring of deployment metrics.
