Automating Cloud Infrastructure at Liulishuo: Deployment, Management, and Governance Practices
The article describes how Liulishuo's Cloud Infra team automated cloud resource provisioning, scaling, and cost governance end to end, using Terraform, a custom platform called Luban, GitLab CI/CD, and chat‑bot integrations. It covers the architectural design, the implementation steps, and the measurable benefits for both operations and business teams.
Liulishuo, a technology‑driven education company, built an AI‑powered platform and needed a more agile cloud infrastructure to support rapid product iteration. The Cloud Infra team was tasked with accelerating resource supply, reducing overall cost, and improving operational efficiency.
Automation Principles: 1) Automate repetitive tasks to foster an automation culture; 2) Prioritize automation based on business value.
The team approached automation from three dimensions: deployment automation, management automation, and governance automation.
Deployment Automation: Resource Supply Automation
Before automation, resource creation was manual, leading to inconsistent management, error‑prone changes, and duplicated effort. By integrating Terraform, a self‑developed Luban platform, and GitLab, the team achieved fully automated resource provisioning, reducing supply time from hours to minutes and doubling operational efficiency.
Key components:
Luban provides a front‑end request portal that generates Terraform code based on selected parameters.
GitLab stores the IaC configuration as a unified code repository.
Terraform executes creation and modification of Alibaba Cloud resources.
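How a request portal might turn selected parameters into Terraform code can be sketched as follows. This is a minimal illustration, not Luban's actual implementation: the `InstanceRequest` class, its field names, and the `render_instance` function are hypothetical, and only the `alicloud_instance` attribute names are taken from the article's own example.

```python
import json
from dataclasses import dataclass


@dataclass
class InstanceRequest:
    """Hypothetical shape of a portal request; field names are assumptions."""
    name: str
    instance_type: str
    availability_zone: str
    image_id: str
    bandwidth_out: int = 0


def render_instance(req: InstanceRequest) -> str:
    """Render one alicloud_instance resource block from request parameters."""
    attributes = [
        ("availability_zone", req.availability_zone),
        ("instance_type", req.instance_type),
        ("image_id", req.image_id),
        ("instance_name", req.name),
        ("internet_max_bandwidth_out", req.bandwidth_out),
    ]
    # Align the "=" signs the way `terraform fmt` does for adjacent attributes.
    width = max(len(key) for key, _ in attributes)
    lines = ['resource "alicloud_instance" "instance" {']
    for key, value in attributes:
        # json.dumps quotes strings and leaves numbers bare, which matches
        # HCL literal syntax for these simple values.
        lines.append(f"  {key.ljust(width)} = {json.dumps(value)}")
    lines.append("}")
    return "\n".join(lines) + "\n"


request = InstanceRequest(
    name="test_foo",
    instance_type="ecs.n4.large",
    availability_zone="cn-beijing-b",
    image_id="ubuntu_18_04_64_20G_alibase_20190624.vhd",
    bandwidth_out=10,
)
print(render_instance(request))
```

Generating the file from a fixed template keeps every request in the same shape, which is what makes the later review step in GitLab practical.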
The detailed workflow includes:
Submit a resource request via Luban; the system generates Terraform files automatically.
Mobius webhook triggers a terraform plan pipeline, logs the plan, and notifies reviewers.
Tech leaders review the plan and approve it in GitLab.
After approval, Mobius triggers terraform apply, merges the code, and sends a success notification via Luban Bot.
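The gating in the workflow above (plan first, apply only after approval) can be sketched as a small state machine. This is an illustrative assumption about how such a pipeline could be modeled, not a description of Mobius itself: the `Stage` enum and `ChangeRequest` class are hypothetical, and the commands are only recorded rather than executed. The flags shown (`-input=false`, `-out`) are standard Terraform CLI options.

```python
from enum import Enum


class Stage(Enum):
    REQUESTED = "requested"
    PLANNED = "planned"
    APPROVED = "approved"
    APPLIED = "applied"


# Each stage may only move to the single stage that follows it.
NEXT_STAGE = {
    Stage.REQUESTED: Stage.PLANNED,
    Stage.PLANNED: Stage.APPROVED,
    Stage.APPROVED: Stage.APPLIED,
}

# Terraform commands a runner would execute when entering a stage.
# Saving the plan to a file lets apply run exactly what was reviewed.
COMMANDS = {
    Stage.PLANNED: ["terraform", "plan", "-input=false", "-out=tfplan"],
    Stage.APPLIED: ["terraform", "apply", "-input=false", "tfplan"],
}


class ChangeRequest:
    """Hypothetical model of one resource change moving through the pipeline."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.stage = Stage.REQUESTED
        self.executed: list[list[str]] = []  # commands recorded, not run

    def advance(self, to: Stage) -> None:
        """Move to the next stage, rejecting any attempt to skip a step."""
        if NEXT_STAGE.get(self.stage) is not to:
            raise ValueError(f"cannot move from {self.stage.value} to {to.value}")
        if to in COMMANDS:
            self.executed.append(COMMANDS[to])
        self.stage = to


change = ChangeRequest("add ECS instance test_foo")
change.advance(Stage.PLANNED)   # webhook runs the plan
change.advance(Stage.APPROVED)  # tech leader approves in GitLab
change.advance(Stage.APPLIED)   # apply runs against the saved plan
```

Applying the saved plan file rather than re-planning is one common way to ensure that what reviewers approved is what actually runs.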
Example Terraform configuration (generated by Luban):
resource "alicloud_instance" "instance" {
  availability_zone          = "cn-beijing-b"
  security_groups            = alicloud_security_group.group.*.id
  instance_type              = "ecs.n4.large"
  system_disk_category       = "cloud_efficiency"
  system_disk_name           = "test_foo_system_disk_name"
  system_disk_description    = "test_foo_system_disk_description"
  image_id                   = "ubuntu_18_04_64_20G_alibase_20190624.vhd"
  instance_name              = "test_foo"
  vswitch_id                 = alicloud_vswitch.vswitch.id
  internet_max_bandwidth_out = 10

  data_disks {
    name        = "disk2"
    size        = 20
    category    = "cloud_efficiency"
    description = "disk2"
    encrypted   = true
    kms_key_id  = alicloud_kms_key.key.id
  }
}

Management Automation: Elastic Scaling
To handle fluctuating user traffic and promotional events, the team implemented automatic scaling at the container, server, and database layers, achieving over 20% cost savings during peak loads.
Container layer: Horizontal Pod Autoscaler (HPA) driven by custom metrics, business metrics, and scheduled tasks.
Server layer: Elastic scaling groups that adjust ECS instances based on server‑level metrics.
Database layer: Cloud‑native data services (e.g., EMR) that scale their underlying ECS instances according to CPU and memory usage.
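For the container layer, the core calculation that the Kubernetes Horizontal Pod Autoscaler documents is desired = ceil(current replicas × current metric / target metric). The sketch below shows that formula with a clamp to replica bounds. It is a simplification: the real controller also applies a tolerance band and stabilization windows, and the function name and example numbers are illustrative assumptions rather than values from the article.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Scale replicas in proportion to how far the metric is from its target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# Illustrative numbers only: 4 pods averaging 90 requests/s against a
# target of 60 requests/s per pod.
print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
```

The same proportional rule applies whether the metric is CPU, a custom metric, or a business metric; only the source of `current_metric` changes.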
Governance Automation: Cost Management
The team built a Catalog system linking resources to applications, owners, and teams, enabling precise cost allocation using resource tags and labels. Automated weekly cost reports aggregate CPU, memory, and IOPS utilization; these are complemented by real‑time dashboards and detailed monthly analyses, all produced without impacting production stability.
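Tag‑based cost allocation of this kind can be sketched as a simple aggregation. The record layout (`id`, `cost`, `tags`), the `team` tag key, and the sample figures below are all hypothetical and do not reflect the actual Catalog schema or any real billing data.

```python
from collections import defaultdict


def allocate_costs(resources: list[dict], tag_key: str = "team") -> dict[str, float]:
    """Sum resource costs per tag value; untagged resources get their own bucket."""
    totals: dict[str, float] = defaultdict(float)
    for resource in resources:
        owner = resource.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += resource["cost"]
    return dict(totals)


# Hypothetical billing records for one reporting period.
records = [
    {"id": "ecs-1", "cost": 120.0, "tags": {"team": "speech", "app": "asr"}},
    {"id": "ecs-2", "cost": 30.0, "tags": {"team": "speech", "app": "tts"}},
    {"id": "rds-1", "cost": 75.0, "tags": {"team": "platform"}},
    {"id": "oss-1", "cost": 5.0, "tags": {}},
]
print(allocate_costs(records))
```

Keeping an explicit "untagged" bucket makes gaps in tagging visible in the report instead of silently dropping that spend.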
Overall, the automation framework has dramatically improved delivery speed, transparency of cloud spending, and operational maturity for Liulishuo's Cloud Infra team.
Liulishuo Tech Team
Help everyone become a global citizen!