Server System Environment Baseline Management: Declarative Configuration, Multi‑OS Adaptation, Group Management, and Gray‑Release
The document proposes a declarative, multi‑OS baseline management platform that groups servers, supports gray‑release rollouts, monitors state, and automatically restores configurations, extending open‑source tools to provide versioned, conditional, and auditable system‑environment control across a large‑scale infrastructure.
This document presents a comprehensive design for managing server system‑environment baselines within a large‑scale infrastructure. It defines key concepts such as "system environment", "baseline", and "system‑environment management" (also referred to as baseline management).
The need for a systematic baseline approach is explained: as the server count grows, shell scripts become unmanageable; they are also hard to verify, lack cross‑OS compatibility, and cannot support incremental updates, monitoring, or automated recovery.
Five main objectives are set: declarative (state‑based) configuration; multi‑OS adaptation; group‑based management to meet diverse business needs; gray‑release capability for safe rollout; and monitoring with automatic baseline restoration.
After evaluating open‑source configuration‑management tools (SaltStack, Ansible, Puppet, Chef), the authors conclude that none fully satisfies these goals, so they plan to extend an open‑source project with custom development.
Declarative configuration is advocated: users specify the desired final state (e.g., file location, permissions, package version) without worrying about implementation details. Supported configuration items include files and directories, system packages, services, kernel parameters, and kernel modules, each described by its own fields (for a file: source file, target path, whether to auto‑create the parent directory, permissions, etc.).
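To make the "desired final state" idea concrete, here is a minimal Python sketch of a declarative file item. The class FileItem, its field names (content, target, auto_create_dir, mode), and the function apply_file are hypothetical illustrations, not the platform's actual schema; for simplicity the file content is passed inline rather than fetched from a source file. The user only declares the end state; the engine compares it with the current state and acts only on differences, so re‑running it is idempotent.

```python
import os
import stat
from dataclasses import dataclass


# Hypothetical sketch: field names are illustrative, not the platform's schema.
@dataclass(frozen=True)
class FileItem:
    content: str           # desired file content (stands in for the "source file")
    target: str            # target path on the server
    auto_create_dir: bool  # create missing parent directories if True
    mode: int              # desired permission bits, e.g. 0o600


def apply_file(item: FileItem) -> bool:
    """Converge the target file to the declared state; return True if anything changed."""
    changed = False
    parent = os.path.dirname(item.target)
    if parent and not os.path.isdir(parent):
        if not item.auto_create_dir:
            raise FileNotFoundError(f"parent directory missing: {parent}")
        os.makedirs(parent)
        changed = True

    # Only rewrite the file when its content differs from the declared content.
    current = None
    if os.path.exists(item.target):
        with open(item.target, encoding="utf-8") as f:
            current = f.read()
    if current != item.content:
        with open(item.target, "w", encoding="utf-8") as f:
            f.write(item.content)
        changed = True

    # Only touch permissions when they differ from the declared mode.
    if stat.S_IMODE(os.stat(item.target).st_mode) != item.mode:
        os.chmod(item.target, item.mode)
        changed = True
    return changed
```

Applying the same item twice returns True and then False; that "changed" signal is exactly what a post‑execution command needs in order to run only when something was actually modified.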
Post‑execution commands are introduced to handle actions that must run only when a change occurs (e.g., restarting sshd after its config file is modified). Execution conditions allow configurations to be applied selectively based on OS, kernel version, or data‑center specifics.
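The following sketch shows one way execution conditions and post‑execution commands could fit together, assuming a simple host‑facts dictionary with os, kernel, and datacenter keys: an item is skipped when its condition does not match, and its on_change commands run only when applying it actually changed something. Condition, Item, run_item, and the simplified parse_version are hypothetical names; commands are passed to a caller‑supplied run_cmd callback instead of being executed, so the sketch has no side effects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Simplified: turn '5.10.0-23' into (5, 10, 0) for numeric comparison."""
    core = v.split("-", 1)[0]
    return tuple(int(p) for p in core.split(".") if p.isdigit())


# Hypothetical condition model; None means "no restriction on this fact".
@dataclass
class Condition:
    os_names: Optional[List[str]] = None     # e.g. ["debian"]
    min_kernel: Optional[str] = None         # e.g. "4.19"
    datacenters: Optional[List[str]] = None  # e.g. ["dc-a"]

    def matches(self, facts: Dict[str, str]) -> bool:
        if self.os_names is not None and facts["os"] not in self.os_names:
            return False
        if self.min_kernel is not None and parse_version(facts["kernel"]) < parse_version(self.min_kernel):
            return False
        if self.datacenters is not None and facts["datacenter"] not in self.datacenters:
            return False
        return True


@dataclass
class Item:
    name: str
    apply: Callable[[], bool]  # converges the item; returns True if the system changed
    when: Condition = field(default_factory=Condition)
    on_change: List[str] = field(default_factory=list)  # post-execution commands


def run_item(item: Item, facts: Dict[str, str], run_cmd: Callable[[str], None]) -> str:
    if not item.when.matches(facts):
        return "skipped"
    if item.apply():
        # Post-execution commands fire only on an actual change.
        for cmd in item.on_change:
            run_cmd(cmd)
        return "changed"
    return "unchanged"
```

With on_change=["systemctl restart sshd"], sshd is restarted only on the run that actually modified its configuration, and hosts whose OS or kernel does not match are left untouched.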
Configuration sets (collections of configuration items) are defined with ordering, conditional matching, and conflict detection. Conflicts are resolved by business owners selecting which items to keep. Configuration sets support versioning, enabling audit of changes and rollback capabilities.
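Here is a minimal sketch of ordering, conflict detection, and versioning, assuming each configuration item is identified by a (kind, resource) pair such as ("sysctl", "net.core.somaxconn"). Two sets conflict when they declare the same resource with different desired values, and merging refuses to proceed until an owner has chosen a winning set for every conflict. ConfigItem, find_conflicts, merge, and VersionedSet are hypothetical names, and the integer order field is only one assumed way to express ordering.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (item kind, resource identifier)


# Hypothetical item model for illustration only.
@dataclass(frozen=True)
class ConfigItem:
    kind: str
    resource: str
    desired: str
    order: int = 0  # lower values are applied first

    @property
    def key(self) -> Key:
        return (self.kind, self.resource)


def find_conflicts(sets: Dict[str, List[ConfigItem]]) -> Dict[Key, Dict[str, str]]:
    """Return resources that different sets declare with different desired values."""
    seen: Dict[Key, Dict[str, str]] = {}
    for set_name, items in sets.items():
        for it in items:
            seen.setdefault(it.key, {})[set_name] = it.desired
    return {k: v for k, v in seen.items() if len(set(v.values())) > 1}


def merge(sets: Dict[str, List[ConfigItem]], choices: Dict[Key, str]) -> List[ConfigItem]:
    """Merge sets; for each conflicting key keep only the item from the owner's chosen set."""
    conflicts = find_conflicts(sets)
    missing = set(conflicts) - set(choices)
    if missing:
        raise ValueError(f"unresolved conflicts: {sorted(missing)}")
    merged: Dict[Key, ConfigItem] = {}
    for set_name, items in sets.items():
        for it in items:
            if it.key in conflicts and choices[it.key] != set_name:
                continue
            merged[it.key] = it
    return sorted(merged.values(), key=lambda it: (it.order, it.kind, it.resource))


class VersionedSet:
    """Append-only history so every change can be audited and rolled back."""

    def __init__(self) -> None:
        self.versions: List[Tuple[ConfigItem, ...]] = []

    def commit(self, items: List[ConfigItem]) -> int:
        self.versions.append(tuple(items))
        return len(self.versions)  # version numbers start at 1

    def get(self, version: int) -> Tuple[ConfigItem, ...]:
        return self.versions[version - 1]

    def rollback(self, version: int) -> int:
        """Re-commit an old snapshot as a new version; history stays intact."""
        return self.commit(list(self.get(version)))
```

Implementing rollback as a new commit of an old snapshot, rather than deleting later versions, keeps the full change history available for auditing.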
Group management addresses the coarse granularity of department‑level baseline control. A group is a collection of servers belonging to a specific business scenario, and each group can be assigned a particular configuration‑set version. Each server stores a copy of its group's configuration version, and the server's baseline changes only after the group's new version is released, ensuring a controlled rollout.
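The release gate for groups can be sketched as follows. This assumes, hypothetically, that a group holds a pending (assigned but unreleased) version alongside its released version, and that each server keeps its own copy of the version it has converged to. Group, Server, Registry, assign, release, and sync are illustrative names, not the platform's API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Hypothetical data model for illustration only.
@dataclass
class Group:
    name: str
    released_version: Optional[int] = None  # version servers may converge to
    pending_version: Optional[int] = None   # assigned but not yet released


@dataclass
class Server:
    hostname: str
    group: str
    baseline_version: Optional[int] = None  # server's local copy of the group's version


class Registry:
    def __init__(self) -> None:
        self.groups: Dict[str, Group] = {}
        self.servers: Dict[str, Server] = {}

    def assign(self, group: str, version: int) -> None:
        """Point a group at a new config-set version without touching any server yet."""
        self.groups[group].pending_version = version

    def release(self, group: str) -> None:
        g = self.groups[group]
        if g.pending_version is None:
            raise ValueError("nothing to release")
        g.released_version, g.pending_version = g.pending_version, None

    def sync(self, hostname: str) -> bool:
        """Copy the group's released version onto the server; True if its baseline changed."""
        s = self.servers[hostname]
        target = self.groups[s.group].released_version
        if target is None or target == s.baseline_version:
            return False
        s.baseline_version = target
        return True
```

Assigning version 4 to a group leaves its servers on version 3 until release is called; only the next sync after the release moves them.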
The baseline change and maintenance process covers initialization (new server delivery, hardware changes, OS reinstall, business‑unit transfer), gray‑release procedures (sequential rollout across data centers and batches), and monitoring with optional automatic remediation. Periodic inspections compare the current system state with the baseline; if discrepancies are found, the system either reports them or restores the baseline depending on the “keep” setting.
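Two mechanisms from this paragraph can be sketched together: splitting a rollout into batches one data center at a time, and a periodic inspection that either reports drift or restores the baseline depending on keep. The sketch assumes servers are given as a hostname to data‑center mapping and that configuration state can be compared as key/value pairs. gray_batches, rollout, check_drift, and the stop‑at‑first‑unhealthy‑batch policy are hypothetical illustrations, not the platform's actual procedure.

```python
from typing import Callable, Dict, List


def gray_batches(servers: Dict[str, str], dc_order: List[str], batch_size: int) -> List[List[str]]:
    """Split servers (hostname -> data center) into batches, one data center at a time."""
    batches: List[List[str]] = []
    for dc in dc_order:
        hosts = sorted(h for h, d in servers.items() if d == dc)
        for i in range(0, len(hosts), batch_size):
            batches.append(hosts[i:i + batch_size])
    return batches


def rollout(batches: List[List[str]],
            apply_batch: Callable[[List[str]], None],
            healthy: Callable[[List[str]], bool]) -> int:
    """Apply batches in order; assumed policy: stop at the first unhealthy batch.
    Returns the number of batches that completed successfully."""
    done = 0
    for batch in batches:
        apply_batch(batch)
        if not healthy(batch):
            break
        done += 1
    return done


def check_drift(baseline: Dict[str, str], actual: Dict[str, str], keep: bool) -> Dict[str, str]:
    """Compare actual state with the baseline and return drifted keys with their actual values.
    If keep is True, also restore the drifted keys (stand-in for re-applying items)."""
    drift = {k: actual.get(k, "<absent>") for k, v in baseline.items() if actual.get(k) != v}
    if keep:
        for k in drift:
            actual[k] = baseline[k]
    return drift
```

With keep=False the inspection only reports the differences; with keep=True the same differences are reported and then corrected, which matches the report‑or‑restore behavior described above.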
The summary states that the platform now supports declarative configuration, multi‑OS adaptation, group management, gray‑release, and monitoring/keep, and that baseline management covers the company's entire server fleet. Future work will focus on further optimization, expanding configuration capabilities, and extending monitoring/keep to all servers.
References include comparative studies of Ansible, Terraform, Puppet, Chef, SaltStack, Jinja templating, sysctl documentation, systemd, loadable kernel modules, and package managers.
Bilibili Tech