
Mastering IaaS Operations: The ARE Framework for Immutable Environments

This article distills a 2016 Global Operations Conference talk that introduces the Application Running Environment (ARE) concept, outlines IaaS's fundamental and immutable characteristics, examines multi‑team challenges, and proposes a stack‑based, democratic, aspect‑oriented management approach with concrete schemas, patterns, and operational models.

Efficient Ops

IaaS Operations: Two Core Characteristics

IaaS is fundamental: it sits at the bottom of the stack and provides the environment that user applications run on, such as SDN/NFV networks and virtual machines on physical hosts. It is also multi‑tenant, so a single fault can cascade across many customers. Overall stability is the combined result of many interacting subsystems, so risk compounds with every subsystem added.
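A back‑of‑the‑envelope calculation shows why risk compounds. This sketch assumes the subsystems fail independently and are all required at once (in series), a simplification the talk does not state; the figure of 50 subsystems at 99.9% each is hypothetical.

```python
import math

def composite_availability(availabilities):
    """Availability of a service that needs every listed subsystem up at once.

    Assumes independent failures and a series dependency, so the overall
    availability is the product of the individual availabilities.
    """
    return math.prod(availabilities)

# Hypothetical figures: 50 subsystems, each up 99.9% of the time.
overall = composite_availability([0.999] * 50)
print(f"{overall:.4f}")  # 0.9512
```

Even with every component at "three nines", the stack as a whole drops to roughly 95%.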

The second characteristic is immutability. Experience shows that changes are the main cause of production incidents, so IaaS stability should rest on a deterministic, stable, and immutable runtime environment.

Introducing ARE (Application Running Environment)

Building on these two characteristics, we proposed the ARE concept as a way to understand and handle concrete problems in IaaS.

Each product in our portfolio, including cloud hosts, cloud databases (UDB), video cloud (ULive), cross‑region dedicated lines (UDPN), and network zones, maps to an internal set of ARE objects: software versions, product types, deployment environments, and custom distributed requirements.

Once a product is launched, its ARE should remain unchanged. The only sanctioned changes are initial deployment, code upgrades, rollbacks, and vulnerability patches.

From a software configuration management (SCM) perspective, ARE can be described as a configurable, tree‑structured, inheritable object, although rapid business iteration often makes this ideal hard to maintain in practice.
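To illustrate what "tree‑structured and inheritable" can mean, here is a minimal sketch in which each node holds only its own settings and a child overrides its ancestors. The class `AreNode`, its fields, and all values are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AreNode:
    """Hypothetical node in a tree-structured, inheritable ARE configuration."""
    name: str
    settings: dict = field(default_factory=dict)
    parent: Optional["AreNode"] = None

    def resolve(self) -> dict:
        """Merge settings from the root down, so children override parents."""
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        merged = {}
        for n in reversed(chain):  # root first, so the leaf's values win
            merged.update(n.settings)
        return merged

# Hypothetical tree: a base layer, a product layer, and one region override.
base = AreNode("base", {"os": "distro-a", "kernel": "3.10", "glibc": "2.17"})
udb = AreNode("udb", {"framework": "mysql-5.6"}, parent=base)
udb_region = AreNode("udb-region-a", {"kernel": "4.4"}, parent=udb)

print(udb_region.resolve())
# {'os': 'distro-a', 'kernel': '4.4', 'glibc': '2.17', 'framework': 'mysql-5.6'}
```

Storing only deltas per node keeps each layer small; the cost is that the effective ARE of a host is known only after resolving its whole chain.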

Understanding ARE as a Stack

We view ARE as a layered stack of objects, both static and dynamic, forming a composite collection.

Static set:

{os, os_ver, kernel, glibc, Python, PM2, JVM, framework}

Dynamic set:

{firmware, drivers, kernel modules, crontab, daemons, route tables, SDN flow tables, tunnels, *.so, application code}

Distributed environment:

{network topology, service modules, bandwidth characteristics, traffic patterns}
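The three sets above can be captured as one snapshot per host, and a content hash over the snapshot gives a cheap way to notice when an ARE has drifted from its baseline. This is a minimal sketch assuming each layer is a flat dictionary; `AreSnapshot` and `fingerprint` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AreSnapshot:
    """Hypothetical snapshot of one host's ARE stack.

    frozen=True prevents reassigning a layer; the dicts themselves are not
    deep-frozen, so treat them as read-only by convention.
    """
    static: dict = field(default_factory=dict)       # os, kernel, glibc, JVM, ...
    dynamic: dict = field(default_factory=dict)      # drivers, crontab, flow tables, code, ...
    distributed: dict = field(default_factory=dict)  # topology, bandwidth, traffic, ...

    def fingerprint(self) -> str:
        """SHA-256 over all three layers with sorted keys, so key order is ignored."""
        payload = json.dumps(
            {"static": self.static, "dynamic": self.dynamic,
             "distributed": self.distributed},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

baseline = AreSnapshot(static={"kernel": "3.10"}, dynamic={"app": "v1.2.0"})
observed = AreSnapshot(static={"kernel": "3.10"}, dynamic={"app": "v1.2.1"})
print(baseline.fingerprint() == observed.fingerprint())  # False: the ARE has drifted
```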

Challenges of ARE

Different teams focus on different aspects of the ARE: cloud‑host developers care about KVM/QEMU, while SDN ops care about flow tables. This multi‑team view leads to fragmented responsibility and makes it hard to define a single authoritative ARE object, complicating traditional configuration management.

For example, an ARE instance might be defined by OS release, kernel version, OpenFlow version, framework, and application library. Because each dimension varies independently, the combinations multiply into thousands of possible configurations.
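The size of that space is simply the product of the number of options per dimension. The option lists below are hypothetical placeholders; the talk does not give real counts.

```python
from itertools import product
from math import prod

# Hypothetical options per ARE dimension (the real counts are not given).
dimensions = {
    "os_release": ["r1", "r2", "r3", "r4"],
    "kernel": ["k1", "k2", "k3", "k4", "k5"],
    "openflow": ["1.0", "1.3", "1.4"],
    "framework": ["f1", "f2", "f3", "f4"],
    "app_lib": ["l1", "l2", "l3", "l4", "l5"],
}

# Closed form: multiply the per-dimension option counts.
size = prod(len(options) for options in dimensions.values())

# Enumerating every combination confirms the count (fine at this scale only).
enumerated = sum(1 for _ in product(*dimensions.values()))
print(size, enumerated)  # 1200 1200
```

Five modest dimensions already exceed a thousand combinations, which is why the article argues against trying to pre‑define one global configuration.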

Product Team Requirements

Rapid iteration: The environment must support fast code releases, which makes ARE changes routine.

Flexible management: Features such as gray releases require ARE components to be easy to modify along many dimensions.

Operations Team Requirements

Unified configuration: one configuration across the entire network, preferably tree‑structured and easy to integrate.

Immutability: once deployed, configurations should not change.

Business stability: ops are responsible for maintaining service reliability.

Managing ARE

Divide & Conquer: Assign each business unit responsibility for the stack segment it owns.

Democracy: Resolve conflicts (e.g., kernel parameters vs. SDN settings) through negotiation.

Aspect‑Oriented: Treat the configuration as one holistic structure that each team views and manages through its own aspect, or slice.
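One way to support the "democracy" step is to merge each team's fragment and surface, rather than silently resolve, any key that two teams set differently. This is a minimal sketch under assumptions: fragments are flat dictionaries of hashable values, and `merge_fragments` plus the team names are hypothetical.

```python
from collections import defaultdict

def merge_fragments(fragments):
    """Merge per-team ARE fragments and collect keys that teams set differently.

    fragments: team name -> {key: value}; values must be hashable.
    Returns (merged, conflicts), where conflicts maps key -> {team: value}.
    Conflicting keys are left out of `merged` so they must be negotiated.
    """
    proposals = defaultdict(dict)
    for team, settings in fragments.items():
        for key, value in settings.items():
            proposals[key][team] = value

    merged, conflicts = {}, {}
    for key, by_team in proposals.items():
        if len(set(by_team.values())) == 1:  # all teams agree (or only one set it)
            merged[key] = next(iter(by_team.values()))
        else:
            conflicts[key] = by_team
    return merged, conflicts

merged, conflicts = merge_fragments({
    "cloud-host": {"kernel": "4.4", "sysctl.net.ipv4.ip_forward": "0"},
    "sdn": {"sysctl.net.ipv4.ip_forward": "1", "ovs": "2.5"},
})
print(conflicts)  # {'sysctl.net.ipv4.ip_forward': {'cloud-host': '0', 'sdn': '1'}}
```

Keeping conflicts out of the merged result means nothing is applied until the owning teams agree, which matches the negotiation principle above.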

Benefits

No need to pre‑define a single global configuration; focus shifts to local consistency.

Each department can self‑manage its ARE subset, defining, submitting, checking, and applying changes.
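The check and apply steps of that self‑service loop can be sketched as an ownership test followed by a non‑destructive update. The `OWNERSHIP` map and function names are hypothetical; the talk does not describe how ownership is recorded.

```python
# Hypothetical ownership map: which team may change which ARE keys.
OWNERSHIP = {
    "cloud-host": {"kernel", "qemu", "libvirt"},
    "sdn": {"openflow", "flow_tables", "tunnels"},
}

def check_submission(team, changes):
    """Check step: reject a submission that touches keys the team does not own."""
    owned = OWNERSHIP.get(team, set())
    foreign = sorted(set(changes) - owned)
    if foreign:
        raise PermissionError(f"{team} does not own: {', '.join(foreign)}")

def apply_submission(current, team, changes):
    """Apply step: return a new ARE dict and leave the input untouched."""
    check_submission(team, changes)
    return {**current, **changes}

are = {"kernel": "3.10", "openflow": "1.3"}
are = apply_submission(are, "sdn", {"openflow": "1.4"})
print(are)  # {'kernel': '3.10', 'openflow': '1.4'}
```

Returning a new dictionary instead of mutating in place keeps the previous ARE available for rollback.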

Implementing ARE Management

We introduced two concepts:

ARE Schema: The complete set of objects defined by ops or system teams.

ARE Pattern: A subset selected by a business unit to form a concrete configuration object.
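A pattern can be validated against the schema before use: every key must exist in the schema and every value must be one the schema allows. The schema contents and `validate_pattern` below are hypothetical illustrations.

```python
# Hypothetical ARE schema: every key ops/system teams define, with allowed values.
SCHEMA = {
    "os_release": {"r1", "r2"},
    "kernel": {"3.10", "4.4"},
    "openssl": {"1.0.1e", "1.0.2k"},
    "product": {"udb", "ulive", "udpn"},
}

def validate_pattern(pattern):
    """A pattern binds a subset of schema keys to values the schema allows."""
    for key, value in pattern.items():
        if key not in SCHEMA:
            raise KeyError(f"unknown ARE key: {key}")
        if value not in SCHEMA[key]:
            raise ValueError(f"{key}={value!r} not allowed by schema")
    return pattern

udb_pattern = validate_pattern({"product": "udb", "kernel": "4.4"})
```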

Example: A vulnerability discovered on March 1 required an urgent fix across thousands of machines. Using an ARE pattern, we queried the affected set, generated a target list, and applied the change through a unified job system.
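The query step in that example amounts to selecting every host whose ARE matches the pattern. In this sketch the inventory, host names, and library versions are hypothetical, and the unified job system is out of scope; the function only produces the target list it would consume.

```python
def matches(host_are, pattern):
    """True if the host's ARE agrees with every key/value in the pattern."""
    return all(host_are.get(k) == v for k, v in pattern.items())

def target_list(inventory, pattern):
    """Hosts whose ARE matches the pattern, sorted for a stable job order."""
    return sorted(host for host, are in inventory.items() if matches(are, pattern))

# Hypothetical inventory: host -> resolved ARE.
inventory = {
    "host-01": {"product": "udb", "openssl": "1.0.1e"},
    "host-02": {"product": "udb", "openssl": "1.0.2k"},
    "host-03": {"product": "ulive", "openssl": "1.0.1e"},
}
vulnerable = {"openssl": "1.0.1e"}
print(target_list(inventory, vulnerable))  # ['host-01', 'host-03']
```

Narrowing the pattern (for instance adding `"product": "udb"`) shrinks the target list, which is how a fix could be rolled out one product at a time.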

ARE Orchestration

ARE itself becomes the orchestration target, guiding deployment, code rollout, rollback, and vulnerability remediation.
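Treating ARE as the orchestration target suggests planning a change as the difference between the current and target ARE, and refusing any change that is not one of the four sanctioned operations. This is a minimal sketch; `ChangeReason` and `plan` are hypothetical names.

```python
from enum import Enum

class ChangeReason(Enum):
    """The only sanctioned reasons to change a launched product's ARE."""
    DEPLOY = "initial deployment"
    UPGRADE = "code upgrade"
    ROLLBACK = "rollback"
    PATCH = "vulnerability patch"

def plan(current, target, reason):
    """Return sorted (key, old, new) steps needed to reach `target`.

    A new value of None means the key is removed. Raises TypeError unless
    `reason` is a ChangeReason, mirroring the immutability rule.
    """
    if not isinstance(reason, ChangeReason):
        raise TypeError("ARE changes require a sanctioned ChangeReason")
    keys = sorted(set(current) | set(target))
    return [(k, current.get(k), target.get(k))
            for k in keys if current.get(k) != target.get(k)]

steps = plan({"app": "v1.2.1", "kernel": "3.10"},
             {"app": "v1.2.0", "kernel": "3.10"},
             ChangeReason.ROLLBACK)
print(steps)  # [('app', 'v1.2.1', 'v1.2.0')]
```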

Cloud Operations Model and Standards

We share an internal cloud‑ops model built around three goals: six‑sigma availability, minimal human error, and standardized processes.

AIOW Model

A – Action: The operation to be performed.

I – Input: Preconditions and inputs.

O – Output: Expected results and verification steps.

W – Who/Where/What/How: Detailed execution context.
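An AIOW write‑up can be kept as a structured record so that an operation with a blank field is caught before it runs. The class and field names below are illustrative assumptions, not part of the original model.

```python
from dataclasses import dataclass, fields

@dataclass
class AiowRecord:
    """One operation written up in AIOW form (field names are illustrative)."""
    action: str   # A: the operation to perform
    inputs: str   # I: preconditions and inputs
    outputs: str  # O: expected results and how to verify them
    context: str  # W: who, where, what, and how

    def missing(self):
        """Names of AIOW fields left blank; an empty list means ready to run."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

rec = AiowRecord(
    action="Upgrade app on udb hosts to v1.2.1",
    inputs="Backup taken; change ticket approved",
    outputs="Health check passes; version reads v1.2.1",
    context="",
)
print(rec.missing())  # ['context']
```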

uAppo Operations Guidelines

P – Priority: Fault handling follows notify → recover → preserve evidence → root‑cause analysis; for routine operations, follow the established process first, then consult a mentor, then a supervisor.

B – Backup: Always back up before any operation.

C – Control: The scope, content, and outcome of every operation must be controllable.

T – Time: Every task has a time attribute; meet the SLA or provide an estimate.

O – Object: Name every object (tools, commands, files, databases) explicitly, avoid vague references, and follow the unified naming standard (uAppo‑x).
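Several of these guidelines lend themselves to a mechanical pre‑flight check. The sketch below covers B, C, T, and O only; priority (P) is a matter of ordering rather than a field to check, and because the uAppo‑x naming format is not specified, the check only requires a non‑empty name. `OpsTask` and `preflight` are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

@dataclass
class OpsTask:
    """Hypothetical fields for checking a task against the uAppo guidelines."""
    name: str                         # O: an explicit object name
    targets: list                     # C: the exact scope of the operation
    backup_ref: Optional[str]         # B: where the pre-change backup lives
    time_budget: Optional[timedelta]  # T: SLA or estimated duration

def preflight(task):
    """Return guideline violations; an empty list means the task may start."""
    problems = []
    if not task.backup_ref:
        problems.append("B: no backup recorded")
    if not task.targets:
        problems.append("C: scope is empty")
    if task.time_budget is None:
        problems.append("T: no SLA or estimate")
    if not task.name.strip():
        problems.append("O: task object is unnamed")
    return problems

task = OpsTask(name="patch-openssl-udb", targets=["host-01"],
               backup_ref=None, time_budget=timedelta(minutes=30))
print(preflight(task))  # ['B: no backup recorded']
```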

Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany readers throughout their operations careers as we grow together.
