
Cloud-Native Environment Management Platform for Document Library: Architecture, Implementation and Benefits

To overcome unsynchronized development, slow deployments, and fragmented testing after moving its document library to the cloud, the team built a Kubernetes-driven environment-management platform that automatically provisions containerized sub-environments on demand, scaling to thousands of environments per year, cutting costs, improving security, and enabling rapid, parallel testing for hundreds of projects.

Baidu Geek Talk

In the cloud-native era, software iterates faster, raising the bar for development and testing. After the document library's overall architecture moved to the cloud, the associated testing process changed, exposing many problems.

The problems identified were:

- Unsynchronized development-environment status, with inefficient email-based progress sync.
- Low deployment efficiency: complex environment construction, difficult environment switching, missing documentation.
- Difficulty deploying private services: inconsistent environments, hard-to-trace exceptions, complex deployment steps, missing documentation.
- Difficult test-environment deployment: shared environments with poor stability.
- Low functional test coverage: missing test data and missing test cases.
- Missing performance-testing processes and standards.

To address these issues, the document library built an environment management platform using container orchestration technology, splitting online services into multiple sub-environments. Developers and QA request the environment packages they need on demand, and the platform quickly outputs standardized container sets that meet their testing needs.

The platform efficiently supports rapid business iteration, accommodating over 500 subenvironments simultaneously, enabling parallel testing for at least 100 iterative projects. Environment creation numbers grew from 2,073 in 2019 to 8,908 in 2020, 16,801 in 2021, and over 5,000 by 2022. Based on the platform, automated testing build standards were established, solving past quality issues due to lack of testing standards.

The platform's technical architecture embraces cloud-native principles, built on Kubernetes container orchestration and Docker engine, combined with open-source projects like git, flanneld, etcd, and internal services (bcc, icode, bns, agile, uuap). It provides benefits such as version‑controlled image builds, image‑repository based environment delivery, container‑based runtime environments, and declarative cluster management via K8s.
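To make the "version-controlled image builds" idea concrete, the sketch below derives a traceable image reference from the current Git commit. This is an illustrative assumption, not the platform's actual build script: the registry host and service name are invented, and the docker commands are shown commented so the naming logic stands on its own.

```shell
#!/bin/sh
# Hypothetical sketch: tag each image with the Git commit it was built from,
# so every delivered environment ties back to an exact source state.
# REGISTRY and SERVICE are made-up placeholders, not names from the article.
image_ref() {
    registry="$1"; service="$2"; commit="$3"
    printf '%s/%s:%s\n' "$registry" "$service" "$commit"
}

REGISTRY="registry.example.internal"
SERVICE="doc-backend"
# Fall back to "dev" when run outside a Git checkout.
COMMIT="$(git rev-parse --short HEAD 2>/dev/null || echo dev)"

IMAGE="$(image_ref "$REGISTRY" "$SERVICE" "$COMMIT")"
echo "would build and push: $IMAGE"
# docker build -t "$IMAGE" .   # build step (requires a Docker daemon)
# docker push "$IMAGE"         # delivery via the private image registry
```

Tagging by commit rather than `latest` is what makes image-repository-based environment delivery reproducible: any sub-environment can be rebuilt from the exact image it was first created with.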

The core implementation consists of a master node, an independent etcd cluster, 22 internal-network worker nodes, and a private image registry. Automation scripts define deployment rules; the highly available etcd cluster stores the distributed system's data; and the flannel network plugin divides the internal network, assigning each container a cluster-wide unique virtual IP. Flannel creates the virtual interface flannel0 and the docker0 bridge, builds routing tables for packet send/receive, and configures iptables rules for secure traffic scheduling, forming a seconds-level, dynamically deliverable, secure and reliable container orchestration cluster.
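Flannel's subnet allocation is driven by a network config it reads from etcd. The sketch below shows that mechanism under assumed values: the CIDR and backend are guesses (the udp backend is what creates the flannel0 TUN device the article mentions), the etcd key path is flannel's default, and the helper function just mirrors the arithmetic by which a /16 network with SubnetLen 24 yields one /24 per node.

```shell
#!/bin/sh
# Hypothetical sketch. Flannel reads its network config from etcd; with the
# udp backend it creates the flannel0 interface described in the article.
# The 172.18.0.0/16 CIDR is an assumption, not a value from the article.
# Seeding the config would look like (commented, needs a live etcd):
# etcdctl set /coreos.com/network/config \
#   '{"Network": "172.18.0.0/16", "SubnetLen": 24, "Backend": {"Type": "udp"}}'

# With SubnetLen 24, each node leases one /24 out of the /16, and every
# container on that node gets a cluster-wide-unique IP inside that subnet.
node_subnet() {   # $1 = third octet leased to the node
    printf '172.18.%s.0/24\n' "$1"
}

echo "node 5 subnet: $(node_subnet 5)"
```

Because subnets never overlap across nodes, a container's virtual IP is unique in the whole cluster, which is what lets the routing tables and iptables rules forward traffic between sub-environments without NAT conflicts.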

Advantages include:

- Cost reduction and efficiency improvement: a new DevOps mode, minute-level environment building, scaled-out automated construction, and lower O&M manpower.
- Security and controllability: stable, flexible, controllable pipeline builds and network communication.
- Simplicity and ease of use: no direct cloud-resource management, single-page operations, with orchestration handled by the automated cluster.

The platform's lifecycle covers daily code and image updates, environment generation through recovery, and sub-environment topology. Daily updates keep offline environments in sync with online ones via standard shell and Docker commands. The environment lifecycle runs: application → approval → generation (kubectl create -f xxx.yaml) → build (initialize the environment, update code, start services) → proxy forwarding (nginx vhost) → sub-environment linking (port mapping) → recovery (manual release or expiry). Sub-environment granularity matches online service units (wap frontend, pc frontend, paid frontend, basic backend, shop backend, user backend, document backend, paid backend), with public services (user center, storage, download, search, transcoding) decoupled.
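The generation and proxy-forwarding steps above can be sketched as follows. The sub-environment name, domain, image, and port are all hypothetical placeholders, and the kubectl and nginx commands are commented out since they require a live cluster; only the manifest and vhost rendering runs here.

```shell
#!/bin/sh
# Hypothetical sketch of the "generation" and "proxy forwarding" steps.
# SUBENV, the image, the domain, and the port are invented placeholders.
SUBENV="subenv-demo"
PORT=8080

# Generation: render a minimal pod manifest for "kubectl create -f xxx.yaml".
render_manifest() {
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${SUBENV}
  labels:
    subenv: ${SUBENV}
spec:
  containers:
  - name: doc-backend
    image: registry.example.internal/doc-backend:dev
    ports:
    - containerPort: ${PORT}
EOF
}

# Proxy forwarding: render an nginx vhost that routes a per-environment
# hostname to the mapped port, giving each sub-environment its own entry URL.
render_vhost() {
    cat <<EOF
server {
    listen 80;
    server_name ${SUBENV}.docs.example.internal;
    location / { proxy_pass http://127.0.0.1:${PORT}; }
}
EOF
}

render_manifest > "${SUBENV}.yaml"
render_vhost > "${SUBENV}.conf"
# kubectl create -f "${SUBENV}.yaml"   # generation step on the cluster
# nginx -s reload                      # pick up the new vhost
```

Recovery is then the mirror image: deleting the pod and removing the vhost frees the resources when the environment is released or expires.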

Through sub-environment splitting, the platform achieves interference-free testing while avoiding massive resource waste, and it supports collaborative environment sharing, pipeline integration, and middleware hardening. It continues to evolve as the business shifts from PHP to Go, the microservice count grows, and QA demands expand.

Tags: cloud native, CI/CD, platform engineering, Kubernetes, DevOps, environment management, container orchestration