
How Youzan Scaled Development with Containerization: Challenges and Solutions

This article examines Youzan's journey to containerize its development and testing environments using Kubernetes and Docker, detailing the motivations, architectural decisions, network and isolation challenges, image integration, logging, load balancing, debugging, and the ongoing rollout to standard production environments.

Youzan Coder

Introduction

Containerization was adopted to accelerate delivery of development and testing environments and to address resource contention among parallel projects.

Motivation

Each project needed its own isolated daily (development) and QA environments that could be created when the project started and destroyed when it ended, which in turn required environments to be provisioned quickly.

Solution Overview

The platform runs on Kubernetes 1.7.10 with Docker 1.12.6/1.13.1. The following sections describe the main technical challenges and the applied solutions.

Network

Backend services are Java applications built on a customized Dubbo framework. Because not every service could be containerized at once, containers had to interoperate over the network with the existing, non-containerized clusters. Overlay networking on the public cloud proved unreliable, so the team chose a macvlan-based network that attaches containers directly to the hosts' physical L2 network, avoiding the encapsulation overhead of an overlay. Later, multi-cloud support added overlay and VPC networking to regain elasticity.
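For readers unfamiliar with macvlan, the sketch below shows what a pod network definition for the standard CNI `macvlan` plugin can look like, built as a Python dict and serialized to JSON. The network name, parent interface (`eth0`), subnet, and gateway are hypothetical placeholders; the article does not describe Youzan's actual configuration.

```python
import json


def macvlan_cni_config(master: str, subnet: str, gateway: str) -> dict:
    """Build a CNI network config for the reference macvlan plugin.

    Each pod gets its own MAC address on the host's physical interface
    (`master`), so pods sit directly on the existing L2 network and can
    reach non-containerized Dubbo services without NAT or tunneling.
    """
    return {
        "cniVersion": "0.3.1",
        "name": "pod-macvlan",      # hypothetical network name
        "type": "macvlan",
        "master": master,           # physical NIC shared with the host network
        "mode": "bridge",           # pods on the same host can reach each other
        "ipam": {
            # host-local allocates per node; a multi-node setup would need
            # non-overlapping ranges or a central IPAM (assumption).
            "type": "host-local",
            "subnet": subnet,
            "gateway": gateway,
        },
    }


config = macvlan_cni_config("eth0", "10.20.0.0/16", "10.20.0.1")
print(json.dumps(config, indent=2))
```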

Isolation

Containers use kernel namespaces and cgroups, but /proc still reports the host’s physical CPU count and memory size, causing inaccurate resource visibility inside containers.

Memory Issue

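Java applications derive JVM parameters such as heap size from the memory they detect, so a container that sees the host's full memory can start with a heap far larger than its own memory limit. The team mitigated this by mounting lxcfs, which presents a container-specific view of /proc/meminfo.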
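The failure mode is easy to see in a hypothetical startup rule that sizes the heap from `MemTotal`. The 70% ratio and the sample values are illustrative assumptions, not Youzan's settings; with lxcfs mounted over /proc/meminfo, the same code reads the container's limit instead of the host's memory.

```python
def mem_total_kib(meminfo: str) -> int:
    """Return MemTotal (in KiB) from the text of /proc/meminfo."""
    for line in meminfo.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:       16384000 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def heap_mib(meminfo: str, ratio: float = 0.7) -> int:
    """Hypothetical startup rule: give the JVM heap `ratio` of visible memory."""
    return int(mem_total_kib(meminfo) * ratio) // 1024


host_view = "MemTotal:       263741864 kB\nMemFree:            1000 kB\n"
lxcfs_view = "MemTotal:         4194304 kB\nMemFree:            1000 kB\n"

print(f"-Xmx{heap_mib(host_view)}m")   # sized from the host's ~251 GiB
print(f"-Xmx{heap_mib(lxcfs_view)}m")  # sized from a 4 GiB container limit
```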

CPU Count Issue

Kubernetes enforces CPU limits through CFS shares and quotas rather than pinning containers to specific cores, and the platform over-commits CPU, so even with lxcfs the CPU count visible inside a container was still wrong. The JVM and many Java SDKs size their thread pools from the reported CPU count, which led to excessive threads and memory usage. The fix introduced an environment variable, NUM_CPUS; for Java, a library loaded via LD_PRELOAD overrides the JVM's ActiveProcessorCount so that it returns NUM_CPUS.
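The effect of the CPU fix can be sketched as a small decision function. The precedence order and the CFS-quota fallback are assumptions for illustration; the article only states that `NUM_CPUS` is injected and, for Java, returned by the preloaded library. The `2 * cpus` pool rule mirrors a common default (Netty, for example, starts twice as many event-loop threads as available processors).

```python
import math


def effective_cpus(env: dict, cfs_quota_us: int, cfs_period_us: int, host_cpus: int) -> int:
    """Decide how many CPUs a process should assume it has.

    Precedence (an assumption for this sketch):
    1. an explicit NUM_CPUS environment variable, as in the article;
    2. the CFS quota divided by the period, rounded up (-1 means "no limit");
    3. the host CPU count, which is what /proc reports by default.
    """
    if env.get("NUM_CPUS"):
        return max(1, int(env["NUM_CPUS"]))
    if cfs_quota_us > 0 and cfs_period_us > 0:
        return max(1, math.ceil(cfs_quota_us / cfs_period_us))
    return host_cpus


def default_pool_size(cpus: int) -> int:
    """Typical SDK rule: worker threads = 2 x CPU count."""
    return 2 * cpus


# Without NUM_CPUS on a 64-core host: 128 worker threads per pool.
print(default_pool_size(effective_cpus({}, -1, 100000, 64)))
# With NUM_CPUS=2 injected into the container: 4 worker threads.
print(default_pool_size(effective_cpus({"NUM_CPUS": "2"}, -1, 100000, 64)))
```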

Application Integration

All services were already integrated with an internal release system, so container adoption required minimal changes. No Dockerfiles were needed from business teams.

Node.js, Python, and PHP‑SOA applications managed by supervisord only need an app.yaml in the Git repository to declare the runtime and start command.

Standardized Java applications run unchanged.

Non‑standard Java applications must be refactored to follow the standard launch model.
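For the supervisord-managed runtimes, the platform has to turn app.yaml into a process definition. The sketch below renders a supervisord `[program:x]` section from an already-parsed app.yaml; the keys `name`, `runtime`, and `start` are hypothetical, since the article only says that app.yaml declares the runtime and the start command.

```python
def supervisord_program(app: dict) -> str:
    """Render a supervisord [program:x] section from a parsed app.yaml.

    `runtime` is validated but not used here; it would select the
    runtime-layer base image (not shown).
    """
    missing = [k for k in ("name", "runtime", "start") if k not in app]
    if missing:
        raise ValueError(f"app.yaml missing keys: {missing}")
    lines = [
        f"[program:{app['name']}]",
        f"command={app['start']}",
        "autostart=true",
        "autorestart=true",
        # Send output to the container's stdout so the log collector sees it;
        # /dev/stdout is not seekable, so log rotation must be disabled.
        "stdout_logfile=/dev/stdout",
        "stdout_logfile_maxbytes=0",
        "redirect_stderr=true",
    ]
    return "\n".join(lines) + "\n"


app_yaml = {"name": "shop-web", "runtime": "nodejs", "start": "node server.js"}
print(supervisord_program(app_yaml))
```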

Image Integration

Images are built in three layers: stack (OS), runtime (language environment), and application (business code plus auxiliary agents). Initially each service in a pod was built into its own image and ran as a separate container, but because the startup order of containers within a pod could not be guaranteed, all services of a pod were eventually packed into a single container.

Image builds are orchestrated by Kubernetes itself: a packaging pod compiles the code, installs dependencies, generates a Dockerfile, and uses Docker-in-Docker to build and push the image. PersistentVolumeClaims cache Python virtualenvs, Node.js node_modules, and Maven repositories across builds to speed them up. The build step uses a newer Docker CE release (17.09 or later) for ADD --chown, which sets file ownership while copying; a separate RUN chown would copy the files again into an extra layer.
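The Dockerfile that the packaging pod generates for the application layer might look like the output of the function below, which the pod would then pass to `docker build` and `docker push`. The registry path, user name, and start command are hypothetical; only the layering and the use of `ADD --chown` come from the article.

```python
import json


def render_dockerfile(runtime_image: str, app_dir: str, user: str, start_cmd: list[str]) -> str:
    """Render the application-layer Dockerfile generated by the packaging pod."""
    return "\n".join([
        # Runtime layer, itself built FROM the stack (OS) layer.
        f"FROM {runtime_image}",
        # --chown sets ownership at copy time (Docker 17.09+); a separate
        # `RUN chown -R` would duplicate every file into an extra layer.
        f"ADD --chown={user}:{user} {app_dir} /app",
        "WORKDIR /app",
        f"USER {user}",
        # Exec-form CMD is a JSON array, so json.dumps produces valid syntax.
        f"CMD {json.dumps(start_cmd)}",
    ]) + "\n"


dockerfile = render_dockerfile(
    "registry.example.com/runtime/nodejs:8",  # hypothetical runtime image
    "dist", "app", ["node", "server.js"],
)
print(dockerfile)
```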

Load Balancing (Ingress)

The organization already runs a self-developed service mesh and a unified access system. Instead of deploying a full Ingress controller, a sync program watches the Kubernetes API for Service changes and updates the corresponding upstream lists in the unified access system, which then routes external HTTP traffic to the pods.
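The core of such a sync program reduces to turning watch events into upstream lists. This sketch assumes the program reads the Endpoints object behind each Service (where the pod IPs live) and receives events shaped like the Kubernetes watch API (a `type` of ADDED, MODIFIED, or DELETED plus the `object`). Pushing the result to the unified access system is omitted because its interface is internal.

```python
def upstreams_from_endpoints(endpoints: dict) -> list[str]:
    """Flatten a Kubernetes Endpoints object into sorted "ip:port" strings."""
    result = []
    for subset in endpoints.get("subsets") or []:
        ports = [p["port"] for p in subset.get("ports") or []]
        for addr in subset.get("addresses") or []:  # ready addresses only
            for port in ports:
                result.append(f"{addr['ip']}:{port}")
    return sorted(result)


def apply_watch_event(table: dict, event: dict) -> bool:
    """Update `table` (namespace/name -> upstream list) from one watch event.

    Returns True when the table changed, i.e. when the unified access
    system would need an update (that call is not shown).
    """
    if event["type"] not in ("ADDED", "MODIFIED", "DELETED"):
        return False  # e.g. ERROR events carry a Status, not an Endpoints object
    obj = event["object"]
    key = f"{obj['metadata']['namespace']}/{obj['metadata']['name']}"
    if event["type"] == "DELETED":
        return table.pop(key, None) is not None
    new = upstreams_from_endpoints(obj)
    if table.get(key) == new:
        return False  # nothing to push
    table[key] = new
    return True


table: dict = {}
added = {
    "type": "ADDED",
    "object": {
        "metadata": {"namespace": "shop", "name": "web"},
        "subsets": [{
            "addresses": [{"ip": "10.20.1.6"}, {"ip": "10.20.1.5"}],
            "ports": [{"port": 8080}],
        }],
    },
}
changed = apply_watch_event(table, added)
```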

Container Login and Debugging

Because logging in through the console was cumbersome, SSH access was enabled in project and continuous-delivery environments, where debugging is frequent. A special debug-release mode disables health checks so that a failing pod is not killed and restarted, which lets developers log in and inspect it.
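One plausible way to implement the debug-release mode is to strip the probes from the pod template before it is applied. This is an assumption; the article does not say how the mode is implemented. Note that without a liveness probe the kubelet still restarts a container whose main process exits (per `restartPolicy`); it only stops killing containers whose health checks fail.

```python
import copy

PROBE_KEYS = ("livenessProbe", "readinessProbe")


def to_debug_release(pod_template: dict) -> dict:
    """Return a copy of a pod template with all health checks removed."""
    debug = copy.deepcopy(pod_template)  # never mutate the normal release spec
    for container in debug.get("spec", {}).get("containers", []):
        for key in PROBE_KEYS:
            container.pop(key, None)
    return debug


template = {
    "spec": {
        "containers": [{
            "name": "app",
            "image": "registry.example.com/shop/web:1.0",  # hypothetical
            "livenessProbe": {"httpGet": {"path": "/health", "port": 8080}},
            "readinessProbe": {"httpGet": {"path": "/health", "port": 8080}},
        }]
    }
}
debug_template = to_debug_release(template)
```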

Logging

Logs are collected primarily by an internal logging system called “Tianwang”; container stdout is collected only as a supplement. Fluentd gathers this output, formats it according to Tianwang's schema, and forwards it to Kafka, from which it is indexed into Elasticsearch.
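In the pipeline, Fluentd performs the reshaping; the Python function below only mimics that step to make the data flow concrete. The input is a line in Docker's json-file log format; the output field names are placeholders, because Tianwang's schema is internal and not described in the article.

```python
import json


def to_tianwang_record(docker_line: str, app: str, pod: str) -> dict:
    """Convert one Docker json-file log line into a hypothetical Tianwang record.

    Docker's json-file driver writes one JSON object per line:
    {"log": "...\n", "stream": "stdout", "time": "<RFC 3339 timestamp>"}.
    """
    entry = json.loads(docker_line)
    return {
        "app": app,
        "pod": pod,
        "stream": entry["stream"],          # "stdout" or "stderr"
        "timestamp": entry["time"],         # kept as the original string
        "message": entry["log"].rstrip("\n"),
        "source": "container-stdout",       # marks this as supplemental output
    }


line = '{"log":"order created id=42\\n","stream":"stdout","time":"2018-05-20T08:00:00.123456789Z"}'
record = to_tianwang_record(line, app="trade-api", pod="trade-api-7d9f-abc12")
payload = json.dumps(record).encode("utf-8")  # bytes that would be produced to Kafka
```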

Canary Release

Canary traffic includes user‑side HTTP requests, inter‑service HTTP calls, and Dubbo calls. Labels (e.g., user, shop) are attached at the unified entry point and propagated through HTTP and Dubbo clients. A dedicated canary deployment is created, and the canary configuration center applies routing rules so downstream services respect the canary logic.
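The tagging-and-propagation scheme can be sketched in a few functions. The header name `X-Canary-Tag`, the rule shape (a shop allowlist plus a user percentage), and the CRC32 bucketing are hypothetical choices for illustration; Youzan's actual label format and configuration-center rules are not described in the article.

```python
import zlib
from dataclasses import dataclass

CANARY_HEADER = "X-Canary-Tag"  # hypothetical; for Dubbo it would travel as an RPC attachment


@dataclass(frozen=True)
class CanaryRule:
    shop_ids: frozenset  # shops always routed to the canary
    user_percent: int    # 0-100: share of users routed to the canary


def tag_request(rule: CanaryRule, user_id: str, shop_id: str) -> str:
    """At the unified entry point: decide once whether a request is canary."""
    if shop_id in rule.shop_ids:
        return "canary"
    # Stable hash, so the same user always lands on the same side.
    bucket = zlib.crc32(user_id.encode("utf-8")) % 100
    return "canary" if bucket < rule.user_percent else "stable"


def outgoing_headers(incoming: dict) -> dict:
    """In HTTP/Dubbo clients: copy the tag onto every downstream call."""
    return {CANARY_HEADER: incoming.get(CANARY_HEADER, "stable")}


def pick_instances(headers: dict, instances: dict) -> list:
    """Downstream routing: canary-tagged calls prefer canary instances."""
    tag = headers.get(CANARY_HEADER, "stable")
    # Fall back to stable when the service has no canary deployment.
    return instances.get(tag) or instances["stable"]


rule = CanaryRule(shop_ids=frozenset({"shop-1001"}), user_percent=5)
tag = tag_request(rule, user_id="u-42", shop_id="shop-1001")
hdrs = outgoing_headers({CANARY_HEADER: tag})
targets = pick_instances(hdrs, {"stable": ["10.20.1.5:8080"], "canary": ["10.20.3.9:8080"]})
```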

Standard Environment Containerization

Rationale

Daily, QA, pre‑release, and production environments often run on under‑utilized servers, wasting resources.

Running these environments on single VMs makes simultaneous releases risky.

VM provisioning is slower, and using VMs for canary releases adds complexity.

Long‑lived VMs create challenges for OS and software version convergence.

Progress

After containerizing project and continuous‑delivery environments, most applications are ready for production containerization. The operational stack (monitoring, release, logging, etc.) is being adapted. Production rollout has started with several front‑end Node.js services, and migration of additional services is ongoing.

Conclusion

The containerization effort improved environment delivery speed, resource utilization, and cost efficiency, while exposing challenges in networking, isolation, image management, and debugging. Production rollout is in early stages, and further experience will be shared.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
