
Vipshop PaaS on Kubernetes: Architecture, CI/CD Pipeline, Networking, Logging, and Lessons Learned

The article details Vipshop's two‑year PaaS implementation on Kubernetes, covering the CI/CD pipeline, custom networking, Docker registry enhancements, logging and monitoring solutions, as well as the operational challenges and fixes encountered during the migration.

DevOps

Vipshop's PaaS team, led by senior engineer Wang Chengchang, shares the practical experience of running a PaaS platform on Kubernetes for two years, focusing on continuous integration, continuous deployment, networking, logging, and monitoring.

The platform defines a multi-phase Jenkins Pipeline that runs in Kubernetes pods through the Jenkins Kubernetes plugin. The phases are source checkout, build, unit tests, image baking, deployment, and integration testing. Artifacts are stored in the internal Cider package system and in a Docker registry.
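The phase ordering described above can be sketched as a simple sequential runner. This is only an illustration in Python, not Vipshop's Jenkins Pipeline definition: the phase names come from the article, while `run_pipeline` and the placeholder steps are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Phase names follow the article; the step bodies are placeholders (assumption).
PHASES = ["checkout", "build", "unit-test", "bake-image", "deploy", "integration-test"]


def run_pipeline(
    steps: List[Tuple[str, Callable[[], bool]]]
) -> Tuple[List[str], Optional[str]]:
    """Run phases in order and stop at the first phase that reports failure.

    Returns the list of completed phases and the name of the failed phase
    (None when every phase succeeded).
    """
    completed: List[str] = []
    for name, step in steps:
        if not step():
            return completed, name
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    # Every placeholder step succeeds except the unit tests.
    steps = [(name, (lambda n=name: n != "unit-test")) for name in PHASES]
    print(run_pipeline(steps))
```

Stopping at the first failed phase means later phases such as image baking and deployment never see an artifact that did not pass its tests.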

Each deployment triggers integration tests, and a package is marked as usable only when they succeed. Packages progress from the test environment to staging, and an approval workflow gates the production release. The whole flow is managed from a unified UI dashboard backed by Nginx, Tomcat, CPMS, and an API server.
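The promotion flow above can be modeled as a small state machine. The sketch below is an assumption about how such gating might look, not the platform's actual code: the `Package` class, the `STAGES` list, and the function names are all hypothetical.

```python
from dataclasses import dataclass

# Environment order as described in the article.
STAGES = ["test", "staging", "production"]


@dataclass
class Package:
    name: str
    stage: str = "test"
    usable: bool = False    # set once integration tests pass
    approved: bool = False  # set by the approval workflow


def record_integration_result(pkg: Package, passed: bool) -> None:
    """Mark the package usable only when integration tests succeeded."""
    pkg.usable = passed


def approve(pkg: Package) -> None:
    """Record a (hypothetical) approval for production release."""
    pkg.approved = True


def promote(pkg: Package) -> str:
    """Move the package to the next environment if the gates allow it."""
    if not pkg.usable:
        raise ValueError("integration tests have not passed")
    index = STAGES.index(pkg.stage)
    if index == len(STAGES) - 1:
        raise ValueError("package is already in production")
    next_stage = STAGES[index + 1]
    if next_stage == "production" and not pkg.approved:
        raise PermissionError("production release requires approval")
    pkg.stage = next_stage
    return pkg.stage
```

Keeping the approval check inside `promote` means no caller can reach production without passing through the same gate.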

Networking originally used Flannel and later switched to Contiv. A custom kube-HAProxy replaces kube-proxy, and a kube-sky component maps internal service names to company-specific domains. The new setup gives Pods fixed IPs, which IP whitelist requirements call for.
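The name mapping that kube-sky performs can be illustrated with a short function. The article does not give the company's domain format, so the `paas.example.com` suffix and the `service.namespace` layout below are assumptions, and `to_company_domain` is a hypothetical helper; only the `svc.cluster.local` suffix is the standard Kubernetes default.

```python
def to_company_domain(
    k8s_name: str,
    company_suffix: str = "paas.example.com",    # hypothetical company domain
    cluster_suffix: str = "svc.cluster.local",   # Kubernetes default suffix
) -> str:
    """Translate <service>.<namespace>.svc.cluster.local to a company domain."""
    suffix = "." + cluster_suffix
    if not k8s_name.endswith(suffix):
        raise ValueError(f"not a cluster service name: {k8s_name}")
    parts = k8s_name[: -len(suffix)].split(".")
    if len(parts) != 2:
        raise ValueError(f"expected <service>.<namespace>, got: {k8s_name}")
    service, namespace = parts
    return f"{service}.{namespace}.{company_suffix}"


if __name__ == "__main__":
    print(to_company_domain("cart.shop.svc.cluster.local"))
```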

The Docker registry was extended to integrate CAS/OAuth authentication, trigger deployments on image pushes, and index tags in a database; image security scanning is performed by Clair.
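Tag indexing on push can be sketched as follows. The article does not say how the registry was extended, so this example assumes a notification-style event envelope (an `events` list whose entries carry an `action` and a `target` with `repository`, `tag`, and `digest`) and uses an SQLite table as a stand-in for the real database. `index_push_events` is a hypothetical name.

```python
import sqlite3
from typing import Dict, List, Tuple


def index_push_events(conn: sqlite3.Connection, envelope: Dict) -> List[Tuple[str, str]]:
    """Store (repository, tag, digest) for every tagged push in the envelope."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tags ("
        "repository TEXT, tag TEXT, digest TEXT, "
        "PRIMARY KEY (repository, tag))"
    )
    indexed: List[Tuple[str, str]] = []
    for event in envelope.get("events", []):
        target = event.get("target", {})
        # Skip pulls and pushes of untagged objects such as layers.
        if event.get("action") != "push" or not target.get("tag"):
            continue
        conn.execute(
            "INSERT OR REPLACE INTO tags VALUES (?, ?, ?)",
            (target["repository"], target["tag"], target.get("digest")),
        )
        indexed.append((target["repository"], target["tag"]))
    conn.commit()
    return indexed
```

The list returned by the function is the natural place to hook a deployment trigger, since it contains exactly the tags that just changed.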

Logs are collected with fluentd and the ELK stack, enriched with Kubernetes metadata, and forwarded to Kafka and Elasticsearch. Custom Kibana dashboards and ElastAlert rules provide UI-driven alerting.
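The metadata enrichment step can be illustrated in Python. This is a sketch, not the team's fluentd configuration: it assumes log files follow the `<pod>_<namespace>_<container>-<container id>.log` naming used under `/var/log/containers`, and the output field names (`kubernetes.pod_name` and so on) are chosen for illustration.

```python
import os
import re
from typing import Dict

# <pod>_<namespace>_<container>-<64 hex char container id>.log
LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)-(?P<id>[0-9a-f]{64})\.log$"
)


def enrich(record: Dict, log_path: str) -> Dict:
    """Return a copy of the record with Kubernetes metadata from the file name."""
    match = LOG_NAME.match(os.path.basename(log_path))
    if match is None:
        # Unknown naming scheme: pass the record through unchanged.
        return dict(record)
    return dict(
        record,
        kubernetes={
            "pod_name": match.group("pod"),
            "namespace_name": match.group("namespace"),
            "container_name": match.group("container"),
        },
        docker={"container_id": match.group("id")},
    )
```

With the namespace and Pod name attached to every record, dashboards and alert rules can filter per application without parsing the message text.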

Monitoring combines cAdvisor for per‑Pod metrics, Prometheus plugins for Swarm nodes, and Grafana dashboards for namespace‑wide views.

Operational issues encountered include devicemapper performance, stuck Pods, excessive dead containers, slow ResourceQuota updates, batch job restarts, SkyDNS IP mismatches, OverlayFS move failures, disk-space exhaustion, and namespaces stuck in the Terminating state. Each was addressed with a specific fix or upgrade.

The presentation concludes with a summary of these challenges and the solutions implemented over the two‑year period.

Tags: monitoring, CI/CD, Kubernetes, devops, Container, logging, PaaS
Written by

DevOps

