Why a Real‑Estate Platform Chose Docker Swarm Over Kubernetes and What We Learned
This article details how Fangduoduo, a leading mobile real‑estate platform, migrated to a Docker Swarm‑based container cloud, covering the reasons for choosing Swarm, architecture design, networking, CI/CD integration, performance optimizations, monitoring, and the operational benefits gained from the transition.
Background and Significance
Fangduoduo is China’s first mobile internet real‑estate trading platform, providing an efficient O2O marketplace for developers, agencies, and buyers. Rapid business growth and micro‑service complexity required continuous innovation, fast delivery, and reliable operations, but the service‑oriented architecture introduced many operational challenges.
Key pain points included manual VM scheduling, high cost of test/pre‑release/production environments, reliance on single physical machines, proliferation of servers due to fast‑changing requirements, and complex CI/CD scripts.
Selection
Although Kubernetes dominates container orchestration, Fangduoduo chose Docker Swarm for its flexibility, performance, and ease of use. The decision was based on three criteria:
Performance: Swarm could start a container in under a second, outperforming Kubernetes in their tests.
Ease of Use: Simple API and low learning curve, with minimal operational overhead.
Flexibility: Seamless integration with existing DevOps workflows and an easy migration path to Kubernetes if needed.
Performance tests showed Swarm launching containers in about one second under 50% load across 1,000 nodes.
Container Cloud Architecture
Swarm’s architecture mirrors Kubernetes but remains simpler. Three low‑spec XenServer VMs act as manager nodes (Raft‑based fault tolerance) while physical machines serve as workers for optimal performance. Portainer provides a UI and API similar to Kubernetes’ apiserver.
Services (a hybrid of Kubernetes Services and Deployments) manage scheduling, scaling, and port mapping, enabling any container IP to be reachable.
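As a minimal sketch of such a service (the service name, image, and ports are illustrative, not from the platform), a replicated service with a port published through the routing mesh can be declared in one command:

```shell
# create a replicated service; the routing mesh publishes port 8080 on every node
docker service create \
  --name web \
  --replicas 3 \
  --publish published=8080,target=80 \
  nginx:alpine

# scale out later with a single command
docker service scale web=6
```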
External traffic is routed through Nginx reverse proxy to the Swarm routing mesh, which uses built‑in DNS and LVS load balancing to forward requests to service replicas, even if the target node lacks the container. Internal traffic uses an Envoy Mesh gateway with a custom xDS service for overlay or macvlan networking.
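Because the routing mesh answers on the published port on every node, the external Nginx can treat all Swarm nodes as interchangeable upstreams. A hedged sketch (hostnames and ports are hypothetical):

```nginx
upstream swarm_mesh {
    # any node accepts traffic for the published port;
    # the mesh forwards it to a healthy replica wherever it runs
    server node1.example.com:8080;
    server node2.example.com:8080;
    server node3.example.com:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://swarm_mesh;
        proxy_set_header Host $host;
    }
}
```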
Network
Two networking challenges were addressed:
Container‑to‑container communication via an overlay network, which gives each container an eth0 on a VXLAN‑backed network that spans hosts, so containers on different machines can reach each other directly.
Container‑to‑external communication using macvlan, which creates sub‑interfaces on the physical NIC, exposing containers directly to the physical network.
Network setup commands:
<code>docker network create --config-only --subnet=10.0.112.0/20 --gateway=10.0.112.1 --ip-range=10.0.112.0/25 -o parent=bond0 pub_net_config</code>
<code>docker network create -d macvlan --scope swarm --config-from pub_net_config pub_net</code>
<code>docker service update m-web --network-add pub_net</code>
Containerization
Business code is built into images; Node services were containerized first because they are stateless. Java services use Maven/Gradle with Google’s Jib plugin, which builds images directly from the build tool configuration, with no Dockerfile needed.
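With the Jib Maven plugin, the image definition lives in the pom itself. A sketch (the plugin coordinates are real; the base image and registry names are illustrative):

```xml
<plugin>
  <groupId>com.google.cloud.tools</groupId>
  <artifactId>jib-maven-plugin</artifactId>
  <version>3.4.0</version>
  <configuration>
    <from>
      <image>openjdk:8-jdk-alpine</image>
    </from>
    <to>
      <image>registry.example.com/m-web</image>
    </to>
  </configuration>
</plugin>
```

`mvn compile jib:build` then pushes the image to the registry without a local Docker daemon; `jib:dockerBuild` builds into the local daemon instead.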
Optimization and Tips
Image Size
Base images were switched to Alpine 3.9 with selected libraries, reducing image size by 80% to under 300 MB.
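An illustrative base-image Dockerfile in this spirit (the package list is an assumption, not the platform’s actual manifest):

```dockerfile
FROM alpine:3.9
# add only the libraries the services actually need
RUN apk add --no-cache libstdc++ ca-certificates tzdata
```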
Signal Handling
Using tini as the entrypoint (PID 1) reaps zombie processes and forwards signals, allowing containers to stop gracefully.
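A sketch of wiring tini in as PID 1 on Alpine (the application command is hypothetical):

```dockerfile
FROM alpine:3.9
RUN apk add --no-cache tini
# tini runs as PID 1, reaps zombies, and forwards SIGTERM to the child
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]
```

On newer Docker versions, `docker run --init` injects tini without modifying the image at all.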
Java Container Settings
JDK 8 (8u191 or later) with -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 makes the JVM size its heap from the cgroup memory limit, leaving the remaining 25% as headroom for non‑heap memory such as metaspace, thread stacks, and direct buffers.
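A command sketch of the effect under a container memory limit (the limit and image name are illustrative):

```shell
# with a 4 GiB limit, the heap gets at most 3 GiB (75%);
# the rest is left for metaspace, thread stacks, and direct buffers
docker run -m 4g registry.example.com/m-java \
  java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar
```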
DNS Performance
Docker’s embedded DNS concurrency limit was raised from 100 to 1024, and a patch to musl’s DNS resolver avoided slow lookups caused by unanswered IPv6 (AAAA) queries.
Slab Allocation
Kernel slab allocation failures under high load were resolved by upgrading to the CentOS 7.5 kernel (3.10.0‑862).
Physical Machine Network Issues
Ensuring net.ipv4.ip_forward is enabled, plus pre‑flight ping checks for DNS and network connectivity before a node joins, prevents overlay network failures.
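A hedged sketch of such a pre‑flight check (the manager address and hostname are placeholders):

```shell
#!/bin/sh
# verify IPv4 forwarding, required for overlay/ingress traffic
[ "$(cat /proc/sys/net/ipv4/ip_forward)" = "1" ] || {
  echo "ip_forward disabled" >&2; exit 1; }

# confirm the node can reach a Swarm manager and resolve DNS
ping -c 1 -W 2 10.0.0.10 >/dev/null || { echo "manager unreachable" >&2; exit 1; }
nslookup registry.example.com >/dev/null || { echo "DNS lookup failed" >&2; exit 1; }
```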
Release System Refactoring
Jenkins pipelines were introduced to codify build processes, allowing per‑project customization while generating a unified Jenkinsfile for each build.
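A declarative-pipeline sketch of what such a generated Jenkinsfile might look like (stage names, image, and registry are assumptions):

```groovy
pipeline {
  agent any
  stages {
    stage('Build image') {
      steps { sh 'docker build -t registry.example.com/m-web:${BUILD_NUMBER} .' }
    }
    stage('Push') {
      steps { sh 'docker push registry.example.com/m-web:${BUILD_NUMBER}' }
    }
    stage('Deploy') {
      steps { sh 'docker service update --image registry.example.com/m-web:${BUILD_NUMBER} m-web' }
    }
  }
}
```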
A web‑based Docker terminal forwards docker exec commands to the appropriate host, providing syntax highlighting and shortcuts.
Log viewing fetches the latest 2,000 lines via docker logs, with auto‑refresh and scrolling.
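The underlying command is roughly:

```shell
# fetch the most recent 2,000 lines and keep following
docker logs --tail 2000 --follow <container>
```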
Log Collection
Containers write logs to host‑mounted volumes; Filebeat in global mode ships them to an Elasticsearch cluster. Grafana Loki is being evaluated as a lightweight alternative that indexes only necessary fields.
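Running Filebeat as a Swarm global service puts exactly one shipper on each node. A minimal Filebeat sketch for this layout (the log paths and Elasticsearch hosts are assumptions):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /data/container-logs/*/*.log   # host-mounted container log volumes
output.elasticsearch:
  hosts: ["es1.example.com:9200", "es2.example.com:9200"]
```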
Monitoring
Swarm‑Prometheus (swarmprom) monitors host and container metrics. Additional machine‑level monitoring is deployed outside the Swarm to ensure visibility during cluster outages.
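In a stack file, the per-node exporters run as global services so every host is scraped; a sketch (image tags are illustrative):

```yaml
version: "3.7"
services:
  node-exporter:
    image: prom/node-exporter:latest
    deploy:
      mode: global   # exactly one instance per Swarm node
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    deploy:
      mode: global
```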
Promotion
Adoption was driven by demonstrating container benefits: lightweight, seconds‑level startup, consistent environments across dev/test/prod, rapid scaling based on CPU/memory/QPS, and automatic high‑availability.
Training, documentation, and priority support helped developers transition, while reducing reliance on virtual machines.
Conclusion
Moving from VM‑centric architecture to a cloud‑native container platform delivered second‑level deployments, simplified CI/CD, and service‑mesh capabilities, dramatically cutting operational overhead and enabling the DevOps culture to thrive.
Fangduoduo Tech
Sharing Fangduoduo's product and tech insights, delivering value, and giving back to the open community.