Building a Private Cloud Elasticsearch Platform with Mesos and Docker
This article describes how the OPS team designed and implemented a private‑cloud Elasticsearch service using Mesos for resource management, Docker containers orchestrated by Marathon, and a suite of monitoring, self‑service configuration, and continuous‑deployment tools to improve resource utilization and operational efficiency.
It is organized into four parts: background and current status, technical implementation, configuration and deployment, and monitoring and alerting.
1 Background and Status
In late 2015 and early 2016, the company's demand for Elasticsearch surged, exposing several drawbacks of the traditional usage model. The team defined design goals to address these issues and built the platform accordingly.
Since launching between March and April 2016, the platform has significantly improved work efficiency in three areas, as reflected in resource‑utilization statistics and its current scale.
2 Technical Implementation
The team investigated three reference systems: Elastic Cloud (the official Elastic public‑cloud service), Amazon Elasticsearch Service, and an open‑source Mesos‑based scheduling framework. After weighing the limitations of each, the team designed a custom solution.
The platform runs on Mesos, with all components packaged as Docker containers and scheduled by Marathon. The architecture includes a Root Marathon that schedules Sub Marathons, each Sub Marathon representing a business line and hosting multiple Elasticsearch SaaS services.
Resource allocation follows a hierarchical Marathon model: Root Marathon owns all resources, while Sub Marathon receives a fixed quota and maps one‑to‑one with a business line.
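Since a Sub Marathon is itself just an application that Root Marathon schedules, the fixed quota can be expressed directly in the app definition. The sketch below shows what such a definition might look like; the business‑line name, image tag, and quota figures are illustrative assumptions, not the platform's actual values.

```python
# Sketch: a Sub Marathon launched by Root Marathon as an ordinary Marathon app.
# The business-line name, Docker image tag, and quota figures are illustrative.

def sub_marathon_app(business_line: str, cpus: float, mem_mb: int) -> dict:
    """Build a Marathon app definition that runs one Sub Marathon for one
    business line, capped at a fixed resource quota."""
    return {
        "id": f"/sub-marathon/{business_line}",
        "cpus": cpus,                 # fixed quota granted by Root Marathon
        "mem": mem_mb,
        "instances": 1,               # one Sub Marathon per business line
        "container": {
            "type": "DOCKER",
            "docker": {"image": "mesosphere/marathon:1.4.8"},  # hypothetical tag
        },
    }

app = sub_marathon_app("hotel", cpus=32.0, mem_mb=131072)
print(app["id"])  # /sub-marathon/hotel
```

In a real deployment this JSON document would be POSTed to Root Marathon's `/v2/apps` endpoint, and the quota would never grow beyond what was granted at creation time.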
Each Sub Marathon can host multiple Elasticsearch clusters, each consisting of four core components (bamboo, es‑master, es‑datanode, es2graphite) deployed as Marathon apps. Service discovery is handled by bamboo + HAProxy, and metrics are collected by pyadvisor and sent to Graphite.
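Because each cluster decomposes into the same four components, generating its Marathon apps is mechanical. The following is a minimal sketch of that generation step; the registry URL, image names, and app‑id layout are assumptions for illustration.

```python
# Sketch: the four Marathon apps that make up one Elasticsearch cluster
# under a Sub Marathon. The registry URL and id scheme are illustrative.

COMPONENTS = ("bamboo", "es-master", "es-datanode", "es2graphite")

def cluster_apps(business_line: str, cluster: str) -> list[dict]:
    """Build one Marathon app definition per core component."""
    return [
        {
            "id": f"/{business_line}/{cluster}/{component}",
            "container": {
                "type": "DOCKER",
                "docker": {"image": f"registry.example.com/es/{component}:latest"},
            },
        }
        for component in COMPONENTS
    ]

apps = cluster_apps("hotel", "search-logs")
print([a["id"] for a in apps])
```

Each generated app is then submitted to the business line's Sub Marathon, which keeps the components co‑scheduled within that line's quota.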
3 Configuration and Deployment
All Elasticsearch configurations are stored in GitLab, including a customizable pre‑run script executed before container startup. Changes take effect after a container restart.
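One way to realize this flow is for the container entrypoint to fetch the cluster's config repository, run the pre‑run hook, and only then start Elasticsearch. The sketch below assumes a per‑cluster GitLab repo and file paths that are illustrative, not the platform's actual layout.

```python
# Sketch: the command sequence a container entrypoint could execute at startup.
# The GitLab repo URL, paths, and script name are hypothetical.
import subprocess

def startup_commands(cluster: str) -> list[list[str]]:
    repo = f"git@gitlab.example.com:es-configs/{cluster}.git"  # hypothetical repo
    return [
        ["git", "clone", "--depth", "1", repo, "/etc/elasticsearch"],
        ["sh", "/etc/elasticsearch/pre-run.sh"],   # customizable pre-run hook
        ["elasticsearch", "-Epath.conf=/etc/elasticsearch"],
    ]

def run_startup(cluster: str) -> None:
    # Commands run in order; because the config is re-cloned on every start,
    # a change pushed to GitLab takes effect after the next container restart.
    for cmd in startup_commands(cluster):
        subprocess.run(cmd, check=True)

print(startup_commands("search-logs")[1])
```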
A self‑service web UI provides detailed cluster information and allows users to perform configuration and plugin management.
Continuous deployment is driven by Jenkins in three steps: configuration initialization (generating files stored in GitLab), cluster deployment (submitting components to Marathon), and final Marathon scheduling to bring the Elasticsearch cluster online.
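The three Jenkins stages can be sketched as plain functions run in a fixed order. The stage bodies below are placeholders that only record what each step would do (generate configs, submit apps, let Marathon schedule); the function names are illustrative, not the pipeline's actual job names.

```python
# Sketch: the three deployment stages as functions, with a dry-run driver
# that records execution order. Stage bodies are placeholders.

def init_config(cluster: str, log: list) -> None:
    log.append(f"init-config:{cluster}")   # generate config files, push to GitLab

def deploy_cluster(cluster: str, log: list) -> None:
    log.append(f"deploy:{cluster}")        # submit component apps to Marathon

def schedule(cluster: str, log: list) -> None:
    log.append(f"schedule:{cluster}")      # Marathon brings the cluster online

def pipeline(cluster: str) -> list:
    log: list = []
    for stage in (init_config, deploy_cluster, schedule):
        stage(cluster, log)
    return log

print(pipeline("search-logs"))
# ['init-config:search-logs', 'deploy:search-logs', 'schedule:search-logs']
```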
4 Monitoring and Alerting
Monitoring collects metrics through two channels: es2graphite gathers Elasticsearch cluster metrics, while pyadvisor gathers container metrics; both feed into Graphite for aggregation and visualization.
Alerting is built on top of these metrics and covers several aspects of cluster operation.
Overall, the solution manages the full lifecycle of Elasticsearch clusters—from capacity planning and configuration, through automated deployment, to self‑service management, comprehensive monitoring, alerting, and resource reclamation upon decommissioning.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. It shares cutting-edge technology trends and topics, providing a free forum where mid-to-senior technical professionals can exchange ideas and learn.