
How 58 Daojia Built a Cloud‑Native Ops Platform to Streamline Migration and Cut Costs

This article recounts 58 Daojia’s four‑year journey, beginning with the migration of its IDC infrastructure to the public cloud. It covers the challenges encountered and how the team designed and evolved a multi‑generation operations platform that centralizes asset, cost, domain, and monitoring management, improving efficiency and reducing expenses.

dbaplus Community

Background

Public cloud has become a mature, stable and cost‑effective option for many small‑to‑medium internet companies. In early 2016, 58 Daojia decided to migrate all workloads from its traditional IDC to a public cloud to reduce capital expenditure, simplify maintenance and improve reliability.

Migration Process ("Lingyun" Project)

The migration lasted 114 days, moving more than 2 TB of data, over 160 services and 70 databases. Traffic was shifted gradually using nginx upstream configuration, which allowed a smooth cut‑over without DNS‑level changes and provided an instant rollback path. Major incidents encountered during the migration included a prolonged public‑cloud backbone outage, HAVIP database failures lasting over two hours, and a rapid increase in monthly cloud spend (costs doubled within a few months).
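
A gradual cut‑over of this kind can be pictured as a series of weight changes in an nginx upstream block. The Python sketch below renders such a block for a given cloud traffic share. The upstream name, server addresses and percentages are illustrative assumptions, not values from 58 Daojia's configuration; only the nginx `weight` and `down` server parameters are standard.

```python
def render_upstream(name, idc_servers, cloud_servers, cloud_percent):
    """Render an nginx upstream block that routes cloud_percent of requests
    to the cloud group and the remainder to the IDC group."""
    if not idc_servers or not cloud_servers:
        raise ValueError("both server groups must be non-empty")
    if not 0 <= cloud_percent <= 100:
        raise ValueError("cloud_percent must be between 0 and 100")

    # Scale each group's per-server weight by the size of the other group so
    # that the group totals keep the ratio cloud_percent : (100 - cloud_percent).
    idc_weight = (100 - cloud_percent) * len(cloud_servers)
    cloud_weight = cloud_percent * len(idc_servers)

    lines = [f"upstream {name} {{"]
    for servers, weight in ((idc_servers, idc_weight), (cloud_servers, cloud_weight)):
        for addr in servers:
            if weight == 0:
                # Keep the server listed but out of rotation.
                lines.append(f"    server {addr} down;")
            else:
                lines.append(f"    server {addr} weight={weight};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical addresses: two IDC servers, one cloud server, 10 % on cloud.
    print(render_upstream("app_backend",
                          ["10.0.0.1:8080", "10.0.0.2:8080"],
                          ["172.16.0.1:8080"], 10))
```

Under these assumptions, rolling back amounts to rendering the block again with `cloud_percent=0` and reloading nginx, which leaves DNS untouched.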

Migration diagram

First‑Generation Ops Platform (Oct 2016)

To address asset ownership, cost accounting, NAT permission and domain‑query problems, a first‑generation platform was built. It provided a centralized view of servers, databases and DNS entries and automated many manual processes that previously relied on spreadsheets.

First‑generation platform architecture
First‑generation platform UI

Second‑Generation Ops Platform (Apr 2019)

With Python developers joining the ops team, a second‑generation platform was launched, adding a suite of functional modules:

Cost Center – Exports department‑level asset and expense data, enabling transparent cost visibility and policy enforcement.

Asset Management (Servers) – Tracks ownership and utilization (CPU, memory) and offers deployment suggestions such as “de‑provision if CPU < 40 %”.

CDN File Refresh – Self‑service static‑file refresh via cloud CDN API with role‑based permission control.

Domain Management – Unified UI for internal DNS, public‑cloud DNS and commercial DNS providers.

Monitoring Integration – Embeds Grafana dashboards for real‑time server metrics.

Cluster Domain Management – Keyword/port/IP queries and HTTP APIs for adding/removing domains and clusters.

User & System Configuration – Role‑based access control for each module.

Site Navigation – Quick links to request forms, bastion host, job tickets and internal commands.

Second‑generation platform architecture
Current ops platform UI

Key Technical Modules

Cost Center

Aggregates cloud‑provider billing data with Zabbix usage metrics. Policies such as “de‑provision servers with CPU < 40 %” are enforced automatically, and periodic expense reports are generated per department.
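
The per‑department expense report can be sketched as a join between billing rows and asset ownership. The field names (`instance_id`, `amount`), the `unassigned` bucket and the CSV layout are assumptions made for illustration; the article does not describe the real billing schema.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def department_totals(billing_rows, owner_by_instance):
    """Sum billed amounts per department; unknown instances go to 'unassigned'."""
    totals = defaultdict(Decimal)
    for row in billing_rows:
        dept = owner_by_instance.get(row["instance_id"], "unassigned")
        totals[dept] += Decimal(row["amount"])
    return dict(totals)


def to_csv(totals):
    """Serialize the totals as CSV text, one department per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["department", "amount"])
    for dept in sorted(totals):
        writer.writerow([dept, f"{totals[dept]:.2f}"])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data.
    rows = [
        {"instance_id": "i-1", "amount": "10.50"},
        {"instance_id": "i-2", "amount": "4.25"},
        {"instance_id": "i-3", "amount": "1.00"},
    ]
    owners = {"i-1": "search", "i-2": "search"}
    print(to_csv(department_totals(rows, owners)))
```

Keeping an explicit `unassigned` bucket makes instances without a recorded owner visible in the report instead of silently dropping their cost.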

Asset Management

Maintains a database of server ownership, tags, and utilization statistics. Provides queries like “which department is generating traffic from IP X.X.X.X?” and suggests capacity adjustments.
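
A query such as “which department owns this IP?” can be answered with a longest‑prefix match over ownership records. The sketch below assumes ownership is stored as CIDR blocks with a department label; the records and department names are invented for illustration.

```python
import ipaddress

# Hypothetical ownership records: (CIDR block, department).
OWNERSHIP = [
    ("10.1.0.0/16", "payments"),
    ("10.1.20.0/24", "search"),
    ("10.2.3.4/32", "data-platform"),
]


def department_for_ip(ip, records=OWNERSHIP):
    """Return the department owning ip, preferring the most specific block.
    Returns None when no record covers the address."""
    addr = ipaddress.ip_address(ip)
    best = None
    for cidr, dept in records:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dept)
    return best[1] if best else None


if __name__ == "__main__":
    for ip in ("10.1.20.7", "10.1.5.5", "192.168.0.1"):
        print(ip, "->", department_for_ip(ip))
```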

CDN Refresh Service

Calls the cloud provider’s CDN purge API from the platform UI. Permissions are checked against the user’s role; abusive refresh attempts are logged and can be blocked.
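
One way to combine the role check with abuse control is a per‑user sliding‑window limit in front of the purge call. The role names, the limit of five requests per minute and the function name are assumptions; the provider's purge API is deliberately left out because the article does not name it.

```python
import logging
import time
from collections import defaultdict, deque

log = logging.getLogger("cdn_refresh")

ALLOWED_ROLES = {"ops", "frontend"}  # hypothetical role names
MAX_REFRESHES = 5                    # hypothetical limit per window
WINDOW_SECONDS = 60

_recent = defaultdict(deque)  # user -> timestamps of accepted requests


def may_refresh(user, role, now=None):
    """Return True if the user may trigger a CDN refresh right now."""
    now = time.time() if now is None else now
    if role not in ALLOWED_ROLES:
        log.warning("refresh denied: user=%s role=%s", user, role)
        return False

    window = _recent[user]
    # Drop accepted requests that have aged out of the window.
    while window and now - window[0] >= WINDOW_SECONDS:
        window.popleft()

    if len(window) >= MAX_REFRESHES:
        log.warning("refresh throttled: user=%s", user)
        return False

    window.append(now)
    return True
```

In this sketch the platform would call the provider's purge endpoint only when `may_refresh` returns `True`; the warning logs give operators the record they need to block repeat offenders.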

Domain Management

Consolidates internal DNS, public‑cloud DNS and third‑party DNS (e.g., DNSPod) into a single interface. Adding, updating or deleting a domain updates all underlying providers via API calls.
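
The fan‑out to several DNS backends can be sketched with a small adapter interface. `InMemoryProvider` stands in for real adapters, which would call each provider's API; the class names, method names and result format are all hypothetical.

```python
class InMemoryProvider:
    """Stand-in for a DNS backend; a real adapter would call the provider's API."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def upsert(self, domain, value):
        self.records[domain] = value

    def delete(self, domain):
        self.records.pop(domain, None)


class DomainManager:
    ACTIONS = ("upsert", "delete")

    def __init__(self, providers):
        self.providers = list(providers)

    def apply(self, action, domain, value=None):
        """Apply one change to every provider and report a result per provider."""
        if action not in self.ACTIONS:
            raise ValueError(f"unknown action: {action}")
        results = {}
        for provider in self.providers:
            try:
                if action == "upsert":
                    provider.upsert(domain, value)
                else:
                    provider.delete(domain)
                results[provider.name] = "ok"
            except Exception as exc:
                # Continue so that one failing backend is reported, not fatal.
                results[provider.name] = f"failed: {exc}"
        return results


if __name__ == "__main__":
    backends = [InMemoryProvider(n) for n in ("internal", "cloud", "dnspod")]
    manager = DomainManager(backends)
    print(manager.apply("upsert", "api.example.com", "10.0.0.10"))
```

Returning a per‑provider result, rather than raising on the first error, lets the UI show which backends are out of sync after a partial failure.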

Monitoring Integration

Grafana dashboards are embedded directly in the platform, allowing engineers to view server‑level metrics (CPU, memory, network) without leaving the ops portal.

Cluster Domain Management

Provides a searchable catalogue of domain‑to‑cluster mappings. HTTP APIs enable automated addition/removal of domains when clusters scale up or down.
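
A minimal in‑memory version of such a catalogue, with keyword, port and IP filters, might look like the following. The data model (one port and a list of backend IPs per domain) is an assumption, and the HTTP layer that would wrap `add` and `remove` is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterEntry:
    domain: str
    port: int
    backends: list = field(default_factory=list)  # backend IP addresses


class ClusterCatalogue:
    def __init__(self):
        self._entries = {}

    def add(self, domain, port, backends):
        """Register or replace the mapping for a domain."""
        self._entries[domain] = ClusterEntry(domain, port, list(backends))

    def remove(self, domain):
        """Remove a domain; return True if it existed."""
        return self._entries.pop(domain, None) is not None

    def search(self, keyword=None, port=None, ip=None):
        """Return the domains that match every filter that was supplied."""
        hits = []
        for entry in self._entries.values():
            if keyword is not None and keyword not in entry.domain:
                continue
            if port is not None and entry.port != port:
                continue
            if ip is not None and ip not in entry.backends:
                continue
            hits.append(entry.domain)
        return sorted(hits)


if __name__ == "__main__":
    catalogue = ClusterCatalogue()
    catalogue.add("order.example.com", 8080, ["10.0.1.1", "10.0.1.2"])
    catalogue.add("pay.example.com", 8443, ["10.0.2.1"])
    print(catalogue.search(ip="10.0.1.2"))
```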

User & System Configuration

Implements role‑based access control (RBAC) at the module level, ensuring that only authorized teams can modify NAT rules, DNS records or cost‑center data.
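
Module‑level RBAC can be sketched as a mapping from roles to (module, action) pairs, enforced by a decorator. The roles, permissions and the `update_nat_rule` function are hypothetical; the article does not list the platform's actual role model.

```python
from functools import wraps

# Hypothetical roles and permissions.
ROLE_PERMISSIONS = {
    "network-ops": {("nat", "write"), ("dns", "write")},
    "finance": {("cost", "write")},
    "developer": {("dns", "read"), ("cost", "read")},
}


def is_allowed(roles, module, action):
    """True if any of the user's roles grants (module, action)."""
    return any((module, action) in ROLE_PERMISSIONS.get(r, set()) for r in roles)


def require(module, action):
    """Decorator that rejects callers whose roles lack the permission."""
    def decorator(func):
        @wraps(func)
        def wrapper(user_roles, *args, **kwargs):
            if not is_allowed(user_roles, module, action):
                raise PermissionError(f"{module}:{action} denied")
            return func(user_roles, *args, **kwargs)
        return wrapper
    return decorator


@require("nat", "write")
def update_nat_rule(user_roles, rule):
    # Placeholder for the real change; here it only echoes the rule.
    return f"applied {rule}"
```

Calling `update_nat_rule(["developer"], ...)` raises `PermissionError` in this sketch, while a caller holding `network-ops` passes the check.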

Site Navigation

Centralizes links to request forms, bastion host, job ticket system and common command‑line utilities, reducing context‑switching for both developers and ops engineers.

Cost Management Implementation

The cost‑center module pulls billing data from the cloud provider’s cost‑center API and merges it with real‑time utilization metrics collected by Zabbix. A policy engine evaluates thresholds (e.g., CPU < 40 % for > 7 days) and automatically creates de‑provision tickets. Exported CSV/Excel reports are sent to each department on a quarterly basis.
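
The threshold rule can be sketched as follows, assuming the utilization data has already been reduced to one average CPU value per host per day. The input format, the `Ticket` fields and the function names are assumptions; only the example thresholds (CPU < 40 % for more than 7 days) come from the text.

```python
from dataclasses import dataclass

CPU_THRESHOLD = 40.0  # percent, from the example policy
MIN_DAYS = 7          # the policy requires more than this many days


@dataclass
class Ticket:
    host: str
    department: str
    reason: str


def idle_days(daily_cpu, threshold=CPU_THRESHOLD):
    """Count the most recent consecutive days with average CPU below threshold.
    daily_cpu is ordered oldest first."""
    count = 0
    for value in reversed(daily_cpu):
        if value >= threshold:
            break
        count += 1
    return count


def evaluate(hosts):
    """Return a de-provision ticket for each host idle for more than MIN_DAYS.
    hosts is an iterable of dicts with 'host', 'department' and 'daily_cpu'."""
    tickets = []
    for h in hosts:
        days = idle_days(h["daily_cpu"])
        if days > MIN_DAYS:
            reason = f"CPU < {CPU_THRESHOLD:g}% for {days} days"
            tickets.append(Ticket(h["host"], h["department"], reason))
    return tickets


if __name__ == "__main__":
    # Hypothetical sample: one idle host, one host with a recent busy day.
    sample = [
        {"host": "web-01", "department": "search", "daily_cpu": [12.0] * 8},
        {"host": "web-02", "department": "search", "daily_cpu": [12.0] * 8 + [75.0]},
    ]
    for ticket in evaluate(sample):
        print(ticket)
```

Counting only the most recent consecutive idle days means a single busy day resets the clock, which is a conservative choice for a rule that ends in de‑provisioning.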

Multi‑Cloud Connectivity Guidance

For hybrid or multi‑cloud environments, the authors recommend using third‑party interconnect services or dedicated IDC‑to‑cloud leased lines. If an on‑premise IDC exists, separate dedicated links to each cloud provider can be aggregated at the IDC to form a private backbone.

Monitoring Strategy

Infrastructure (servers, network devices) is monitored with Zabbix or Open‑Falcon. Container and Kubernetes workloads will be monitored with Prometheus. For Java services, Meituan’s open‑source CAT framework is suggested. The platform plans to expose a “monitoring‑center” module that lets product owners add custom monitors via a UI.

Future Roadmap

Planned enhancements include deeper automation for container and Kubernetes workloads, refined cost‑optimization rules, and expanded self‑service capabilities for additional business teams.
