
Qunar Network Device Operations Platform: Architecture, Features, and Continuous Optimization

This article presents the design, implementation, and ongoing improvements of Qunar's network device operations platform, detailing its background, optimization strategies, permission model, automated tasks, monitoring capabilities, and how it enhances operational efficiency while reducing risk.

Ctrip Technology

Author: Liu Liang joined Qunar in 2014 after graduating from the Institute of Software, Chinese Academy of Sciences. He previously worked at Baidu on hardware operations and development, and he now leads the development of Qunar's hardware operations tools and platforms.

Background: In recent years the number of network devices at Qunar has grown steadily, while the netops team has stayed small, so each engineer's workload keeps increasing. Making changes by hand through the CLI or with ad hoc scripts is inefficient and risky, and it leaves no audit trail.

The optimization approach was to integrate common tools and commands into a single platform, automate repetitive tasks, add intelligent pre-checks and automatic rollback, enforce hierarchical permission control, and record detailed operation logs for auditing.

The resulting Qunar Network Device Operations Platform is illustrated in the following screenshots.

1. Permission Control

Five permission levels: Visitor, Read‑Only, Read‑Write, Administrator, Super Administrator.

Each atomic operation is bound to a specific permission level. For example, visitors can view only limited information, and read-only users can query device data.
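A minimal sketch of how the five levels and the operation binding could be modelled. It assumes the levels are strictly ordered and that each atomic operation has a minimum required level. The operation names and their assigned levels in `REQUIRED_LEVEL` are hypothetical and are not taken from the platform.

```python
from enum import IntEnum


class Level(IntEnum):
    """The five permission levels named in the article, lowest to highest."""
    VISITOR = 1
    READ_ONLY = 2
    READ_WRITE = 3
    ADMIN = 4
    SUPER_ADMIN = 5


# Hypothetical binding of atomic operations to the minimum level allowed to run them.
# Names and levels are illustrative only.
REQUIRED_LEVEL = {
    "view_summary": Level.VISITOR,
    "query_device": Level.READ_ONLY,
    "port_shutdown": Level.READ_WRITE,
    "assign_vlan": Level.READ_WRITE,
    "lock_port": Level.ADMIN,
}


def can_perform(user_level: Level, operation: str) -> bool:
    """Return True if a user at `user_level` may run `operation`.

    Unknown operations are denied by default.
    """
    required = REQUIRED_LEVEL.get(operation)
    if required is None:
        return False
    return user_level >= required
```

With an `IntEnum`, "higher level" reduces to an integer comparison. Denying unknown operations by default means that a newly added operation cannot run until someone binds it to a level.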

Tips: Higher‑level users can grant lower‑level permissions to others and view logs of users they have authorized; users cannot view logs of peers or higher‑level users.
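The delegation and log-visibility rules can be sketched as two checks. The sketch assumes three things: "lower-level" means strictly below the granter's own level, visibility covers only users the viewer authorized directly (not transitively), and users can always see their own logs. The article does not state any of these assumptions explicitly.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    VISITOR = 1
    READ_ONLY = 2
    READ_WRITE = 3
    ADMIN = 4
    SUPER_ADMIN = 5


@dataclass
class User:
    name: str
    level: Level
    granted_by: Optional[str] = None  # name of the user who authorized this one


def grant(granter: User, grantee_name: str, level: Level) -> User:
    """Create a user at `level`, authorized by `granter`.

    Only levels strictly below the granter's own level may be granted.
    """
    if level >= granter.level:
        raise PermissionError(f"{granter.name} cannot grant {level.name}")
    return User(grantee_name, level, granted_by=granter.name)


def can_view_logs(viewer: User, subject: User) -> bool:
    """A viewer sees their own logs and those of lower-level users they authorized.

    Logs of peers and higher-level users are never visible.
    """
    if viewer.name == subject.name:
        return True
    return subject.granted_by == viewer.name and subject.level < viewer.level
```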

2. Operations and Tasks

The platform supports automated operations such as:

Scanning core and access switch relationships.

Collecting, backing up, and synchronizing global and per‑port configurations.

Port up/down, description changes, speed adjustments, trunk configuration.

VLAN assignment.

Port locking to prevent accidental changes.
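As an illustration of how a requested port change might be turned into device commands while honouring port locks, here is a sketch that emits Cisco IOS-style CLI lines. The vendor syntax, the in-memory `LOCKED_PORTS` set, and every device and port name are assumptions made for the example. The article does not say which vendors or command sets the platform targets.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical lock table; a real platform would keep this in its database.
LOCKED_PORTS: Set[Tuple[str, str]] = {("core-sw-01", "GigabitEthernet1/0/1")}


class PortLockedError(Exception):
    """Raised when a change targets a locked port."""


def build_port_commands(
    device: str,
    port: str,
    *,
    enabled: Optional[bool] = None,
    description: Optional[str] = None,
    speed: Optional[str] = None,
    access_vlan: Optional[int] = None,
    trunk: bool = False,
) -> List[str]:
    """Translate a requested port change into IOS-style CLI lines.

    Only the fields that are set produce commands. Locked ports are refused.
    """
    if (device, port) in LOCKED_PORTS:
        raise PortLockedError(f"{device} {port} is locked")
    if trunk and access_vlan is not None:
        raise ValueError("a port cannot be both trunk and access")
    if access_vlan is not None and not 1 <= access_vlan <= 4094:
        raise ValueError(f"invalid VLAN id {access_vlan}")

    cmds = ["configure terminal", f"interface {port}"]
    if description is not None:
        cmds.append(f"description {description}")
    if speed is not None:
        cmds.append(f"speed {speed}")
    if trunk:
        cmds.append("switchport mode trunk")
    elif access_vlan is not None:
        cmds += ["switchport mode access", f"switchport access vlan {access_vlan}"]
    if enabled is True:
        cmds.append("no shutdown")
    elif enabled is False:
        cmds.append("shutdown")
    cmds.append("end")
    return cmds
```

The lock check runs before any command is built. A locked port therefore never produces a command list that could reach the device.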

When a user confirms an operation, a Celery task is launched. There are two task types:

Immediate tasks: Execute the corresponding network commands via SSH; on failure, the task rolls back and alerts the user.

Scheduled tasks: Map routine operations to recurring jobs that can be triggered manually or on a schedule.
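The apply-or-roll-back behaviour of an immediate task can be sketched independently of Celery and SSH. In this sketch, `run` stands in for a function that sends one command over an SSH session, and `alert` stands in for whatever notifies the user. Both are hypothetical parameters. In the platform, a function like this would presumably be the body of a Celery task started with `.delay()`.

```python
from typing import Callable, List, Sequence


class TaskFailed(Exception):
    """Raised after a failed change has been rolled back."""


def run_with_rollback(
    run: Callable[[str], str],
    apply_cmds: Sequence[str],
    rollback_cmds: Sequence[str],
    alert: Callable[[str], None],
) -> List[str]:
    """Send `apply_cmds` one by one through `run`.

    If any command raises, send `rollback_cmds`, alert the user (including any
    rollback errors), and raise TaskFailed. On success, return the outputs.
    """
    outputs: List[str] = []
    try:
        for cmd in apply_cmds:
            outputs.append(run(cmd))
    except Exception as exc:
        rollback_errors = []
        for cmd in rollback_cmds:
            try:
                run(cmd)
            except Exception as rb_exc:
                rollback_errors.append(f"{cmd!r}: {rb_exc}")
        status = "completed" if not rollback_errors else "incomplete: " + "; ".join(rollback_errors)
        alert(f"change failed ({exc}); rollback {status}")
        raise TaskFailed(str(exc)) from exc
    return outputs
```

Rollback failures are reported rather than swallowed. An incomplete rollback is exactly the situation in which the operator most needs the alert.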

All tasks automatically generate detailed logs that can be queried according to the user’s permission level.
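One way to guarantee that every task leaves a log entry is to wrap task functions in a decorator. The sketch below records who ran which operation, with which arguments, the outcome, and the timing. The in-memory `OPERATION_LOG` list, the log fields, and the `set_description` example task are hypothetical stand-ins for the platform's log storage and real tasks.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory stand-in for the platform's operation-log table (assumption).
OPERATION_LOG: List[Dict[str, Any]] = []


def logged_task(operation: str) -> Callable:
    """Record user, operation, arguments, status and timing for every call."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(user: str, *args: Any, **kwargs: Any) -> Any:
            entry: Dict[str, Any] = {
                "user": user, "operation": operation,
                "args": args, "kwargs": kwargs, "started": time.time(),
            }
            try:
                result = func(user, *args, **kwargs)
                entry["status"] = "success"
                return result
            except Exception as exc:
                entry["status"] = f"failed: {exc}"
                raise
            finally:
                # Runs on success and failure alike, so no call goes unlogged.
                entry["finished"] = time.time()
                OPERATION_LOG.append(entry)
        return wrapper
    return decorator


@logged_task("port_description")
def set_description(user: str, device: str, port: str, text: str) -> str:
    # A real task would push the change over SSH; this stub only echoes it.
    return f"{device} {port} description -> {text}"
```

Because the first argument is always the acting user, each stored entry can later be filtered with the permission rules before a user queries the log.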

3. Monitoring Management

The platform monitors two data layers: network-level clusters (core switches together with their associated access switches) and device-level ports. Automatic discovery builds topology maps and flags abnormal ports and unusual traffic loads.
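The following sketch groups discovered links into core-to-access clusters and flags suspicious ports. It assumes that links arrive as neighbor records (for example from LLDP or CDP tables) and that the anomaly rules are "administratively up but operationally down", "non-zero errors", and "utilization above a threshold". The article names none of these sources or rules, so the record shapes and the 0.8 threshold are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class NeighborLink:
    """One discovered link, e.g. from an LLDP/CDP neighbor table (assumption)."""
    local_device: str
    local_port: str
    remote_device: str
    remote_port: str


def build_clusters(links: List[NeighborLink], core: Set[str]) -> Dict[str, Set[str]]:
    """Map each core switch to the set of access switches linked to it."""
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for link in links:
        if link.local_device in core and link.remote_device not in core:
            clusters[link.local_device].add(link.remote_device)
        elif link.remote_device in core and link.local_device not in core:
            clusters[link.remote_device].add(link.local_device)
        # Core-to-core and access-to-access links do not define cluster membership.
    return dict(clusters)


def abnormal_ports(stats: Dict[str, Dict[str, float]], util_limit: float = 0.8) -> List[str]:
    """Return ports that are unexpectedly down, logging errors, or overloaded.

    `stats` maps port -> {"admin_up": 0/1, "oper_up": 0/1, "errors": n, "util": 0..1}.
    """
    flagged = []
    for port, s in sorted(stats.items()):
        unexpectedly_down = bool(s["admin_up"]) and not s["oper_up"]
        if unexpectedly_down or s["errors"] > 0 or s["util"] > util_limit:
            flagged.append(port)
    return flagged
```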

Device‑level metrics are collected via SNMP using collectd; users configure metrics and templates in the platform, which then notifies the Marathon‑managed collectd Docker cluster to update its scraping instances.
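When a user saves metrics and templates, the platform must produce configuration that collectd can load. Below is a sketch that renders collectd's `snmp` plugin block (`<Data>` and `<Host>` sections with `Type`, `Table`, `Instance`, `Values`, `Address`, `Version`, `Community`, `Collect`, `Interval`). The `Metric` and `Host` schemas, the use of SNMP v2c, and the example OIDs are assumptions. The article does not describe how the platform actually generates its collectd configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metric:
    """A metric as a user might define it in the platform (hypothetical schema)."""
    name: str
    collectd_type: str      # a type from collectd's types.db, e.g. "if_octets"
    instance_oid: str       # OID naming each table row, e.g. "IF-MIB::ifDescr"
    value_oids: List[str]   # one OID per value of the collectd type


@dataclass
class Host:
    name: str
    address: str
    community: str
    metrics: List[str]      # Metric names, e.g. expanded from a template
    interval: int = 60


def render_snmp_plugin(metrics: List[Metric], hosts: List[Host]) -> str:
    """Render a collectd `snmp` plugin block; reject hosts using unknown metrics."""
    known = {m.name for m in metrics}
    lines = ["<Plugin snmp>"]
    for m in metrics:
        values = " ".join(f'"{oid}"' for oid in m.value_oids)
        lines += [
            f'  <Data "{m.name}">',
            f'    Type "{m.collectd_type}"',
            "    Table true",
            f'    Instance "{m.instance_oid}"',
            f"    Values {values}",
            "  </Data>",
        ]
    for h in hosts:
        missing = set(h.metrics) - known
        if missing:
            raise ValueError(f"{h.name} references unknown metrics: {sorted(missing)}")
        collect = " ".join(f'"{name}"' for name in h.metrics)
        lines += [
            f'  <Host "{h.name}">',
            f'    Address "{h.address}"',
            "    Version 2",  # collectd uses SNMPv2c when Version is 2
            f'    Community "{h.community}"',
            f"    Collect {collect}",
            f"    Interval {h.interval}",
            "  </Host>",
        ]
    lines.append("</Plugin>")
    return "\n".join(lines)
```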

Monitoring configuration is flexible, supporting metric, template, and rule dimensions, and enables dynamic scaling and load balancing of the scraping cluster using Docker and Marathon.
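Scaling and load balancing the collection cluster can be sketched in three steps: size the cluster from the number of monitored devices, spread devices evenly across instances, and ask Marathon to change the app's instance count through its REST endpoint `PUT /v2/apps/{appId}` with a JSON body such as `{"instances": 3}`. The per-instance capacity, the round-robin sharding, the app id, and the Marathon URL are all assumptions, and the request is built but not sent.

```python
import json
import math
import urllib.request
from typing import Dict, List


def instances_needed(num_hosts: int, hosts_per_instance: int, minimum: int = 1) -> int:
    """Smallest instance count that keeps each instance within its host budget."""
    return max(minimum, math.ceil(num_hosts / hosts_per_instance))


def shard_hosts(hosts: List[str], instances: int) -> Dict[int, List[str]]:
    """Spread hosts round-robin over instances (sorted first for stable output)."""
    shards: Dict[int, List[str]] = {i: [] for i in range(instances)}
    for i, host in enumerate(sorted(hosts)):
        shards[i % instances].append(host)
    return shards


def marathon_scale_request(marathon_url: str, app_id: str, instances: int) -> urllib.request.Request:
    """Build (but do not send) a Marathon request that sets the app's instance count."""
    body = json.dumps({"instances": instances}).encode()
    return urllib.request.Request(
        f"{marathon_url.rstrip('/')}/v2/apps/{app_id.lstrip('/')}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

In a live system the request would be sent with `urllib.request.urlopen(req)`. Round-robin sharding keeps the number of devices per instance within one of every other instance. However, adding an instance reshuffles most assignments, and a consistent-hashing scheme would avoid that at the cost of slightly less even load.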

The platform continues to be optimized as netops challenges grow more complex, iterating quickly to improve efficiency, reduce risk, and streamline workflows.

Conclusion

The Qunar Network Device Operations Platform was built to solve real‑world operational difficulties, delivering higher efficiency, lower risk, and a more streamlined process, with ongoing plans for rapid iteration and further enhancements.
