How PromQL Copilot Turns Natural Language into Precise Monitoring Queries
PromQL Copilot builds on Alibaba Cloud's observability platform and AI techniques to convert ambiguous natural‑language monitoring requests into accurate PromQL statements. It addresses the challenges of ambiguity, missing domain knowledge, and metric coverage, and provides query generation, explanation, diagnosis, and metric recommendation for cloud‑native environments.
In modern cloud‑native and distributed system operations, Prometheus and its query language, PromQL, have become the de facto standard for monitoring and alerting, but PromQL's steep learning curve and complex syntax hinder efficient use.
Challenges of Natural‑Language to PromQL
Ambiguity: the same natural‑language query can map to different metrics depending on context.
Implicit intent: vague questions such as "how is system load?" lack concrete metric references.
Label mapping: terms like "instance" may correspond to different label keys (instance, instanceID).
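The label‑mapping problem can be illustrated with a small sketch. It is not PromQL Copilot's implementation: the synonym table, the label sets, and the metric name ecs_cpu_utilization are all hypothetical, chosen only to show how one user‑facing term can resolve to different label keys depending on the metric.

```python
# Hypothetical label sets for two metrics; in practice these would be
# looked up from the metric store rather than hard-coded.
METRIC_LABELS = {
    "node_cpu_seconds_total": {"instance", "cpu", "mode"},
    "ecs_cpu_utilization": {"instanceID", "region"},  # made-up metric name
}

# Assumed synonym table: a user-facing term and the label keys it might mean.
TERM_CANDIDATES = {"instance": ["instance", "instanceID"]}


def resolve_label_key(term, metric):
    """Return the first candidate key that the metric actually carries, else None."""
    labels = METRIC_LABELS.get(metric, set())
    for key in TERM_CANDIDATES.get(term, [term]):
        if key in labels:
            return key
    return None


print(resolve_label_key("instance", "node_cpu_seconds_total"))
print(resolve_label_key("instance", "ecs_cpu_utilization"))
```

The same word resolves to instance for the first metric and instanceID for the second, which is why a generator that guesses label names without consulting real label data tends to produce queries that return nothing.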
Solutions Implemented in PromQL Copilot
A comprehensive metric knowledge base covering Alibaba Cloud products and open‑source components, enabling accurate metric recommendation.
Query rewriting and multi‑turn interaction to clarify user intent and enrich queries with domain tags (e.g., container, k8s).
SLS SPL integration to retrieve actual metric labels, ensuring generated PromQL uses correct label names.
Pre‑run validation and automatic diagnosis to fix syntax or semantic errors before execution.
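As a rough illustration of what a pre‑run check might look like, the sketch below flags unbalanced brackets and label names that do not appear in a known label set. It is an assumption‑laden simplification, not the Copilot's validator: the function name validate and its regular expressions are invented here, it only detects problems rather than repairing them, and it inspects label matchers inside {...} selectors only.

```python
import re

STRING = re.compile(r'"(?:\\.|[^"\\])*"')      # double-quoted label values
SELECTOR = re.compile(r"\{([^{}]*)\}")         # body of a {...} selector
MATCHER = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*(?:=~|!~|!=|=)")


def validate(query, known_labels):
    """Return a list of problems found in `query`; an empty list means it passed."""
    problems = []
    # Blank out string contents so brackets inside label values are ignored.
    stripped = STRING.sub('""', query)

    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in stripped:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                problems.append(f"unbalanced '{ch}'")
                break
    else:
        if stack:
            problems.append(f"unclosed '{stack[-1]}'")

    # Check every label matcher against the labels the metric really has.
    for body in SELECTOR.findall(stripped):
        for label in MATCHER.findall(body):
            if label not in known_labels:
                problems.append(f"unknown label '{label}'")
    return problems


print(validate('rate(foo{pod="a"}[1m]', {"pod_name"}))
```

In a fuller system the known_labels argument would be populated from the label lookup described above, and the reported problems would be passed to the diagnosis step instead of being printed.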
Core Features
PromQL generation from natural language.
PromQL explanation that breaks down query structure.
PromQL diagnosis with suggestions for improvement.
Metric recommendation tailored to the user's cloud product or scenario.
Architecture Overview
The system relies on Alibaba Cloud's SLS and CMS infrastructure and the Dify framework to form an end‑to‑end loop of natural‑language understanding, knowledge graph retrieval, query generation, and execution verification.
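The loop described above can be sketched as a generate‑and‑verify cycle. Everything in this sketch is hypothetical: the function run_pipeline, its four stage callables, and the retry limit are illustrative names, and the stubs stand in for the real SLS, CMS, and Dify components, whose interfaces the article does not describe.

```python
def run_pipeline(question, understand, retrieve, generate, verify, max_attempts=3):
    """Run understand -> retrieve -> generate -> verify, retrying generation on failure.

    `verify` returns None when the query passes, otherwise a feedback string
    that is handed back to `generate` on the next attempt.
    """
    intent = understand(question)
    metrics = retrieve(intent)
    feedback = None
    for _ in range(max_attempts):
        query = generate(intent, metrics, feedback)
        feedback = verify(query)
        if feedback is None:
            return query
    return None  # no verified query within the attempt budget


# Stub stages, for demonstration only.
def understand(question):
    return {"topic": "network"}


def retrieve(intent):
    return ["container_network_transmit_bytes_total"]


def generate(intent, metrics, feedback):
    # The first attempt is deliberately broken; the retry uses the feedback.
    if feedback is None:
        return "rate(" + metrics[0] + "[1m]"
    return "rate(" + metrics[0] + "[1m])"


def verify(query):
    return None if query.count("(") == query.count(")") else "unbalanced parentheses"


result = run_pipeline("which pod sends the most traffic?", understand, retrieve, generate, verify)
print(result)
```

Passing the stages in as callables keeps the control flow separate from any particular model or backend, so the verification feedback path can be exercised with stubs like these.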
Usage in Cloud Console and MCP Service
In the Cloud Monitoring console, users can input everyday language and receive an executable PromQL query.
The Observability MCP service exposes the same capabilities via API and GitHub‑hosted tools.
Example Query
To find the pod with the highest outbound traffic:
topk(1, max by (pod_name)(rate(container_network_transmit_bytes_total{}[1m])))
Future Outlook
Ongoing work focuses on improving metric knowledge graph accuracy, enhancing prompt engineering for higher PromQL correctness, and optimizing the output format of MCP tools.
Alibaba Cloud Observability
Driving continuous progress in observability technology!
