How Huatuo Now Monitors MetaX GPUs for Cloud‑Native AI Workloads
Huatuo, the open‑source deep‑observability platform backed by Didi, now supports real‑time monitoring of MetaX GPUs. Deployed with Docker or Kubernetes, it collects detailed hardware metrics and exposes them through a /metrics endpoint for cloud‑native AI and operations use cases.
Project Overview
Huatuo is an open‑source deep‑observability project initiated by Didi and incubated by the China Computer Federation (CCF). It provides kernel‑level monitoring for cloud‑native general computing, AI workloads, and core services, covering components such as GPU, CPU, caches, TLB, memory ECC, PCIe, NIC links, and ACPI.
MetaX GPU Support
Huatuo now integrates with MetaX GPUs via the libmxsml library. When enabled, it can collect real‑time GPU information including model, identifier, driver version, power consumption, temperature, utilization, clock frequencies, PCIe bandwidth, and MetaXLink communication metrics.
Exposed Metrics
GPU basic info: model, identifier, driver version
GPU status: power, temperature, utilization, clock frequencies
GPU communication: PCIe speed/bandwidth, MetaXLink speed/bandwidth
Container Deployment
To enable MetaX GPU monitoring in a container, mount the required system paths and run the Huatuo image. Example Docker command:
docker run --privileged --cgroupns=host --network=host \
-v /sys:/sys \
-v /proc:/proc \
-v /run:/run \
-v /opt/maca:/opt/maca \
-v /opt/mxdriver:/opt/mxdriver \
-v /dev/dri:/dev/dri \
huatuo/huatuo-bamai:latest
In Kubernetes, create the appropriate PersistentVolume and PersistentVolumeClaim, then access the service’s /metrics endpoint. Presence of metrics prefixed with metax_ indicates successful GPU data collection.
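The metax_ prefix check can be scripted. The following is a minimal sketch, not part of Huatuo: the helper names count_metax_samples and list_metax_metric_names and the HUATUO_METRICS_URL variable are hypothetical, and the sketch assumes the endpoint returns the usual Prometheus text format, with one sample per line and comment lines starting with #.

```shell
# Hypothetical helpers for checking a /metrics response; not part of Huatuo.
# Both read Prometheus-style text from stdin.

# Print how many sample lines have a metric name starting with "metax_".
# The pattern is anchored at line start, so "# HELP" / "# TYPE" lines are skipped.
count_metax_samples() {
  awk '/^metax_/ { n++ } END { print n + 0 }'
}

# Print the distinct metax_ metric names, one per line, sorted.
# Everything from the first "{" or space onward (labels and value) is dropped.
list_metax_metric_names() {
  awk '/^metax_/ { sub(/[{ ].*/, ""); print }' | sort -u
}

# Usage (assumption: HUATUO_METRICS_URL is set to your service's /metrics URL):
#   curl -s "$HUATUO_METRICS_URL" | count_metax_samples
#   curl -s "$HUATUO_METRICS_URL" | list_metax_metric_names
```

A count of 0 suggests that MetaX collection is not active; in that case, checking that the driver paths were mounted as in the Docker command above is a reasonable first step.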
Metric Index Definitions
GPU index: starts at 0 for Native and VF modes, at 100 for PF mode
CE: Correctable Errors
UE: Uncorrectable Errors
MetaXLink: proprietary GPU‑to‑GPU interconnect, indices start at 1
Repository
Huatuo project GitHub: https://github.com/ccfos/huatuo
