Installing Furiosa Metrics Exporter

Furiosa Metrics Exporter

The Furiosa metrics exporter exposes a collection of metrics about FuriosaAI NPU devices in the Prometheus format. In a Kubernetes cluster, Prometheus can scrape the metrics provided by furiosa-metrics-exporter, and Grafana can visualize them in a dashboard. You can set this up with the Prometheus and Grafana Helm charts together with the furiosa-metrics-exporter Helm chart.

Metrics

The exporter is composed of a chain of collectors, each responsible for collecting a specific set of metrics from the Furiosa NPU devices. The following table lists the available collectors and their metrics:

NPU Metrics
Collector Name	Metric	Type	Metric Labels	Description
Liveness	furiosa_npu_alive	gauge	arch, core, device, uuid, kubernetes_node_name	The liveness of the Furiosa NPU device.
Temperature	furiosa_npu_hw_temperature	gauge	arch, core, device, uuid, kubernetes_node_name, label	The temperature of the Furiosa NPU device.
Power	furiosa_npu_hw_power	gauge	arch, core, device, uuid, kubernetes_node_name, label	The power consumption of the Furiosa NPU device.
Core Utilization	furiosa_npu_core_utilization	gauge	arch, core, device, uuid, kubernetes_node_name	The utilization of an individual core of the Furiosa NPU device.

All metrics share common metric labels such as arch, core, device, kubernetes_node_name, and uuid. The following table describes the common metric labels:

Common NPU Metric Labels
Common Metric Label	Description
arch	The architecture of the Furiosa NPU device. e.g. warboy, rngd
core	The core number of the Furiosa NPU device. e.g. 0, 1, 2, 3, 4, 5, 6, 7, 0-1, 2-3, 0-3, 4-5, 6-7, 4-7, 0-7
device	The device name of the Furiosa NPU device. e.g. npu0
kubernetes_node_name	The name of the Kubernetes node where the exporter is running. This label may be absent if the exporter runs directly on the host machine or in a standalone container outside Kubernetes.
uuid	The UUID of the Furiosa NPU device.
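Because the core label holds either a single core number or a range, a script that consumes these metrics may need to turn the label back into individual cores. The shell function below is a hypothetical helper, not part of the exporter, and it assumes that a value such as 0-3 denotes cores 0 through 3 inclusive.

```shell
# Hypothetical helper, not part of furiosa-metrics-exporter.
# Assumption: "a-b" means cores a..b inclusive; a bare number is a single core.
expand_core_label() {
  case "$1" in
    *-*)
      local i=${1%-*}     # first core of the range, e.g. 0 in "0-3"
      local end=${1#*-}   # last core of the range, e.g. 3 in "0-3"
      local out=""
      while [ "$i" -le "$end" ]; do
        out="$out${out:+ }$i"   # add a space separator before every item except the first
        i=$((i + 1))
      done
      echo "$out"
      ;;
    *)
      echo "$1"   # a single core such as "5"
      ;;
  esac
}

# Usage: count the cores covered by a label value.
# set -- $(expand_core_label 4-7); echo "$#"   # prints 4
```

For example, expand_core_label 0-7 prints all eight core numbers of a device-wide series, while expand_core_label 3 prints just 3.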

The metric label “label” describes additional attributes specific to each metric. This keeps the number of metric names small: related readings, such as the peak and ambient temperatures, share one metric name and are distinguished by the value of “label”.

NPU Metric Label Attributes
Metric	Label Attribute	Description
Temperature	peak	The highest temperature reported by the SoC sensors.
Temperature	ambient	The temperature reported by the sensors attached to the board.
Power	rms	The Root Mean Square (RMS) value of the power consumed by the device, providing an average power consumption metric over a period of time.
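To illustrate how the label attribute separates readings that share one metric name, the following sketch filters Prometheus exposition text by metric name and label value and prints the device name and sample value. The function name select_by_label is hypothetical, and it assumes each sample sits on one line with the value as the last whitespace-separated field, as in the sample output on this page.

```shell
# Hypothetical helper: select_by_label METRIC LABEL_VALUE < exposition.txt
# Prints "<device> <value>" for every METRIC sample whose label="LABEL_VALUE".
# Assumes one sample per line and no spaces inside label values.
select_by_label() {
  awk -v prefix="$1{" -v want="label=\"$2\"" '
    index($0, prefix) == 1 && index($0, want) {
      dev = "unknown"
      # Pull the value out of device="..."; 8 = length of the text device=" .
      if (match($0, /device="[^"]*"/))
        dev = substr($0, RSTART + 8, RLENGTH - 9)
      print dev, $NF   # the sample value is the last field on the line
    }'
}
```

For example, select_by_label furiosa_npu_hw_temperature peak < metrics.txt prints only the peak temperature of each device, while passing ambient instead returns the board sensor readings.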

The following is a real-world example of the exported metrics:

#liveness
furiosa_npu_alive{arch="rngd",core="0-7",device="npu0",kubernetes_node_name="node",uuid="uuid"} 1

#temperature
furiosa_npu_hw_temperature{arch="rngd",core="0-7",device="npu0",kubernetes_node_name="node",label="peak",uuid="uuid"} 39
furiosa_npu_hw_temperature{arch="rngd",core="0-7",device="npu0",kubernetes_node_name="node",label="ambient",uuid="uuid"} 35

#power
furiosa_npu_hw_power{arch="rngd",core="0-7",device="npu0",kubernetes_node_name="node",label="rms",uuid="uuid"} 4795000

#core utilization
furiosa_npu_core_utilization{arch="rngd",core="0",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
furiosa_npu_core_utilization{arch="rngd",core="1",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
furiosa_npu_core_utilization{arch="rngd",core="2",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
furiosa_npu_core_utilization{arch="rngd",core="3",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
furiosa_npu_core_utilization{arch="rngd",core="4",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
furiosa_npu_core_utilization{arch="rngd",core="5",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
furiosa_npu_core_utilization{arch="rngd",core="6",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
furiosa_npu_core_utilization{arch="rngd",core="7",device="npu0",kubernetes_node_name="node",uuid="uuid"} 90
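Because core utilization is reported per core, a per-device figure requires aggregation. The sketch below averages furiosa_npu_core_utilization for each device from exposition text read on stdin. The function name avg_core_utilization is hypothetical, and the same one-sample-per-line assumption applies.

```shell
# Hypothetical helper: avg_core_utilization < exposition.txt
# Prints "<device> <average>" for each device, averaging its per-core series.
# Comment lines (starting with "#") and other metrics are ignored.
avg_core_utilization() {
  awk '
    index($0, "furiosa_npu_core_utilization{") == 1 {
      dev = "unknown"
      if (match($0, /device="[^"]*"/))
        dev = substr($0, RSTART + 8, RLENGTH - 9)
      sum[dev] += $NF   # the sample value is the last field
      n[dev]++
    }
    END {
      # awk does not guarantee the iteration order; pipe through sort if needed.
      for (d in sum) printf "%s %.1f\n", d, sum[d] / n[d]
    }'
}
```

Applied to the sample above, every core reports 90, so the result for npu0 is 90.0. Grouping by device alone is only meaningful for a single exporter's output; across several nodes, device names such as npu0 repeat. When Prometheus already scrapes the exporter, the equivalent PromQL query is avg by (kubernetes_node_name, device) (furiosa_npu_core_utilization).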

Deploying Furiosa Metrics Exporter with Helm

The Furiosa metrics exporter Helm chart is available in the furiosa-ai/helm-charts repository. To customize the deployment, modify charts/furiosa-metrics-exporter/values.yaml. For example, the chart creates a Service object with Prometheus annotations so that Prometheus can scrape the metrics automatically; you can change the port or disable these annotations in values.yaml. Deploy the Furiosa metrics exporter by running the following commands:

helm repo add furiosa https://furiosa-ai.github.io/helm-charts
helm repo update
helm install furiosa-metrics-exporter furiosa/furiosa-metrics-exporter -n kube-system
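If you prefer to keep your changes in a separate file instead of editing the chart source, standard Helm commands cover this: helm show values prints a chart's default values, and helm upgrade --install with -f installs or upgrades a release using a custom values file. The sketch below wraps the second step in a shell function. The function name and the file name my-values.yaml are hypothetical, and which keys to change inside the file depends on the chart's values.yaml, which this sketch does not assume.

```shell
# Step 1 (once): dump the chart defaults to a file you can edit, e.g.
#   helm show values furiosa/furiosa-metrics-exporter > my-values.yaml
# Step 2: apply the edited file. "helm upgrade --install" installs the release
# if it does not exist yet and upgrades it otherwise.
# Hypothetical wrapper; the default file name my-values.yaml is an assumption.
deploy_metrics_exporter() {
  local values_file=${1:-my-values.yaml}
  helm upgrade --install furiosa-metrics-exporter \
    furiosa/furiosa-metrics-exporter \
    -n kube-system \
    -f "$values_file"
}
```

Running deploy_metrics_exporter my-values.yaml after editing the file re-applies the release with your settings, so the same command works for the first install and for later changes.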