Installing Furiosa Metrics Exporter#
Furiosa Metrics Exporter#
The Furiosa metrics exporter exposes collection of metrics related to FuriosaAI NPU devices in Prometheus format. In a Kubernetes cluster, you can scrape the metrics provided by furiosa-metrics-exporter using Prometheus and visualize them with a Grafana dashboard. This can be easily set up using the Prometheus Chart and Grafana Helm charts, along with the furiosa-metrics-exporter Helm chart.
Deploying Furiosa Metrics Exporter with Helm#
The Furiosa metrics exporter helm chart is available at furiosa-ai/helm-charts. To configure deployment as you need, you can modify charts/furiosa-metrics-exporter/values.yaml
.
For example, the Furiosa metrics exporter Helm chart automatically creates a Service Object with Prometheus annotations to enable metric scraping automatically. You can modify the values.yaml to change the port or disable the Prometheus annotations if needed.
You can deploy the Furiosa Metrics Exporter by running the following commands:
helm repo add furiosa https://furiosa-ai.github.io/helm-charts
helm repo update
helm install furiosa-metrics-exporter furiosa/furiosa-metrics-exporter -n kube-system
Alternative Installation: furiosa-metrics-exporter via apt#
Note
For special use cases or non-Kubernetes environments, you can alternatively install the exporter via an apt
package.
The minimum requirements for installing the furiosa-metrics-exporter
package are as follows:
Ubuntu 22.04 LTS (Debian Bookworm) or later
Linux Kernel 6.3 or later
Administrator privileges on the system (root)
Then, please install the furiosa-metrics-exporter package as follows:
sudo apt update
sudo apt install -y furiosa-metrics-exporter
This command installs packages furiosa-libsmi and furiosa-metrics-exporter.
Configuration#
The Furiosa Metrics Exporter has various command-line (CLI) options. The following table summarizes the available CLI options:
CLI Option |
Required |
Description |
---|---|---|
|
Yes |
Metrics collection interval in seconds. e.g., |
|
Yes |
Port number for the metrics HTTP server. e.g., |
|
No |
If set, hostname label will be added to the metrics. Default is empty ( |
|
No |
If set, Kubernetes resource labels (such as namespace, pod, and container) will be added to the metrics. Default is |
Metrics#
The exporter is composed of chain of collectors, each collector is responsible for collecting specific metrics from the Furiosa NPU devices. The following table shows the available collectors and metrics:
Collector Name |
Metric |
Type |
Metric Labels |
Description |
---|---|---|---|---|
Liveness |
furiosa_npu_alive |
gauge |
arch, core, device, uuid, pci_bud_id, firmware_version, pert_version, driver_version, hostname, namespace, pod, container |
The liveness of the Furiosa NPU device. |
Temperature |
furiosa_npu_hw_temperature |
gauge |
arch, core, device, uuid, pci_bud_id, firmware_version, pert_version, driver_version, hostname, namespace, pod, container, label |
The temperature of the Furiosa NPU device. |
Power |
furiosa_npu_hw_power |
gauge |
arch, core, device, uuid, pci_bud_id, firmware_version, pert_version, driver_version, hostname, namespace, pod, container, label |
The power consumption of the Furiosa NPU device. |
Core Utilization |
furiosa_npu_core_utilization |
gauge |
arch, core, device, uuid, pci_bud_id, firmware_version, pert_version, driver_version, hostname, namespace, pod, container |
The core utilization of the Furiosa NPU device. |
Core Frequency |
furiosa_npu_core_frequency |
gauge |
arch, core, device, uuid, pci_bud_id, firmware_version, pert_version, driver_version, hostname, namespace, pod, container |
The core frequency of the Furiosa NPU device. |
Cycle Count |
furiosa_npu_total_cycle_count |
counter |
arch, core, device, uuid, pci_bud_id, firmware_version, pert_version, driver_version, hostname, namespace, pod, container |
The total cycle count of the Furiosa NPU device. |
Task Execution Cycle |
furiosa_npu_task_execution_cycle |
counter |
arch, core, device, uuid, pci_bud_id, firmware_version, pert_version, driver_version, hostname, namespace, pod, container |
The task execution cycle of the NPU Task. |
All metrics share common metric labels such as arch, core, device, uuid, pci_bud_id, firmware_version, pert_version, driver_version. The following table describes the common metric labels:
Label Attribute |
Description |
---|---|
arch |
The architecture of the Furiosa NPU device. e.g. warboy, rngd |
core |
The core number of the Furiosa NPU device. e.g. 0, 1, 2, 3, 4, 5, 6, 7, 0-1, 2-3, 0-3, 4-5, 6-7, 4-7, 0-7 |
device |
The device name of the Furiosa NPU device. e.g. npu0 |
uuid |
The UUID of the Furiosa NPU device. |
pci_bus_id |
The PCI bus ID of the Furiosa NPU device. e.g. 0000:c7:00.0 |
firmware_version |
The firmware version of the Furiosa NPU device. e.g. 2025.1.0+696efad |
pert_version |
The pert version of the Furiosa NPU device. e.g. 2025.1.0+1694e18 |
driver_version |
The driver version of the Furiosa NPU device. e.g. 2025.1.0+f09a8d8 |
The following metric labels are optional and enabled only when the corresponding command-line options (--node-name
and --kube-resources-label
) are set.
Additionally, the namespace, pod, and container labels require an environment where the Kubernetes PodResource API is supported.
Label Attribute |
Description |
---|---|
hostname |
The hostname of the machine where the exporter is running. |
namespace |
The namespace of the Kubernetes pod to which the NPU is allocated. |
pod |
The name of the Kubernetes pod to which the NPU is allocated. |
container |
The container name within the Kubernetes pod to which the NPU is allocated. |
The metric label “label” is used to describe additional attributes specific to each metric. This approach helps avoid having too many metric definitions and effectively aggregates metrics that share common characteristics.
Metric Type |
Label Attribute |
Description |
---|---|---|
Temperature |
peak |
The highest temperature observed from SoC sensors |
Temperature |
ambient |
The temperature observed from sensors attached to the board |
Power |
rms |
Root Mean Square (RMS) value of the power consumed by the device, providing an average power consumption metric over a period of time. |
Examples
The following shows real-world example of the metrics:
#liveness
furiosa_npu_alive{arch="rngd",container="furiosa",core="0-7",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1
#temperature
furiosa_npu_hw_temperature{arch="rngd",container="furiosa",core="0-7",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",label="ambient",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 52
furiosa_npu_hw_temperature{arch="rngd",container="furiosa",core="0-7",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",label="peak",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 67.756
#power
furiosa_npu_hw_power{arch="rngd",container="furiosa",core="0-7",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",label="rms",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 50
#core utilization
furiosa_npu_core_utilization{arch="rngd",container="furiosa",core="0",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 99.68363645361265
furiosa_npu_core_utilization{arch="rngd",container="furiosa",core="1",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 99.68363645361265
furiosa_npu_core_utilization{arch="rngd",container="furiosa",core="2",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 99.68363645361265
furiosa_npu_core_utilization{arch="rngd",container="furiosa",core="3",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 99.68363645361265
furiosa_npu_core_utilization{arch="rngd",container="furiosa",core="4",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 99.6826341187199
furiosa_npu_core_utilization{arch="rngd",container="furiosa",core="5",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 99.6826341187199
furiosa_npu_core_utilization{arch="rngd",container="furiosa",core="6",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 99.6826341187199
furiosa_npu_core_utilization{arch="rngd",container="furiosa",core="7",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 99.6826341187199
#core frequency
furiosa_npu_core_frequency{arch="rngd",container="furiosa",core="0",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1750
furiosa_npu_core_frequency{arch="rngd",container="furiosa",core="1",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1750
furiosa_npu_core_frequency{arch="rngd",container="furiosa",core="2",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1750
furiosa_npu_core_frequency{arch="rngd",container="furiosa",core="3",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1750
furiosa_npu_core_frequency{arch="rngd",container="furiosa",core="4",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1750
furiosa_npu_core_frequency{arch="rngd",container="furiosa",core="5",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1750
furiosa_npu_core_frequency{arch="rngd",container="furiosa",core="6",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1750
furiosa_npu_core_frequency{arch="rngd",container="furiosa",core="7",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1750
#total cycle count
furiosa_npu_total_cycle_count{arch="rngd",container="furiosa",core="0",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1.7242541456e+10
furiosa_npu_total_cycle_count{arch="rngd",container="furiosa",core="1",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1.7242541456e+10
furiosa_npu_total_cycle_count{arch="rngd",container="furiosa",core="2",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1.7242541456e+10
furiosa_npu_total_cycle_count{arch="rngd",container="furiosa",core="3",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1.7242541456e+10
furiosa_npu_total_cycle_count{arch="rngd",container="furiosa",core="4",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1.7175902913e+10
furiosa_npu_total_cycle_count{arch="rngd",container="furiosa",core="5",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1.7175902913e+10
furiosa_npu_total_cycle_count{arch="rngd",container="furiosa",core="6",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1.7175902913e+10
furiosa_npu_total_cycle_count{arch="rngd",container="furiosa",core="7",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 1.7175902913e+10
#task execution cycle
furiosa_npu_task_execution_cycle{arch="rngd",container="furiosa",core="0",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 5.686392711e+09
furiosa_npu_task_execution_cycle{arch="rngd",container="furiosa",core="1",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 5.686392711e+09
furiosa_npu_task_execution_cycle{arch="rngd",container="furiosa",core="2",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 5.686392711e+09
furiosa_npu_task_execution_cycle{arch="rngd",container="furiosa",core="3",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 5.686392711e+09
furiosa_npu_task_execution_cycle{arch="rngd",container="furiosa",core="4",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 5.685170235e+09
furiosa_npu_task_execution_cycle{arch="rngd",container="furiosa",core="5",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 5.685170235e+09
furiosa_npu_task_execution_cycle{arch="rngd",container="furiosa",core="6",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 5.685170235e+09
furiosa_npu_task_execution_cycle{arch="rngd",container="furiosa",core="7",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 5.685170235e+09
When the partitioned device is allocated to a pod, device-wise metrics that are exported for each device such as liveness, temperature and power will also be exported for partitioned devices.
For example, if the device npu0 is partitioned into 2 devices (core 0-3, 4-7) and the partitioned device with core 0-3 is allocated to a pod, the temperature metrics will be exported as follows:
furiosa_npu_hw_temperature{arch="rngd",core="0-7",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",label="ambient",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",uuid="09512C86-0702-4303-8F40-474746474A40"} 50
furiosa_npu_hw_temperature{arch="rngd",core="0-7",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",label="peak",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",uuid="09512C86-0702-4303-8F40-474746474A40"} 64.41
furiosa_npu_hw_temperature{arch="rngd",container="furiosa",core="0-3",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",label="ambient",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 50
furiosa_npu_hw_temperature{arch="rngd",container="furiosa",core="0-3",device="npu0",driver_version="2025.1.0+f09a8d8",firmware_version="2025.1.0+696efad",hostname="cntk002",label="peak",namespace="default",pci_bus_id="0000:c7:00.0",pert_version="2025.1.0+1694e18",pod="furiosa",uuid="09512C86-0702-4303-8F40-474746474A40"} 64.41