Installing Furiosa NPU Operator#
Note
Furiosa NPU Operator is in alpha stage. Its features are experimental and may change in future releases.
The operator may not be fully stable or feature-complete. Use caution when deploying it in production environments.
Furiosa NPU Operator#
The Furiosa NPU Operator is a Kubernetes operator that automates the deployment and management of FuriosaAI NPU devices within a Kubernetes cluster. It deploys and configures the drivers and cloud-native components required to use FuriosaAI NPUs in containerized workloads.
Managed components#
The following components are managed by the Furiosa NPU Operator:
Furiosa Feature Discovery
Furiosa Device Plugin
Furiosa DRA Driver
Furiosa Metrics Exporter
Furiosa System Manager
Prerequisites#
The Furiosa NPU Operator requires the following prerequisites:
Deploying Node Feature Discovery in the cluster is recommended to discover FuriosaAI NPU devices.
Nodes with FuriosaAI NPU devices must have the label feature.node.kubernetes.io/pci-1200_1ed2.present: "true".
Kubernetes 1.34+ is required to use the Furiosa DRA Driver.
In this case, CDI must be enabled on each node.
Starting from containerd 2.0, CDI is enabled by default.
Deploying Furiosa NPU Operator with Helm#
The Furiosa NPU Operator Helm chart is available at: furiosa-ai/helm-charts
To configure deployment, modify:
charts/furiosa-npu-operator/values.yaml
Deploy the operator:
helm repo add furiosa https://furiosa-ai.github.io/helm-charts
helm repo update
helm install furiosa-npu-operator furiosa/furiosa-npu-operator \
  -n furiosa-system --create-namespace
After the Furiosa NPU Operator is successfully deployed, create a
FuriosaClusterConfig Custom Resource to manage cloud-native components for
FuriosaAI NPU devices.
Example usage#
The example below creates a FuriosaClusterConfig resource to deploy managed
components.
apiVersion: furiosa.ai/v1alpha1
kind: FuriosaClusterConfig
metadata:
  name: example
spec:
  featureDiscovery:
    enabled: true
    registry: docker.io/furiosaai/furiosa-ai
    tag: 2026.1.0-rc0
    image: furiosa-feature-discovery
  devicePlugin:
    enabled: false
    registry: docker.io/furiosaai/furiosa-ai
    tag: 2026.1.0-rc0
    image: furiosa-device-plugin
  draDriver:
    enabled: true
    registry: docker.io/furiosaai/furiosa-ai
    tag: 2026.1.0-rc0
    image: furiosa-dra-driver
  metricsExporter:
    enabled: true
    registry: docker.io/furiosaai/furiosa-ai
    tag: 2026.1.0-rc0
    image: furiosa-metrics-exporter
    config:
      serviceType: ClusterIP
      servicePort: 6254
      enableScrapeAnnotations: true
      collectInterval: 10
  systemManager:
    enabled: true
    registry: docker.io/furiosaai/furiosa-ai
    tag: 2026.1.0-rc0
    installerImagePrefix: furiosa-system-manager-installer
    validatorImage: furiosa-system-manager-validator
    enableFirmwareInstall: false
    enableDriverInstall: true
    upgradePolicy:
      autoUpgrade: false
Creating the above FuriosaClusterConfig resource will deploy Furiosa Feature
Discovery, Furiosa DRA Driver, Furiosa Metrics Exporter, and Furiosa System
Manager on nodes where FuriosaAI NPUs are present.
Configuration for furiosa-metrics-exporter#
You can configure furiosa-metrics-exporter by modifying the config field
under the metricsExporter section in FuriosaClusterConfig.
Available options:
serviceType: Type of Kubernetes Service to create. Default: ClusterIP.
servicePort: Port the metrics exporter listens on. Default: 6254.
enableScrapeAnnotations: If true, adds Prometheus scrape annotations to the metrics exporter Pod. Default: true.
collectInterval: Interval (in seconds) at which metrics are collected. Default: 10.
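As an illustration of these options, the fragment below overrides some of the exporter defaults. It is a sketch only: it assumes that serviceType accepts standard Kubernetes Service types such as NodePort (the source only shows ClusterIP), and that registry and image can be omitted so their documented defaults apply. The values are examples, not recommendations.

```yaml
apiVersion: furiosa.ai/v1alpha1
kind: FuriosaClusterConfig
metadata:
  name: example
spec:
  metricsExporter:
    enabled: true
    tag: 2026.1.0-rc0                 # tag is required; same value as the example above
    config:
      serviceType: NodePort           # assumption: standard Service types are accepted
      servicePort: 6254               # the documented default, stated explicitly
      enableScrapeAnnotations: false  # do not add Prometheus scrape annotations to the Pod
      collectInterval: 30             # collect every 30 seconds instead of the default 10
  # ... other configurations ...
```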
Selecting nodes for deployment#
You can specify which nodes to deploy the managed components to by setting
nodeSelector in FuriosaClusterConfig.
Example: deploy only on nodes labeled furiosa.ai/npu: "true":
apiVersion: furiosa.ai/v1alpha1
kind: FuriosaClusterConfig
metadata:
  name: example
spec:
  nodeSelector:
    furiosa.ai/npu: "true"
  # ... other configurations ...
You can create multiple FuriosaClusterConfig resources, but each node can only
be managed by one FuriosaClusterConfig.
If multiple FuriosaClusterConfig resources match the same node, they will all
be marked with the nodeCoverageCollision condition and reconciliation will stop.
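One way to avoid such a collision is to give each FuriosaClusterConfig a nodeSelector that matches a disjoint set of nodes. The sketch below assumes a hypothetical node label, example.com/npu-pool, that carries a different value on each group of nodes. The label and the resource names are illustrative and do not come from the operator.

```yaml
apiVersion: furiosa.ai/v1alpha1
kind: FuriosaClusterConfig
metadata:
  name: pool-a                    # hypothetical name
spec:
  nodeSelector:
    example.com/npu-pool: "a"     # hypothetical label; matches only pool "a" nodes
  # ... other configurations ...
---
apiVersion: furiosa.ai/v1alpha1
kind: FuriosaClusterConfig
metadata:
  name: pool-b                    # hypothetical name
spec:
  nodeSelector:
    example.com/npu-pool: "b"     # disjoint from pool "a", so no node is matched twice
  # ... other configurations ...
```

Because a node carries a single value for a given label key, the two selectors cannot match the same node.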
Affinity for managed components#
You can set affinity for managed components in FuriosaClusterConfig.
The affinity setting is applied to all managed components.
Default affinity:
apiVersion: furiosa.ai/v1alpha1
kind: FuriosaClusterConfig
metadata:
  name: example
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: feature.node.kubernetes.io/pci-1200_1ed2.present
                operator: In
                values:
                  - "true"
  # ... other configurations ...
If you do not set affinity, the operator applies the default configuration above.
To override affinity, set your own configuration. To clear affinity entirely,
set it to an empty object:
apiVersion: furiosa.ai/v1alpha1
kind: FuriosaClusterConfig
metadata:
  name: example
spec:
  affinity: {}
  # ... other configurations ...
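For the override case, a custom affinity might look like the sketch below. It assumes the affinity field follows the standard Kubernetes Affinity schema, as the default configuration suggests. It keeps the NPU label requirement and additionally restricts scheduling to one zone. The zone value is hypothetical.

```yaml
apiVersion: furiosa.ai/v1alpha1
kind: FuriosaClusterConfig
metadata:
  name: example
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # Same requirement as the default: the node has a FuriosaAI NPU.
              - key: feature.node.kubernetes.io/pci-1200_1ed2.present
                operator: In
                values:
                  - "true"
              # Additional requirement; expressions in one term are ANDed.
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - zone-a        # hypothetical zone name
  # ... other configurations ...
```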
Common configuration for managed components#
You can configure common options for these managed components:
featureDiscovery
devicePlugin
draDriver
metricsExporter
Common options:
enabled: Enable/disable the component. Default: false.
registry: Container image registry. Default: docker.io/furiosaai.
image: Container image name. Default depends on the component (e.g., furiosa-feature-discovery).
tag: Container image tag. Required.
imagePullPolicy: Image pull policy. Default: IfNotPresent.
imagePullSecrets: Image pull secrets.
resources: Resource requests and limits.
tolerations: Tolerations.
updateStrategy: DaemonSet update strategy.
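The fragment below sketches how several of these options could be combined on one component. It assumes that resources, tolerations, and updateStrategy follow the corresponding core Kubernetes types (ResourceRequirements, Toleration, and DaemonSetUpdateStrategy), which the source does not state explicitly. The taint key and the resource figures are hypothetical. imagePullSecrets is left out because its exact shape is not given here.

```yaml
apiVersion: furiosa.ai/v1alpha1
kind: FuriosaClusterConfig
metadata:
  name: example
spec:
  featureDiscovery:
    enabled: true
    tag: 2026.1.0-rc0               # required
    imagePullPolicy: Always         # overrides the default IfNotPresent
    resources:                      # assumption: standard requests/limits layout
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        memory: 128Mi
    tolerations:                    # assumption: standard Toleration entries
      - key: example.com/dedicated  # hypothetical taint key
        operator: Equal
        value: npu
        effect: NoSchedule
    updateStrategy:                 # assumption: DaemonSet update strategy layout
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
  # ... other configurations ...
```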
Furiosa System Manager configuration#
Configure Furiosa System Manager under the systemManager section.
Available options:
enabled: Enable/disable the component. Default: false.
registry: Container image registry. Default: docker.io/furiosaai.
installerImagePrefix: Installer image name prefix. Default: furiosa-system-manager-installer.
validatorImage: Validator image name. Default: furiosa-system-manager-validator.
tag: Container image tag. Required.
enableDriverInstall: Enable driver installation. Default: true.
enableFirmwareInstall: Enable firmware installation. Default: false.
imagePullPolicy: Image pull policy. Default: IfNotPresent.
imagePullSecrets: Image pull secrets.
resources: Resource requests and limits.
tolerations: Tolerations.
upgradePolicy: Upgrade policy.
  autoUpgrade: Enable automatic upgrade. Default: false.
  maxParallelUpgrades: Max nodes upgraded in parallel. Default: 1.
  maxUnavailable: Max unavailable during upgrade. Default: 25%.
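As a sketch of the upgradePolicy options, the fragment below turns on automatic upgrades and raises both limits above their defaults. The values are illustrative. How maxParallelUpgrades and maxUnavailable interact is not described here, so treat the combination as an assumption to verify against the operator's behavior.

```yaml
apiVersion: furiosa.ai/v1alpha1
kind: FuriosaClusterConfig
metadata:
  name: example
spec:
  systemManager:
    enabled: true
    tag: 2026.1.0-rc0             # required
    enableDriverInstall: true     # documented default
    enableFirmwareInstall: false  # documented default
    upgradePolicy:
      autoUpgrade: true           # default is false
      maxParallelUpgrades: 2      # default is 1
      maxUnavailable: "50%"       # default is 25%; quoted so YAML reads it as a string
  # ... other configurations ...
```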
Important note: Using Device Plugin and DRA Driver#
FuriosaAI NPUs can be managed using either the Device Plugin or the DRA Driver.
However, using both simultaneously on the same node is not supported and may lead to conflicts.
If both are enabled in FuriosaClusterConfig, the operator will set the
invalidSpec condition on the resource and stop reconciliation.
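To stay within this constraint, enable exactly one of the two components. The sketch below is the mirror image of the earlier example: the Device Plugin is enabled and the DRA Driver is disabled. A tag is given for both components, following the earlier example. Whether a disabled component still needs a tag is not stated here.

```yaml
apiVersion: furiosa.ai/v1alpha1
kind: FuriosaClusterConfig
metadata:
  name: example
spec:
  devicePlugin:
    enabled: true         # NPUs are exposed through the Device Plugin
    tag: 2026.1.0-rc0
  draDriver:
    enabled: false        # must not be enabled together with devicePlugin
    tag: 2026.1.0-rc0
  # ... other configurations ...
```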
Important note: DeviceClass secondary resource#
If you enable the DRA Driver in FuriosaClusterConfig, a DeviceClass resource
is created with a fixed name: npu.furiosa.ai.
Example:
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: npu.furiosa.ai
  labels:
    app.kubernetes.io/instance: npu.furiosa.ai
    app.kubernetes.io/managed-by: furiosa-npu-operator
    app.kubernetes.io/name: furiosa-dra-driver
  ownerReferences:
    - apiVersion: furiosa.ai/v1alpha1
      kind: FuriosaClusterConfig
      name: example
      uid: 78ddc742-2c01-4201-bdca-306d8f8931d9
spec:
  selectors:
    - cel:
        expression: device.driver == 'npu.furiosa.ai'
If multiple FuriosaClusterConfig resources enable the DRA Driver, the existing
DeviceClass is not modified except for metadata.ownerReferences.
Example with two owners:
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: npu.furiosa.ai
  labels:
    app.kubernetes.io/instance: npu.furiosa.ai
    app.kubernetes.io/managed-by: furiosa-npu-operator
    app.kubernetes.io/name: furiosa-dra-driver
  ownerReferences:
    - apiVersion: furiosa.ai/v1alpha1
      kind: FuriosaClusterConfig
      name: example
      uid: 31553189-c4be-4865-be51-0f5ea29387f4
    - apiVersion: furiosa.ai/v1alpha1
      kind: FuriosaClusterConfig
      name: example2
      uid: eeecf263-5468-4b5f-b298-ad2419023009
spec:
  selectors:
    - cel:
        expression: device.driver == 'npu.furiosa.ai'
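A workload would typically consume this DeviceClass through the Kubernetes DRA API. The sketch below uses the upstream resource.k8s.io/v1 ResourceClaimTemplate and the Pod resourceClaims field. It is not taken from the operator's documentation: the object names and the container image are hypothetical, and it assumes one NPU per Pod (the default when no count is given).

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-npu                # hypothetical name
spec:
  spec:
    devices:
      requests:
        - name: npu
          exactly:
            deviceClassName: npu.furiosa.ai   # the fixed DeviceClass name
---
apiVersion: v1
kind: Pod
metadata:
  name: npu-workload              # hypothetical name
spec:
  restartPolicy: Never
  resourceClaims:
    - name: npu                   # Pod-level claim generated from the template
      resourceClaimTemplateName: single-npu
  containers:
    - name: app
      image: registry.example.com/npu-app:latest   # placeholder image
      resources:
        claims:
          - name: npu             # refers to the Pod-level claim above
```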