Release Notes - 0.9.0

Furiosa SDK 0.9.0 is a major release, including many performance enhancements, additional functions, and bug fixes. In partcular, 0.9.0 release includes the significant improvements of the quantization tools.

Component Version Information
Package Name	Version
NPU Driver	1.7.0
NPU Firmware Tools	1.4.0
NPU Firmware Image	1.7.0
HAL (Hardware Abstraction Layer)	0.11.0
Furiosa Compiler	0.9.0
Python SDK (furiosa-runtime, furiosa-server, furiosa-serving, furiosa-quantizer, ..)	0.9.0
NPU Management CLI (furiosactl)	0.11.0
NPU Device Plugin	0.10.1
NPU Feature Discovery	0.2.0

Installing the latest SDK

If you are using APT repository, the upgrade process is simpler.

apt-get update && apt-get upgrade

If you wish to designate a specific package for upgrade, execute as below: You can find more details about APT repository setup at Driver, Firmware, and Runtime Installation.

apt-get update && \
apt-get install -y furiosa-driver-pdma furiosa-libhal-warboy furiosa-libnux furiosactl

You can upgrade firmware as follows:

apt-get update && \
apt-get install -y furiosa-firmware-tools furiosa-firmware-image

You can upgrade Python package as follows:

pip install --upgrade pip setuptools wheel
pip install --upgrade furiosa-sdk

Warning

When installing or upgrading the furiosa-sdk without updating pip to the latest version, you may encounter the following errors.

ERROR: Could not find a version that satisfies the requirement furiosa-quantizer-impl==0.9.* (from furiosa-quantizer==0.9.*->furiosa-sdk) (from versions: none)
ERROR: No matching distribution found for furiosa-quantizer-impl==0.9.* (from furiosa-quantizer==0.9.*->furiosa-sdk)

Major changes

Quantization tool

Quantization tool is a library that converts a pre-trained model to a quantized model. You can refer to more details at Model Quantization 0.9.0 release includes the API improvement and new calibration methods, possibly leading to better accuracy.

Added new quantization-related APIs that are more flexible and solid. (furiosa.quantizer , furiosa.optimizer )

optimized_onnx_model = optimize_model(source_onnx_model)
calibrator = Calibrator(optimized_onnx_model, CalibrationMethod.MIN_MAX_ASYM)
for calibration_data, _ in tqdm.tqdm(calibration_dataloader, desc="Calibration", unit="images", mininterval=0.5):
  calibrator.collect_data([[calibration_data.numpy()]])
ranges = calibrator.compute_range()
quantizated_graph = quantize(optimized_onnx_model, ranges)

Added an option to decide whether to perform quantize at the beginning of the model.
- Instead of without_quantize being removed from the compiler options, it can be specified via the argument with_quantize to the quantize function.
The normalized_pixel_outputs argument to the quantize function can be set to convert the model output to uint8 instead of dequantizing to fp32.
- A tensor with an element range of (0. , 1.) can be optimized to convert to pixel data in uint8.
Provides more calibration methods.

Supported Calibration Methods
Calibration Method	Asymmetric	QuasiSymmetric
Min-Max	MIN_MAX_ASYM	MIN_MAX_SYM
Entropy	ENTROPY_ASYM	ENTROPY_SYM
Percentile	PERCENTILE_ASYM	PERCENTILE_SYM
Mean squared error	MSE_ASYM	MSE_SYM
Signal-to-quantization-noise ratio	SQNR_ASYM	SQNR_SYM

To ensure the effectiveness of new calibration methods, we measured the accuracy of 10 popular models with the new calibration methods. Among them, 8 models showed better accuracy than the existing calibration methods. For example, the accuracy of EfficientNet-B0 increased by 57.452%. With the min-max calibration method, EfficientNet-B0 had an accuracy of 16.104%. In contrast, with the percentile calibration method, the accuracy was 73.556%. The details of the experiment results can be found at Quantization Accuracy.

For more information on installing and using the new quantizer, you can refer to the following examples.

Tutorial: How to use Furiosa SDK from Start to Finish

Compiler

Added acceleration support for operators Lower, Unlower
Added acceleration support for operator Dequantize
Support for executing binaries that are larger than the hardware’s instruction memory
Improved scheduler and memory allocator to eliminate unnecessary I/O
Various improvements optimize compilation for better execution performance

furiosa-toolkit

The furiosactl command-line tool included in the furiosa-toolkit 0.11.0 release includes improvements to the includes the following major improvements

The newly added furiosactl top command is used to view utilization by NPU device over time.

$ furiosactl top --interval 200
NOTE: furiosa top is under development. Usage and output formats may change.
Please enter Ctrl+C to stop.
Datetime                        PID       Device        NPU(%)   Comp(%)   I/O(%)   Command
2023-03-21T09:45:56.699483936Z  152616    npu1pe0-1      19.06    100.00     0.00   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:56.906443888Z  152616    npu1pe0-1      51.09     93.05     6.95   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.110489333Z  152616    npu1pe0-1      46.40     97.98     2.02   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.316060982Z  152616    npu1pe0-1      51.43    100.00     0.00   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.521140588Z  152616    npu1pe0-1      54.28     94.10     5.90   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.725910558Z  152616    npu1pe0-1      48.93     98.93     1.07   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.935041998Z  152616    npu1pe0-1      47.91    100.00     0.00   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:58.13929122Z   152616    npu1pe0-1      49.06     94.94     5.06   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf

The furiosactl info command has been improved to display concise information about each device. As before, you can enter the --full option if you want to see more information about a device.

$ furiosactl info
+------+--------+----------------+-------+--------+--------------+
| NPU  | Name   | Firmware       | Temp. | Power  | PCI-BDF      |
+------+--------+----------------+-------+--------+--------------+
| npu1 | warboy | 1.6.0, 3c10fd3 |  54°C | 0.99 W | 0000:44:00.0 |
+------+--------+----------------+-------+--------+--------------+

$ furiosactl info --full
+------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
| NPU  | Name   | UUID                                 | S/N               | Firmware       | Temp. | Power  | PCI-BDF      | PCI-DEV |
+------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
| npu1 | warboy | 00000000-0000-0000-0000-000000000000 | WBYB0000000000000 | 1.6.0, 3c10fd3 |  54°C | 0.99 W | 0000:44:00.0 | 511:0   |
+------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+

More information about installing and using furiosactl can be found in furiosa-toolkit.