What’s New#
This page describes the changes and functionality available in in the latest releases of Furiosa SDK 2024.2.0.
Furiosa SDK 2024.2.0 (2024-12-23)#
2024.2.0 is the second SDK release for RNGD. This release is beta 0 release, and the features and APIs described in this document may change in the future.
Highlights#
New Model support: Solar, EXAONE-3.0, CodeLLaMA2, Vicuna, …
Tensor Parallelism support (
tensor_parallel_size <= 8
)Torch 2.4.1 support
Transformers 4.44.2 support
- Furiosa LLM
- Huggingface Transformers compatible API support (furiosa_llm.optimum)
AutoModel, AutoModelForCausalLM, AutoModelForQuestionAnswering API
QuantizerForCausalLM API support for calibration and quantization
ArtifactBuilder API and CLI tools (refer to ArtifactBuilder API)
LLMEngine, AsyncLLMEngine API support compatiable with vLLM
Up to 8k context length support in LLaMA 3.1 models
- 20% performance improvements in LLaMA 3.1 models
3300 tokens/sec on 8B model
Furiosa SDK 2024.1.0 (2024-10-11)#
2024.1.0 is the first SDK release for RNGD. This release is alpha release, and the features and APIs described in this document may change in the future.
Highlights#
Model Support: LLaMA 3.1 8B/70B, BERT Large, GPT-J 6B
- Furiosa Quantizer supports the following quantization methods:
BF16 (W16A16)
INT8 Weight-Only (W8A16)
FP8 (W8A8)
INT8 SmoothQuant (W8A8)
- Furiosa LLM
Efficient KV cache management with PagedAttention
Continuous batching support in serving
OpenAI-compatible API server
Greedy search and beam search
Pipeline Parallelism and Data Parallelism across multiple NPUs
furiosa-mlperf
commandServer and Offline scenarios
BERT, GPT-J, LLaMA 3.1 benchmarks
- System Management Interface
System Management Interface Library and CLI for Furiosa NPU family
- Cloud Native Toolkit
Kubernetes integration for managing and monitoring the Furiosa NPU family
Package name |
Version |
---|---|
furiosa-compiler |
2024.2.0 |
furiosa-device-plugin |
2024.2.0 |
furiosa-driver-rngd |
2024.2.0 |
furiosa-feature-discovery |
2024.2.0 |
furiosa-firmware-image-tools |
2024.2.0 |
furiosa-firmware-image-rngd |
2024.2.0 |
furiosa-libsmi |
2024.2.0 |
furiosa-llm |
2024.2.0 |
furiosa-llm-models |
2024.2.0 |
furiosa-mlperf |
2024.2.0 |
furiosa-mlperf-resources |
4.1.0 |
furiosa-model-compressor |
2024.2.0 |
furiosa-model-compressor-impl |
2024.2.0 |
furiosa-native-compiler |
2024.2.0 |
furiosa-native-runtime |
2024.2.0 |
furiosa-smi |
2024.2.0 |
furiosa-torch-ext |
2024.2.0 |