Supported Models

Supported Models#

FuriosaAI’s software stack supports a wide range of Transformer-based models available on the Hugging Face Hub. Below is a list of model architectures currently supported by Furiosa-LLM. If your model is based on any of these architectures, you can use Furiosa-LLM to compile and run the model efficiently on Furiosa’s NPUs.

For many of these architectures, FuriosaAI also publishes pre-compiled models under the Hugging Face Hub 🤗 - FuriosaAI organization, each shipping a Furiosa Executable Bundle (FXB) so you can download and run it quickly with Furiosa-LLM. The architecture names in the tables below link to per-architecture guides covering how to launch the pre-compiled variants with the furiosa-llm serve command — including the model-specific options each needs — and how to use their features, such as reasoning, tool calling, and multimodal input, with example requests. Each guide also notes the quantization and parallelism strategy for reference. Individual repository-level model cards live on the Hugging Face Hub.

Decoder-only Models (Text Generation)#

Model Name	Architecture	Example Hugging Face Models
EXAONE 4.0	`Exaone4ForCausalLM`	`LGAI-EXAONE/EXAONE-4.0-32B-FP8`, `LGAI-EXAONE/EXAONE-4.0-32B`
K-EXAONE	`ExaoneMoEForCausalLM`	`LGAI-EXAONE/K-EXAONE-236B-A23B`
GPT-OSS	`GptOssForCausalLM`	`openai/gpt-oss-20b`, `openai/gpt-oss-120b`
Llama 3.1, Llama 3.3	`LlamaForCausalLM`	`meta-llama/Llama-3.1-8B-Instruct`, `meta-llama/Llama-3.3-70B-Instruct`
Solar Open	`SolarOpenForCausalLM`	`upstage/Solar-Open-100B`
Qwen 2.5	`Qwen2ForCausalLM`	`Qwen/Qwen2.5-0.5B-Instruct`
Qwen 3	`Qwen3ForCausalLM`	`Qwen/Qwen3-8B`, `Qwen/Qwen3-32B`
Qwen 3 MoE	`Qwen3MoeForCausalLM`	`Qwen/Qwen3-30B-A3B-Instruct-2507`

Pooling Models#

Model Name	Architecture	Task	Example Hugging Face Models
Qwen 3 Embedding	`Qwen3Model`	Embedding	`Qwen/Qwen3-Embedding-8B`, `Qwen/Qwen3-Embedding-4B`
Qwen 3 Reranker	`Qwen3ForSequenceClassification`	Reranking	`Qwen/Qwen3-Reranker-8B`, `Qwen/Qwen3-Reranker-4B`

Vision-Language Models (Multimodal)#

See Vision-Language Models for a guide on launching a VL server and sending image inputs over the OpenAI-compatible Chat Completions API.

Model Name	Architecture	Modalities	Example Hugging Face Models
Qwen 3 VL	`Qwen3VLForConditionalGeneration`	Text, Image	`Qwen/Qwen3-VL-32B-Instruct`, `Qwen/Qwen3-VL-2B-Instruct-FP8`

Status of Models#

You can compile and run any of the architectures listed above on RNGD yourself. The models below go a step further: each is one that FuriosaAI actively validates, and the table shows how far that validation has progressed across three checks — does it run (Function), does it produce correct results (Correctness), and has its performance been tuned (Performance).

The status of each check uses the following scale:

Status	Meaning
✅ Passed	Verified and working as expected.
🟡 Experimental	Works, but not yet fully validated or tuned.
⛔️ Unplanned	Not planned for this model.

Model	Type	Function	Correctness	Performance
EXAONE-4.0-32B-FP8	Text	✅	✅	✅
K-EXAONE-236B-A23B-NVFP4A16	Text (MoE)	✅	✅	🟡
Llama-3.1-8B-Instruct	Text	✅	✅	✅
Llama-3.3-70B-Instruct	Text	✅	✅	✅
Qwen3-30B-A3B-FP8	Text (MoE)	✅	✅	🟡
Qwen3-30B-A3B-Instruct-2507-FP8	Text (MoE)	✅	✅	🟡
Qwen3-30B-A3B-Thinking-2507-FP8	Text (MoE)	✅	✅	🟡
Qwen3-32B-FP8	Text	✅	✅	✅
Qwen3-4B-FP8	Text	✅	✅	⛔️
Qwen3-8B-FP8	Text	✅	✅	🟡
Qwen3-Coder-30B-A3B-Instruct-FP8	Text (MoE)	✅	✅	🟡
Qwen3-Embedding-8B	Embedding	✅	✅	🟡
Qwen3-Reranker-8B	Reranking	✅	✅	🟡
Qwen3-VL-32B-Instruct	Multimodal	✅	✅	🟡
Solar-Open-100B-NVFP4A16	Text (MoE)	✅	✅	🟡
gpt-oss-120b	Text (MoE)	✅	✅	🟡
gpt-oss-20b	Text (MoE)	✅	✅	🟡

For models planned for future releases, see the Roadmap.