Supported Models#

FuriosaAI’s software stack supports a wide range of Transformer-based models available on the Hugging Face Hub. Below is a list of model architectures currently supported by Furiosa-LLM. If your model is based on any of these architectures, you can use Furiosa-LLM to compile and run the model efficiently on Furiosa’s NPUs.

For many of these architectures, FuriosaAI also publishes pre-compiled models under the Hugging Face Hub 🤗 - FuriosaAI organization, each shipping a Furiosa Executable Bundle (FXB) so you can download and run it quickly with Furiosa-LLM. The architecture names in the tables below link to per-architecture guides covering how to launch the pre-compiled variants with the furiosa-llm serve command — including the model-specific options each needs — and how to use their features, such as reasoning, tool calling, and multimodal input, with example requests. Each guide also notes the quantization and parallelism strategy for reference. Individual repository-level model cards live on the Hugging Face Hub.

Decoder-only Models (Text Generation)#

Model Name

Architecture

Example Hugging Face Models

EXAONE 4.0

Exaone4ForCausalLM

LGAI-EXAONE/EXAONE-4.0-32B-FP8, LGAI-EXAONE/EXAONE-4.0-32B

K-EXAONE

ExaoneMoEForCausalLM

LGAI-EXAONE/K-EXAONE-236B-A23B

GPT-OSS

GptOssForCausalLM

openai/gpt-oss-20b, openai/gpt-oss-120b

Llama 3.1, Llama 3.3

LlamaForCausalLM

meta-llama/Llama-3.1-8B-Instruct, meta-llama/Llama-3.3-70B-Instruct

Solar Open

SolarOpenForCausalLM

upstage/Solar-Open-100B

Qwen 2.5

Qwen2ForCausalLM

Qwen/Qwen2.5-0.5B-Instruct

Qwen 3

Qwen3ForCausalLM

Qwen/Qwen3-8B, Qwen/Qwen3-32B

Qwen 3 MoE

Qwen3MoeForCausalLM

Qwen/Qwen3-30B-A3B-Instruct-2507

Pooling Models#

Model Name

Architecture

Task

Example Hugging Face Models

Qwen 3 Embedding

Qwen3Model

Embedding

Qwen/Qwen3-Embedding-8B, Qwen/Qwen3-Embedding-4B

Qwen 3 Reranker

Qwen3ForSequenceClassification

Reranking

Qwen/Qwen3-Reranker-8B, Qwen/Qwen3-Reranker-4B

Vision-Language Models (Multimodal)#

See Vision-Language Models for a guide on launching a VL server and sending image inputs over the OpenAI-compatible Chat Completions API.

Model Name

Architecture

Modalities

Example Hugging Face Models

Qwen 3 VL

Qwen3VLForConditionalGeneration

Text, Image

Qwen/Qwen3-VL-32B-Instruct, Qwen/Qwen3-VL-2B-Instruct-FP8

Status of Models#

You can compile and run any of the architectures listed above on RNGD yourself. The models below go a step further: each is one that FuriosaAI actively validates, and the table shows how far that validation has progressed across three checks — does it run (Function), does it produce correct results (Correctness), and has its performance been tuned (Performance).

The status of each check uses the following scale:

Status

Meaning

✅ Passed

Verified and working as expected.

🟡 Experimental

Works, but not yet fully validated or tuned.

⛔️ Unplanned

Not planned for this model.

Model

Type

Function

Correctness

Performance

EXAONE-4.0-32B-FP8

Text

K-EXAONE-236B-A23B-NVFP4A16

Text (MoE)

🟡

Llama-3.1-8B-Instruct

Text

Llama-3.3-70B-Instruct

Text

Qwen3-30B-A3B-FP8

Text (MoE)

🟡

Qwen3-30B-A3B-Instruct-2507-FP8

Text (MoE)

🟡

Qwen3-30B-A3B-Thinking-2507-FP8

Text (MoE)

🟡

Qwen3-32B-FP8

Text

Qwen3-4B-FP8

Text

⛔️

Qwen3-8B-FP8

Text

🟡

Qwen3-Coder-30B-A3B-Instruct-FP8

Text (MoE)

🟡

Qwen3-Embedding-8B

Embedding

🟡

Qwen3-Reranker-8B

Reranking

🟡

Qwen3-VL-32B-Instruct

Multimodal

🟡

Solar-Open-100B-NVFP4A16

Text (MoE)

🟡

gpt-oss-120b

Text (MoE)

🟡

gpt-oss-20b

Text (MoE)

🟡

For models planned for future releases, see the Roadmap.