Furiosa Docs

© Copyright 2026 FuriosaAI Inc.

Index

A | B | C | E | F | G | H | L | M | N | P | S | T

A

  • abort() (furiosa_llm.AsyncLLMEngine method)
  • abort_request() (furiosa_llm.LLMEngine method)
  • add_request() (furiosa_llm.LLMEngine method)
  • Artifact (class in furiosa_llm.artifact)
  • ArtifactBuilder (class in furiosa_llm.artifact)
  • ArtifactMetadata (class in furiosa_llm.artifact)
  • AsyncLLMEngine (class in furiosa_llm)

B

  • build() (furiosa_llm.artifact.ArtifactBuilder method)

C

  • chat() (furiosa_llm.LLM method)

E

  • embed() (furiosa_llm.LLM method)
  • encode() (furiosa_llm.AsyncLLMEngine method)
    • (furiosa_llm.LLM method)

F

  • from_engine_args() (furiosa_llm.AsyncLLMEngine class method)
    • (furiosa_llm.LLMEngine class method)

G

  • generate() (furiosa_llm.AsyncLLMEngine method)
    • (furiosa_llm.LLM method)

H

  • has_unfinished_requests() (furiosa_llm.LLMEngine method)

L

  • LLM (class in furiosa_llm)
  • LLMEngine (class in furiosa_llm)
  • load_artifact() (furiosa_llm.LLM class method)

M

  • model_config (furiosa_llm.artifact.Artifact attribute)
    • (furiosa_llm.artifact.ArtifactMetadata attribute)

N

  • normalize (furiosa_llm.PoolingParams attribute)

P

  • PoolingParams (class in furiosa_llm)

S

  • SamplingParams (class in furiosa_llm)
  • score() (furiosa_llm.LLM method)
  • shutdown() (furiosa_llm.LLM method)
  • step() (furiosa_llm.LLMEngine method)
  • stream_generate() (furiosa_llm.LLM method)

T

  • truncate_prompt_tokens (furiosa_llm.PoolingParams attribute)

By FuriosaAI, Inc.
