© Copyright 2026 FuriosaAI Inc.

Examples

  • Chat
  • Embedding
  • Scoring (Similarity Scoring)
  • Reranking (Document Reranking)
  • OpenAI-Compatible API with Logprobs

By FuriosaAI, Inc.
