Qwen3-Embedding#
The Qwen3-Embedding series is a family of text-embedding models built on the Qwen3 dense transformer backbone. They map text into dense vector representations for semantic search, retrieval, and similarity matching, with strong multilingual coverage.
FuriosaAI publishes pre-compiled builds of the Qwen3-Embedding models under the
furiosa-ai organization on the Hugging Face Hub.
Each build ships a Furiosa Executable Bundle (FXB) for running the model on
FuriosaAI RNGD with Furiosa-LLM.
For the related reranking model see Qwen3-Reranker; for the dense Qwen3 chat models see Qwen3 (dense).
Variants#
| Model | Quantization | RNGD cards | Notes |
|---|---|---|---|
| furiosa-ai/Qwen3-Embedding-8B | None (16-bit) | 1 | 8B text embedding |
Architecture: Qwen3 (dense), Qwen3Model
Task: Embedding
Input / Output: Text / Embeddings (vector)
Quantization: No quantization — the model runs in its native 16-bit precision.
Usage#
To run this model with Furiosa-LLM, follow the examples below after installing Furiosa-LLM and its prerequisites. You can use the model either offline through the Furiosa-LLM Python API or online through the OpenAI-compatible server.
Python API#
Load the artifact and call embed to obtain dense vectors:
from furiosa_llm import LLM
llm = LLM.from_artifacts("furiosa-ai/Qwen3-Embedding-8B")
embeddings = llm.embed(["Hello, world!", "How are you?"])
Online server#
The server exposes an OpenAI-compatible /v1/embeddings endpoint. Launch it the
same way as any other model:
# Launch the server, listening on port 8000 by default
furiosa-llm serve furiosa-ai/Qwen3-Embedding-8B
Once it is ready, request embeddings with curl:
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "furiosa-ai/Qwen3-Embedding-8B",
    "input": ["Hello, world!", "How are you?"]
  }' \
  | python -m json.tool
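The same request can be sketched in Python using only the standard library, which may help where installing an extra client package is not an option. This is a minimal sketch, not an official client: the helper names (`build_embeddings_request`, `extract_vectors`, `embed`) are hypothetical, the URL and model name are copied from the curl example, and the response is assumed to follow the OpenAI embeddings format (a `data` list whose items carry `index` and `embedding`).

```python
import json
import urllib.request

# Values copied from the curl example; adjust them if your server differs.
BASE_URL = "http://localhost:8000/v1"
MODEL = "furiosa-ai/Qwen3-Embedding-8B"


def build_embeddings_request(texts, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) a POST request for the embeddings endpoint."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_vectors(response_body):
    """Return embedding vectors ordered by their `index` field.

    Assumes an OpenAI-style response: {"data": [{"index": ..., "embedding": [...]}]}.
    """
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts):
    """Send the request and return the vectors; the server must be running."""
    request = build_embeddings_request(texts)
    with urllib.request.urlopen(request) as response:
        return extract_vectors(json.load(response))
```

With the server up, `embed(["Hello, world!", "How are you?"])` would return one vector per input string, in input order.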
Because the endpoint is OpenAI-compatible, you can also use the OpenAI Python client:
from openai import OpenAI

# Point the client at the local server; "EMPTY" is a placeholder API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.embeddings.create(
    model="furiosa-ai/Qwen3-Embedding-8B",
    input=["Hello, world!", "How are you?"],
)

# Each item holds the embedding for the input at the same index.
for data in response.data:
    print(f"Index {data.index}: {len(data.embedding)} dimensions")
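For the similarity-matching use case mentioned in the introduction, returned vectors are commonly compared with cosine similarity. The sketch below assumes each embedding is available as a plain list of floats (as `data.embedding` is in the OpenAI client response). The function names are hypothetical, and the vectors are toy values chosen for illustration, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, document_vectors):
    """Return (document_index, score) pairs, best match first."""
    scores = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(document_vectors)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors for illustration only; not real model output.
query = [1.0, 0.0, 0.0]
documents = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

ranking = rank_by_similarity(query, documents)
print([index for index, _ in ranking])  # prints [1, 2, 0]
```

In a real retrieval setup, `query` and `documents` would be replaced by vectors returned from the embeddings endpoint, and a vector database or a numerical library would typically take over the scoring for large collections.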
Learn more#
Furiosa-LLM Server (furiosa-llm serve): full OpenAI-compatible API reference, including the Embeddings API
Furiosa-LLM: Furiosa-LLM documentation and API reference
Upstream model card: Qwen/Qwen3-Embedding-8B