OpenAI-Compatible API with Logprobs#
This example demonstrates how to retrieve log probabilities (logprobs) using the OpenAI-compatible Chat Completion API. There are two types of logprobs available:
1. Generated Token Logprobs (Standard OpenAI): log probabilities for each token the model generates, along with the top alternatives it considered at each step.
2. Prompt Token Logprobs (vLLM Extension): log probabilities for each token position in the input prompt, given the preceding context.
Chat Completion API Example#
The following example shows how to use both logprobs (for generated tokens)
and prompt_logprobs (for prompt tokens) in a single request.
"""
OpenAI-Compatible Server - Chat Completion API with Logprobs Example
This example demonstrates how to use both:
1. logprobs: Log probabilities for generated tokens (standard OpenAI parameter)
2. prompt_logprobs: Log probabilities for prompt tokens (vLLM extension)
Model: Any model available on the server
Endpoint: http://localhost:8000
"""
import os
from openai import OpenAI
base_url = os.getenv("OPENAI_BASE_URL", "http://localhost:8000/v1")
api_key = os.getenv("OPENAI_API_KEY", "EMPTY")
client = OpenAI(base_url=base_url, api_key=api_key)
# Get available model
models = client.models.list()
model_name = models.data[0].id
print(f"Using model: {model_name}")
print()
# Chat completion request with both logprobs and prompt_logprobs
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=16,
    temperature=0.0,
    # Standard OpenAI parameter: logprobs for generated tokens
    logprobs=True,
    top_logprobs=3,  # Return top 3 alternatives for each generated token
    # vLLM/Furiosa-LLM extension parameters
    extra_body={
        "prompt_logprobs": 3,  # Top 3 logprobs per prompt token
        "return_token_ids": True,  # Return prompt token IDs
    },
)
# Print generated text
print("=" * 80)
print(f"Generated text: {response.choices[0].message.content}")
print("=" * 80)
print()
# =========================================================================
# Section 1: Generated Token Logprobs (Standard OpenAI)
# =========================================================================
print("=== Generated Token Logprobs (Standard OpenAI) ===")
print("These show the model's confidence for each token it generated.\n")
logprobs_content = response.choices[0].logprobs
if logprobs_content and logprobs_content.content:
    for idx, token_info in enumerate(logprobs_content.content):
        print(
            f"Position {idx}: '{token_info.token}' (logprob={token_info.logprob:.6f})"
        )
        # Show alternative tokens the model considered
        if token_info.top_logprobs:
            for alt in token_info.top_logprobs:
                marker = " <-- chosen" if alt.token == token_info.token else ""
                print(f" '{alt.token}': logprob={alt.logprob:.6f}{marker}")
        print()
else:
    print("No logprobs in response")
print()
# =========================================================================
# Section 2: Prompt Token Logprobs (vLLM Extension)
# =========================================================================
print("=== Prompt Token Logprobs (vLLM Extension) ===")
print("These show how likely each prompt token was, given the preceding context.\n")
# Access vLLM extension fields directly from response object
prompt_logprobs = response.prompt_logprobs # type: ignore[attr-defined]
prompt_token_ids = response.prompt_token_ids # type: ignore[attr-defined]
if prompt_logprobs:
    for idx, token_logprobs in enumerate(prompt_logprobs):
        actual_token_id = prompt_token_ids[idx] if prompt_token_ids else None
        if token_logprobs is None:
            # First token has no logprobs (no prior context)
            print(
                f"Position {idx}: token_id={actual_token_id} "
                f"-> None (first token, no prior context)"
            )
            continue
        print(f"Position {idx}: token_id={actual_token_id}")
        # token_logprobs is in dict[token_id, Logprob] format
        for token_id_str, logprob_info in token_logprobs.items():
            token_id = int(token_id_str)
            logprob = logprob_info["logprob"]
            rank = logprob_info.get("rank")
            decoded_token = logprob_info.get("decoded_token", "")
            escaped_token = repr(decoded_token)[1:-1]
            is_actual = token_id == actual_token_id
            actual_marker = " <-- actual" if is_actual else ""
            print(
                f"  token_id={token_id:>6}, "
                f"token='{escaped_token}', "
                f"logprob={logprob:>10.6f}, "
                f"rank={rank}{actual_marker}"
            )
        print()
else:
    print("No prompt_logprobs in response (vLLM extension field)")
print()
Example Output#
Using model: Qwen/Qwen2.5-0.5B
================================================================================
Generated text: The capital of France is Paris.
================================================================================
=== Generated Token Logprobs (Standard OpenAI) ===
These show the model's confidence for each token it generated.
Position 0: 'The' (logprob=-0.298153)
'The': logprob=-0.298153 <-- chosen
'Paris': logprob=-1.551574
'As': logprob=-4.550769
Position 1: ' capital' (logprob=-0.003914)
' capital': logprob=-0.003914 <-- chosen
' current': logprob=-6.505387
' Capital': logprob=-8.506357
Position 2: ' of' (logprob=-0.011788)
' of': logprob=-0.011788 <-- chosen
' city': logprob=-4.769587
' and': logprob=-8.139359
Position 3: ' France' (logprob=-0.003914)
' France': logprob=-0.003914 <-- chosen
' the': logprob=-10.007491
'France': logprob=-11.007912
Position 4: ' is' (logprob=-0.003914)
' is': logprob=-0.003914 <-- chosen
',': logprob=-5.880869
' (': logprob=-9.756177
Position 5: ' Paris' (logprob=-0.007843)
' Paris': logprob=-0.007843 <-- chosen
' Lyon': logprob=-6.015181
'巴黎': logprob=-6.947220
...
=== Prompt Token Logprobs (vLLM Extension) ===
These show how likely each prompt token was, given the preceding context.
Position 0: token_id=151644 -> None (first token, no prior context)
Position 1: token_id=8948
token_id= 8948, token='system', logprob=-11.958041, rank=8063 <-- actual
token_id= 72030, token='/API', logprob= -1.378512, rank=1
token_id= 16731, token='/T', logprob= -3.065493, rank=2
token_id= 59981, token='/block', logprob= -3.948318, rank=3
Position 2: token_id=198
token_id= 198, token='\n', logprob= -1.947865, rank=1 <-- actual
token_id= 271, token='\n\n', logprob= -2.263327, rank=2
token_id= 69425, token=' 发', logprob= -2.448469, rank=3
...
Key Parameters#
Generated Token Logprobs (Standard OpenAI)#
logprobs (bool): When True, returns log probabilities for the generated tokens.
top_logprobs (int): Number of top alternative tokens to return for each generated token.
These parameters are part of the standard OpenAI Chat Completion API and show what alternatives the model considered when generating each token.
Prompt Token Logprobs (vLLM Extension)#
prompt_logprobs (int): Number of top log probabilities to return for each prompt token. Pass via extra_body.
return_token_ids (bool): When True, returns the token IDs of the prompt tokens. Pass via extra_body.
These parameters are vLLM extensions and show how likely each token in your prompt was, given the preceding context.
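Under the hood, the OpenAI Python client merges extra_body entries into the top-level JSON payload. For reference, a roughly equivalent raw HTTP request using the requests library might look like the sketch below; the server URL and model name are assumptions carried over from the example above, not fixed values.

# Minimal sketch of the equivalent raw HTTP request. Assumes the server from the
# example above is listening on http://localhost:8000 and that the vLLM extension
# fields are sent as top-level JSON keys (which is how the OpenAI client sends
# extra_body entries).
import requests

payload = {
    "model": "Qwen/Qwen2.5-0.5B",  # any model served by the endpoint
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "max_tokens": 16,
    "temperature": 0.0,
    # Standard OpenAI parameters
    "logprobs": True,
    "top_logprobs": 3,
    # vLLM extension parameters (top-level keys in the JSON body)
    "prompt_logprobs": 3,
    "return_token_ids": True,
}

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json=payload,
    headers={"Authorization": "Bearer EMPTY"},
    timeout=60,
)
resp.raise_for_status()
# The extension fields appear as top-level keys alongside the usual choices/usage
print(sorted(resp.json().keys()))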
Response Structure#
Generated Token Logprobs#
Located in the standard response structure:
response.choices[0].logprobs.content[i].token # The chosen token
response.choices[0].logprobs.content[i].logprob # Its log probability
response.choices[0].logprobs.content[i].top_logprobs # Alternative tokens
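Because each generated token carries its own logprob, you can sum them to score the whole completion. A minimal sketch, assuming response is the chat completion object from the example above:

import math

# Sum the per-token logprobs of the generated answer to get the log probability
# of the whole completion, then convert it to a probability.
token_infos = response.choices[0].logprobs.content
sequence_logprob = sum(t.logprob for t in token_infos)
print(f"Sequence logprob: {sequence_logprob:.6f}")
print(f"Sequence probability: {math.exp(sequence_logprob):.4%}")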
Prompt Token Logprobs#
Located in the vLLM extension fields, accessible directly on the response object:
response.prompt_logprobs # List of logprobs per prompt position
response.prompt_token_ids # List of actual token IDs in the prompt
Each logprob entry contains:
logprob: The log probability value
rank: The rank of this token among all possible tokens (1 = most likely)
decoded_token: The string representation of the token
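For illustration, the entry for position 1 of the example output above would look roughly like the dict below. Token IDs arrive as JSON object keys, hence the string keys; the exact shape may vary between server versions.

# One prompt position from response.prompt_logprobs, roughly as returned for
# position 1 of the example output above. The actual token (rank 8063) is
# included in addition to the top-3 alternatives.
position_1 = {
    "8948": {"logprob": -11.958041, "rank": 8063, "decoded_token": "system"},
    "72030": {"logprob": -1.378512, "rank": 1, "decoded_token": "/API"},
    "16731": {"logprob": -3.065493, "rank": 2, "decoded_token": "/T"},
    "59981": {"logprob": -3.948318, "rank": 3, "decoded_token": "/block"},
}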
Understanding the Output#
Generated Token Logprobs#
For generated tokens, a logprob closer to 0 indicates higher confidence:
logprob=-0.003914: \(e^{-0.003914}\) ≈ 99.6% probability (very confident)
logprob=-2.0: \(e^{-2.0}\) ≈ 13.5% probability (less confident)
logprob=-5.0: \(e^{-5.0}\) ≈ 0.7% probability (unlikely alternative)
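The conversion is plain exponentiation; a quick check of the values above in Python:

import math

# Convert a few logprobs from the list above back to probabilities.
for logprob in (-0.003914, -2.0, -5.0):
    print(f"logprob={logprob:>10.6f} -> probability={math.exp(logprob):.1%}")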
Prompt Token Logprobs#
For prompt tokens, the rank field shows where the actual token ranked among all possible tokens at that position. A high rank (e.g., 8063) together with a very negative logprob means the model found that token “surprising” given the preceding context; this is expected for prompt tokens, since they are user-provided rather than model-chosen.
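One practical use of these fields is scoring the prompt itself, for example by computing its perplexity from the logprob of each actual prompt token. A minimal sketch, assuming response is the chat completion from the example above and that each position's entries are keyed by token ID strings, as in the parsing code earlier:

import math

# Perplexity of the prompt, computed from the logprob of each actual prompt token.
# The first position is skipped because it has no logprobs (no prior context).
prompt_logprobs = response.prompt_logprobs  # type: ignore[attr-defined]
prompt_token_ids = response.prompt_token_ids  # type: ignore[attr-defined]

actual_logprobs = []
for token_id, entry in zip(prompt_token_ids, prompt_logprobs):
    if entry is None:
        continue
    # Keys normally arrive as strings (JSON object keys); fall back to int keys just in case.
    info = entry.get(str(token_id)) or entry.get(token_id)
    if info is not None:
        actual_logprobs.append(info["logprob"])

if actual_logprobs:
    perplexity = math.exp(-sum(actual_logprobs) / len(actual_logprobs))
    print(f"Prompt perplexity: {perplexity:.2f}")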