SamplingParams class
- class furiosa_llm.SamplingParams(*, n: int = 1, best_of: int | None = None, repetition_penalty: float = 1.0, temperature: float = 1.0, top_p: float = 1.0, top_k: int = -1, min_p: float = 0.0, use_beam_search: bool = False, length_penalty: float = 1.0, early_stopping: bool | str = False, max_tokens: int | None = 16, min_tokens: int = 0, output_kind: RequestOutputKind = RequestOutputKind.CUMULATIVE)
Bases: object
Sampling parameters for text generation.
- Parameters:
n – Number of output sequences to return for the given prompt.
best_of – Number of output sequences that are generated from the prompt. From these best_of sequences, the top n sequences are returned. best_of must be greater than or equal to n. This is treated as the beam width when use_beam_search is True. By default, best_of is set to n.
repetition_penalty – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens.
temperature – Float that controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make the model more random. Zero means greedy sampling.
top_p – Float that controls the cumulative probability of the top tokens to consider. Must be in (0, 1]. Set to 1 to consider all tokens.
top_k – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens.
min_p – Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
use_beam_search – Whether to use beam search instead of sampling.
length_penalty – Float that penalizes sequences based on their length. Used in beam search.
early_stopping – Controls the stopping condition for beam search. It accepts the following values: True, where the generation stops as soon as there are best_of complete candidates; False, where a heuristic is applied and the generation stops when it is very unlikely to find better candidates; "never", where the beam search procedure only stops when there cannot be better candidates (canonical beam search algorithm).
max_tokens – Maximum number of tokens to generate per output sequence. If the value is None, it is capped to the maximum sequence length.
min_tokens – Minimum number of tokens to generate per output sequence before EOS or stop_token_ids can be generated.
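The descriptions of repetition_penalty, temperature and min_p can be illustrated with a small pure-Python sketch that turns one decoding step's logits into a probability distribution. It assumes the common CTRL/vLLM-style repetition penalty (positive logits divided by the penalty, negative logits multiplied by it) and applies the steps in the order penalty, temperature, min_p; furiosa_llm's actual ordering and numerics may differ. The helpers `apply_repetition_penalty`, `softmax` and `shape_distribution` are hypothetical.

```python
import math
from typing import List, Set


def apply_repetition_penalty(logits: List[float], seen: Set[int], penalty: float) -> List[float]:
    # Assumed CTRL/vLLM-style rule: shrink positive logits and push negative
    # logits further down, so penalty > 1 makes every seen token less likely.
    out = list(logits)
    for tok in seen:
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shape_distribution(
    logits: List[float],
    seen: Set[int],
    repetition_penalty: float = 1.0,
    temperature: float = 1.0,
    min_p: float = 0.0,
) -> List[float]:
    logits = apply_repetition_penalty(logits, seen, repetition_penalty)
    if temperature == 0.0:
        # Greedy sampling: all probability mass on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    # Lower temperature sharpens the distribution, higher temperature flattens it.
    probs = softmax([x / temperature for x in logits])
    if min_p > 0.0:
        # Drop tokens less likely than min_p times the most likely token.
        threshold = min_p * max(probs)
        probs = [p if p >= threshold else 0.0 for p in probs]
        total = sum(probs)
        probs = [p / total for p in probs]
    return probs
```

For example, with logits `[2.0, 1.0, 0.5, -1.0]`, `temperature=0.0` puts all mass on token 0, and `min_p=0.2` removes only the last token, whose probability is about 5% of the top token's.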
Examples
This section provides examples of how to use the token generation methods available in the SDK.
1. Basic Greedy Search
SamplingParams(min_tokens=10, max_tokens=100)
The Basic Greedy Search method generates a sequence of tokens, ensuring that at least min_tokens and up to max_tokens tokens are produced.
Parameters:
min_tokens: Minimum number of tokens to generate.
max_tokens: Maximum number of tokens to generate.
Behavior:
Generation may terminate before reaching max_tokens if an End Of Sequence (EOS) token is generated.
The EOS token will not be generated before the specified min_tokens is reached.
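The greedy behavior described above (stop early on EOS, but never emit EOS before min_tokens) can be sketched as a decoding loop over a toy scoring function. Masking the EOS logit until min_tokens tokens exist is an assumption about how the constraint is enforced, and `greedy_generate`, `toy_model` and `EOS_ID` are hypothetical names, not part of furiosa_llm.

```python
from typing import Callable, List

EOS_ID = 0  # hypothetical EOS token id for the toy vocabulary {0, 1, 2}


def greedy_generate(
    next_logits: Callable[[List[int]], List[float]],
    prompt: List[int],
    min_tokens: int,
    max_tokens: int,
) -> List[int]:
    """Greedy decoding that honours min_tokens and max_tokens."""
    output: List[int] = []
    while len(output) < max_tokens:
        logits = list(next_logits(prompt + output))
        if len(output) < min_tokens:
            # Assumed enforcement of min_tokens: EOS cannot be chosen yet.
            logits[EOS_ID] = float("-inf")
        # Greedy: always take the highest-scoring token.
        token = max(range(len(logits)), key=lambda i: logits[i])
        output.append(token)
        if token == EOS_ID:
            break  # EOS ends generation before max_tokens
    return output


def toy_model(context: List[int]) -> List[float]:
    # Hypothetical scorer: prefers token 1, but prefers EOS once the
    # context (prompt + output) holds five or more tokens.
    return [5.0, 1.0, 0.5] if len(context) >= 5 else [0.0, 2.0, 1.0]
```

With the prompt `[2, 2]`, `min_tokens=0` yields `[1, 1, 1, 0]` (EOS at the fourth step), while `min_tokens=6` pushes EOS out to the seventh position, and `max_tokens=2` truncates the output to `[1, 1]`.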
2. Random Sampling with top_p / top_k Parameters
SamplingParams(min_tokens=10, max_tokens=100, top_p=0.3, top_k=100)
This method samples each token at random from the model's probability distribution, restricted by top_k and top_p, which allows for diverse outputs.
Parameters:
min_tokens: Minimum number of tokens to generate.
max_tokens: Maximum number of tokens to generate.
top_p: Cumulative probability for nucleus sampling.
top_k: Number of highest probability tokens to consider.
Behavior:
Each generation may yield different results, even with the same input text and parameters.
Generation may terminate before reaching max_tokens if an End Of Sequence (EOS) token is generated.
The EOS token will not be generated before the specified min_tokens is reached.
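A minimal sketch of top_k / top_p filtering followed by a random draw, assuming top_k is applied first and top_p then keeps the smallest set of most likely tokens whose renormalized probability reaches the threshold (the usual nucleus-sampling order, not confirmed for furiosa_llm). `top_k_top_p_sample` is a hypothetical helper, and the explicit `random.Random` instance only makes the demo repeatable.

```python
import random
from typing import List


def top_k_top_p_sample(probs: List[float], top_k: int, top_p: float, rng: random.Random) -> int:
    # Token ids ordered from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k != -1:
        order = order[:top_k]  # top-k: keep only the k most likely tokens
    total = sum(probs[i] for i in order)
    # top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / total
        if cumulative >= top_p:
            break
    # Draw one token in proportion to its (unnormalized) probability.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With `probs = [0.5, 0.3, 0.15, 0.05]`, `top_p=0.3` leaves only token 0, so every draw is identical; with `top_k=-1, top_p=1.0`, repeated draws vary, which is the variability described above.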
3. Beam Search with best_of Beams
SamplingParams(min_tokens=10, max_tokens=100, use_beam_search=True, best_of=4)
Beam Search enhances the generation process by exploring multiple sequences simultaneously.
Parameters:
min_tokens: Minimum number of tokens to generate.
max_tokens: Maximum number of tokens to generate.
use_beam_search: Must be set to True to enable beam search.
best_of: Number of beams to consider for generating the best output.
Behavior:
The generation process explores multiple possible sequences to determine the best output.
Generation may terminate before reaching max_tokens if the number of End Of Sequence (EOS) tokens generated across all beams reaches the best_of count.
The EOS token will not be generated before the specified min_tokens is reached.
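The beam search behavior described above can be sketched as follows: `best_of` sets the beam width, EOS is masked until `min_tokens` tokens exist, and the search stops once `best_of` finished hypotheses have been collected (the `early_stopping=True` rule) or `max_tokens` is reached. Ranking hypotheses by `sum(logprobs) / len(tokens) ** length_penalty` follows the Hugging Face convention and is an assumption here; `beam_search`, `toy_logprobs` and `EOS_ID` are hypothetical, and this is not furiosa_llm's implementation.

```python
import math
from typing import Callable, List, Tuple

EOS_ID = 0  # hypothetical EOS token id for the toy vocabulary {0, 1, 2}

Hypothesis = Tuple[List[int], float]  # (generated token ids, summed log-probability)


def beam_search(
    next_logprobs: Callable[[List[int]], List[float]],
    prompt: List[int],
    best_of: int,
    n: int = 1,
    min_tokens: int = 0,
    max_tokens: int = 16,
    length_penalty: float = 1.0,
) -> List[Tuple[List[int], float]]:
    """Return the top n hypotheses with their length-penalized scores."""
    live: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []

    def score(h: Hypothesis) -> float:
        tokens, logprob = h
        return logprob / (len(tokens) ** length_penalty)

    for _ in range(max_tokens):
        # Expand every live beam by every token in the vocabulary.
        candidates: List[Hypothesis] = []
        for tokens, logprob in live:
            for tok, lp in enumerate(next_logprobs(prompt + tokens)):
                if tok == EOS_ID and len(tokens) < min_tokens:
                    continue  # EOS is not allowed before min_tokens
                candidates.append((tokens + [tok], logprob + lp))
        candidates.sort(key=lambda h: h[1], reverse=True)

        # Keep the best best_of unfinished beams; EOS-terminated candidates
        # that rank within the beam width become finished hypotheses.
        live = []
        for rank, (tokens, logprob) in enumerate(candidates):
            if tokens[-1] == EOS_ID:
                if rank < best_of:
                    finished.append((tokens, logprob))
            else:
                live.append((tokens, logprob))
                if len(live) == best_of:
                    break

        # early_stopping=True: stop once best_of hypotheses are complete.
        if len(finished) >= best_of or not live:
            break

    # If max_tokens ran out first, unfinished beams compete as well.
    pool = finished if len(finished) >= best_of else finished + live
    ranked = sorted(pool, key=score, reverse=True)
    return [(tokens, score((tokens, lp))) for tokens, lp in ranked[:n]]


def toy_logprobs(context: List[int]) -> List[float]:
    # Hypothetical model: EOS is unlikely for the first two tokens, likely afterwards.
    probs = [0.1, 0.6, 0.3] if len(context) < 2 else [0.7, 0.2, 0.1]
    return [math.log(p) for p in probs]
```

With `best_of=2`, the toy model finishes two EOS-terminated beams at the third step and returns `[1, 1, 0]` as the best hypothesis; raising `min_tokens` to 3 delays EOS by one step, so the best hypothesis becomes `[1, 1, 1, 0]`.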