SamplingParams class#

class furiosa_llm.SamplingParams(*, n: int = 1, best_of: int | None = None, repetition_penalty: float = 1.0, temperature: float = 1.0, top_p: float = 1.0, top_k: int = -1, min_p: float = 0.0, use_beam_search: bool = False, length_penalty: float = 1.0, early_stopping: bool | str = False, max_tokens: int | None = 16, min_tokens: int = 0, output_kind: RequestOutputKind = RequestOutputKind.CUMULATIVE)[source]#

Bases: object

Sampling parameters for text generation.

Parameters:
  • n – Number of output sequences to return for the given prompt.

  • best_of – Number of output sequences that are generated from the prompt. From these best_of sequences, the top n sequences are returned. best_of must be greater than or equal to n. This is treated as the beam width when use_beam_search is True. By default, best_of is set to n.

  • repetition_penalty – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens.

  • temperature – Float that controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make the model more random. Zero means greedy sampling.

  • top_p – Float that controls the cumulative probability of the top tokens to consider. Must be in (0, 1]. Set to 1 to consider all tokens.

  • top_k – Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens.

  • min_p – Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.

  • use_beam_search – Whether to use beam search instead of sampling.

  • length_penalty – Float that penalizes sequences based on their length. Used in beam search.

  • early_stopping – Controls the stopping condition for beam search. It accepts the following values: True, where generation stops as soon as there are best_of complete candidates; False, where a heuristic is applied and generation stops when it is very unlikely that better candidates will be found; “never”, where the beam search procedure stops only when there cannot be better candidates (the canonical beam search algorithm).

  • max_tokens – Maximum number of tokens to generate per output sequence. If None, it is capped at the model’s maximum sequence length.

  • min_tokens – Minimum number of tokens to generate per output sequence before EOS or stop_token_ids can be generated.
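
As a quick orientation before the detailed examples below, the following sketch constructs two parameter sets and passes one to a generation call. The LLM constructor, the model identifier, and the shape of the returned objects are assumptions made for illustration (they follow the vLLM-style convention this API resembles); only SamplingParams itself is documented on this page.

from furiosa_llm import LLM, SamplingParams

# Deterministic decoding: temperature=0 means greedy sampling.
greedy = SamplingParams(temperature=0.0, max_tokens=64)

# Moderately random sampling with nucleus (top_p) filtering.
varied = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM("path-or-id-of-a-prepared-model")        # hypothetical model handle
outputs = llm.generate(["The capital of France is"], greedy)
print(outputs[0].outputs[0].text)                  # assumed output shape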

Examples#

This section provides examples of how to use the token generation methods available in the SDK.


1. Random Sampling with top_p / top_k Parameters#

SamplingParams(min_tokens=10, max_tokens=100, top_p=0.3, top_k=100)

This method generates tokens by random sampling from a filtered distribution, allowing for diverse outputs; a sketch of the filtering step follows the lists below.

  • Parameters:

    • min_tokens: Minimum number of tokens to generate.

    • max_tokens: Maximum number of tokens to generate.

    • top_p: Cumulative probability for nucleus sampling.

    • top_k: Number of highest probability tokens to consider.

  • Behavior:

    • Each generation may yield different results, even with the same input text and parameters, because tokens are drawn at random.

    • Generation may terminate before reaching max_tokens if an End Of Sequence (EOS) token is generated.

    • The EOS token will not be generated before reaching the specified min_tokens.
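
To make the interaction of top_k and top_p concrete, here is an illustrative NumPy sketch of the filtering step. It is not the library’s implementation; it only mirrors the semantics documented above: keep the top_k most likely tokens, then keep the smallest prefix of them whose cumulative probability reaches top_p.

import numpy as np

def filter_distribution(logits: np.ndarray, top_k: int = -1, top_p: float = 1.0) -> np.ndarray:
    """Return a renormalized token distribution after top_k/top_p filtering."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                 # token ids, most likely first
    if top_k > 0:
        order = order[:top_k]                       # top_k = -1 considers all tokens
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, top_p)) + 1   # smallest prefix reaching top_p
    kept = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[kept] = probs[kept]
    return filtered / filtered.sum()

rng = np.random.default_rng()
logits = rng.normal(size=32)                        # toy vocabulary of 32 tokens
dist = filter_distribution(logits, top_k=100, top_p=0.3)
token = rng.choice(len(dist), p=dist)               # repeated runs may pick different tokens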


2. Beam Search with best_of Beams#

SamplingParams(min_tokens=10, max_tokens=100, use_beam_search=True, best_of=4)

Beam search explores multiple candidate sequences simultaneously and returns the highest-scoring ones; a construction sketch follows the lists below.

  • Parameters:

    • min_tokens: Minimum number of tokens to generate.

    • max_tokens: Maximum number of tokens to generate.

    • use_beam_search: Must be set to True to enable beam search.

    • best_of: Number of beams to consider for generating the best output.

  • Behavior:

    • The generation process explores multiple possible sequences to determine the best output.

    • Generation may terminate before reaching max_tokens once best_of beams have each produced an End Of Sequence (EOS) token.

    • The EOS token will not be generated before reaching the specified min_tokens.
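
A construction sketch for this configuration follows. The commented-out generate() call reuses the same assumed entry point as the earlier sketch and is not part of this class’s documented API.

from furiosa_llm import SamplingParams

# Beam search over 4 beams; the top n finished candidates are returned.
# best_of acts as the beam width here and must be >= n (n defaults to 1).
params = SamplingParams(
    n=2,
    best_of=4,
    use_beam_search=True,
    early_stopping=True,    # stop once best_of complete candidates exist
    length_penalty=1.0,     # length penalty applied when ranking beams
    min_tokens=10,
    max_tokens=100,
)
# outputs = llm.generate("Explain beam search in one sentence.", params)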