ArtifactBuilder#
- class furiosa_llm.artifact.ArtifactBuilder(model_id_or_path: str | Path, name: str = '', *, model_config: ModelConfig | None = None, parallel_config: ParallelConfig | None = None, bucket_config: BucketConfig | None = None, compiler_config: CompilerConfig | None = None, speculative_config: SpeculativeDecodingConfig | None = None, artifact_config: ArtifactConfig | None = None, tensor_parallel_size: int | None = None, pipeline_parallel_size: int | None = None, max_seq_len_to_capture: int | None = None)[source]#
Bases: object

The artifact builder to use in the Furiosa LLM.
- Parameters:
model_id_or_path – The HuggingFace model id or a local path. This corresponds to pretrained_model_name_or_path in HuggingFace Transformers.
name – The name of the artifact to build. If not provided, defaults to model_id_or_path.
model_config – Configuration for model HuggingFace settings (trust_remote_code, etc.).
parallel_config – Configuration for parallelization (tensor and pipeline parallelism). Defaults to tensor_parallel_size=8, pipeline_parallel_size=1.
bucket_config – Configuration for attention buckets and sequence lengths. Defaults to max_seq_len_to_capture=2048 with auto-generated buckets.
compiler_config – Configuration for compiler and model rewriting options.
speculative_config – Configuration for speculative decoding (experimental).
artifact_config – Configuration for artifact export options.
tensor_parallel_size – Deprecated. Use parallel_config instead.
pipeline_parallel_size – Deprecated. Use parallel_config instead.
max_seq_len_to_capture – Deprecated. Use bucket_config instead.
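A minimal usage sketch of the constructor (the model id, artifact name, and save directory below are illustrative; the import guard is only so the sketch degrades gracefully when furiosa-llm is not installed):

```python
# Sketch: construct an ArtifactBuilder for a HuggingFace model and build
# artifacts from it. Assumes furiosa-llm is installed; all string values
# are illustrative.
try:
    from furiosa_llm.artifact import ArtifactBuilder
except ImportError:
    ArtifactBuilder = None  # furiosa-llm not installed; sketch only

if ArtifactBuilder is not None:
    builder = ArtifactBuilder(
        "meta-llama/Meta-Llama-3.1-8B-Instruct",  # HuggingFace model id (illustrative)
        name="llama3.1-8b-instruct",
    )
    # Compile and export everything needed to later create an LLM instance
    # without quantizing or compiling the model again.
    builder.build("./artifacts/llama3.1-8b-instruct")
```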
- build(save_dir: str | PathLike, *, num_pipeline_builder_workers: int = 1, num_compile_workers: int = 1, num_cpu_per_pipeline_build_worker: int = 1, num_cpu_per_compile_worker: int = 1, cache_dir: PathLike | None = PosixPath('/root/.cache/furiosa/llm'), param_file_path: PathLike | None = None, param_saved_format: Literal['safetensors', 'pt'] = 'safetensors', param_file_max_shard_size: str | int | None = '5GB', _cleanup: bool = True, _raise_error_if_compile: bool = False, **kwargs)[source]#
Build the artifacts for given model configurations.
- Parameters:
save_dir – The path to save the artifacts. With the saved artifacts, you can create an LLM instance without quantizing or compiling the model again.
num_pipeline_builder_workers – The number of workers used for building pipelines (except for compilation). The default is 1 (no parallelism). Setting this value larger than 1 reduces pipeline building time, especially for large models, but requires much more memory.
num_compile_workers – The number of workers used for compilation. The default is 1 (no parallelism).
num_cpu_per_pipeline_build_worker – The number of cpu cores allocated for each pipeline build worker. The default is 1.
num_cpu_per_compile_worker – The number of cpu cores allocated for each compile worker. The default is 1.
cache_dir – The cache directory for all generated files for this LLM instance. When its value is None, caching is disabled. The default is “$HOME/.cache/furiosa/llm”.
param_file_path – The path to the parameter file to use for pipeline generation. If not specified, the parameters will be saved in a temporary file, which will be deleted when the LLM instance is destroyed.
param_saved_format – The format of the parameter file. Only “safetensors” is currently supported, and it is the default.
param_file_max_shard_size – The maximum size of a single parameter file. The parameter file will be split into smaller shards so that each is smaller than this size. The default is “5GB”.
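Because param_file_max_shard_size accepts either an int (a byte count) or a string such as “5GB”, the following illustrative helper (not furiosa-llm’s actual parser) shows how such a value can be normalized to bytes, assuming the decimal-unit convention (“GB” = 10^9 bytes) that HuggingFace uses for its own max_shard_size:

```python
# Illustrative helper (not furiosa-llm's actual parser): interpret a
# param_file_max_shard_size value, which may be an int (bytes) or a
# string like "5GB", as a byte count using decimal units.
import re
from typing import Union

_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

def shard_size_in_bytes(size: Union[int, str]) -> int:
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([KMGT]?B)", size.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size string: {size!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit.upper()])

print(shard_size_in_bytes("5GB"))  # 5000000000
```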
Artifact#
- class furiosa_llm.artifact.Artifact(*, metadata: ArtifactMetadata, model: ModelArtifact, speculative_model: ModelArtifact | None = None, version: SchemaVersion, prefill_chunk_size: int | None = None)[source]#
Bases: ArtifactBase

- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
ArtifactMetadata#
- class furiosa_llm.artifact.ArtifactMetadata(*, artifact_id: str, name: str, timestamp: int, furiosa_llm_version: str, furiosa_compiler_version: str, includes_composable_ir: bool)[source]#
Bases: BaseModel

- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
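ArtifactMetadata is a pydantic model whose fields are listed in the signature above. As a shape reference, here is a plain-dataclass sketch (pydantic validation omitted; field names follow the signature, while the sample values are purely illustrative):

```python
# A plain-dataclass sketch of the ArtifactMetadata shape. The real class is a
# pydantic BaseModel with validation; the version strings and name below are
# illustrative, not real releases.
import time
import uuid
from dataclasses import dataclass

@dataclass
class ArtifactMetadataSketch:
    artifact_id: str
    name: str
    timestamp: int                    # e.g. Unix epoch seconds
    furiosa_llm_version: str
    furiosa_compiler_version: str
    includes_composable_ir: bool

meta = ArtifactMetadataSketch(
    artifact_id=str(uuid.uuid4()),
    name="llama3.1-8b-instruct",       # illustrative
    timestamp=int(time.time()),
    furiosa_llm_version="0.0.0",       # illustrative
    furiosa_compiler_version="0.0.0",  # illustrative
    includes_composable_ir=False,
)
```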