
QuantizedModelLoader

Loader for quantized models saved by onecomp.

QuantizedModelLoader

Loader for quantized models saved by onecomp (GPTQ, DBF, etc.).

load_quantized_model classmethod

load_quantized_model(save_directory: str, *, torch_dtype: Optional[dtype] = None, device_map: str = 'auto', trust_remote_code: bool = True, local_files_only: bool = True) -> Tuple[Any, Any]

Load a quantized model and tokenizer from a safetensors directory.

The directory must contain:

- config.json (with quantization_config)
- tokenizer files
- model.safetensors (quantized layers: qweight/scales for GPTQ, scaling0/bp for DBF)

Quantization parameters (quant_method, bits, group_size, etc.) are read from config.json and quantized layers are reconstructed directly from the safetensors state_dict. No quantization_results.pt is needed.

For models saved with post-processing modifications (e.g. LoRA adapters), use load_quantized_model_pt instead.
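The paragraph above says the loader takes its quantization parameters straight from config.json rather than from a separate quantization_results.pt. A minimal sketch of what that lookup amounts to, using a stand-in directory and hypothetical config values (the keys quant_method, bits, and group_size follow the description above; the specific values are illustrative only):

```python
import json
import os
import tempfile

# Stand-in save directory with a config.json carrying a quantization_config,
# as described above. The values here are hypothetical examples.
save_directory = tempfile.mkdtemp()
with open(os.path.join(save_directory, "config.json"), "w") as f:
    json.dump(
        {
            "model_type": "llama",
            "quantization_config": {
                "quant_method": "gptq",
                "bits": 3,
                "group_size": 128,
            },
        },
        f,
    )

# Read the quantization parameters the way the docs describe: directly
# from config.json, with no quantization_results.pt involved.
with open(os.path.join(save_directory, "config.json")) as f:
    qcfg = json.load(f)["quantization_config"]

print(qcfg["quant_method"], qcfg["bits"], qcfg["group_size"])
```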

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| save_directory | str | Path to the saved model directory. | required |
| torch_dtype | Optional[dtype] | Model dtype (default: torch.float16). | None |
| device_map | str | Device placement (default: "auto"). | 'auto' |
| trust_remote_code | bool | Passed to from_pretrained. | True |
| local_files_only | bool | Passed to from_pretrained. | True |

Returns:

| Type | Description |
| --- | --- |
| Tuple[Any, Any] | (model, tokenizer) |

Example

model, tokenizer = QuantizedModelLoader.load_quantized_model("./tinyllama_gptq3")

load_quantized_model_pt classmethod

load_quantized_model_pt(save_directory: str, *, device_map: str = 'auto', local_files_only: bool = True) -> Tuple[Any, Any]

Load a quantized model and tokenizer saved as a PyTorch .pt file.

Use this method to load models saved by Runner.save_quantized_model_pt, which preserves custom module types (e.g. LoRAGPTQLinear from LoRA post-processing).

The directory must contain:

- model.pt (serialized with torch.save)
- tokenizer files
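Since the two save formats are distinguished by which model file is on disk (model.pt for post-processed saves, model.safetensors for standard quantized saves), a small hypothetical helper can pick the right loader by inspecting the directory. The function name pick_loader is not part of onecomp; it is an illustration only:

```python
import os
import tempfile

def pick_loader(save_directory: str) -> str:
    """Hypothetical helper: name the loader that matches a save directory.

    model.pt          -> load_quantized_model_pt (post-processed, e.g. LoRA)
    model.safetensors -> load_quantized_model    (standard quantized save)
    """
    if os.path.exists(os.path.join(save_directory, "model.pt")):
        return "load_quantized_model_pt"
    if os.path.exists(os.path.join(save_directory, "model.safetensors")):
        return "load_quantized_model"
    raise FileNotFoundError(
        f"No model.pt or model.safetensors in {save_directory}"
    )

# Demo against a stand-in directory containing an empty model.pt.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "model.pt"), "wb").close()
print(pick_loader(demo_dir))
```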

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| save_directory | str | Path to the saved model directory. | required |
| device_map | str | Device placement (default: "auto"). Set to "" or None to skip device placement. | 'auto' |
| local_files_only | bool | Passed to AutoTokenizer.from_pretrained. | True |

Returns:

| Type | Description |
| --- | --- |
| Tuple[Any, Any] | (model, tokenizer) |

Example

model, tokenizer = QuantizedModelLoader.load_quantized_model_pt(
    "./quantized_model_lora"
)

Convenience Functions

The top-level aliases provide shortcuts for both formats:

from onecomp import load_quantized_model, load_quantized_model_pt

# Load a safetensors model (standard quantized, no LoRA)
model, tokenizer = load_quantized_model("./saved_model")

# Load a PyTorch .pt model (post-processed, e.g. LoRA-applied)
model, tokenizer = load_quantized_model_pt("./saved_model_lora")