QuantizedModelLoader¶

Loader for quantized models saved by onecomp (GPTQ, DBF, etc.).
load_quantized_model classmethod ¶
```python
load_quantized_model(save_directory: str, *, torch_dtype: Optional[dtype] = None, device_map: str = 'auto', trust_remote_code: bool = True, local_files_only: bool = True) -> Tuple[Any, Any]
```
Load a quantized model and tokenizer from a safetensors directory.
The directory must contain:

- config.json (with quantization_config)
- tokenizer files
- model.safetensors (quantized layers: qweight/scales for GPTQ, scaling0/bp for DBF)
Quantization parameters (quant_method, bits, group_size, etc.) are read from config.json and quantized layers are reconstructed directly from the safetensors state_dict. No quantization_results.pt is needed.
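Because the quantization parameters live in config.json, they can be inspected without loading any weights. A minimal sketch using only the standard library (the helper `read_quantization_config` is illustrative, not part of onecomp; the exact fields inside quantization_config depend on the quant method):

```python
import json
from pathlib import Path


def read_quantization_config(save_directory: str) -> dict:
    """Return the quantization_config block from a saved model's config.json."""
    config = json.loads((Path(save_directory) / "config.json").read_text())
    # A directory without quantization_config was not saved by a quantizer.
    if "quantization_config" not in config:
        raise ValueError(f"{save_directory} has no quantization_config")
    return config["quantization_config"]
```

This is useful, for example, to check quant_method before deciding how to handle a saved directory.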
For models saved with post-processing modifications (e.g. LoRA adapters), use `load_quantized_model_pt` instead.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| save_directory | str | Path to the saved model directory. | required |
| torch_dtype | Optional[dtype] | Model dtype (default: torch.float16). | None |
| device_map | str | Device placement (default: "auto"). | 'auto' |
| trust_remote_code | bool | Passed to from_pretrained. | True |
| local_files_only | bool | Passed to from_pretrained. | True |
Returns:

| Type | Description |
|---|---|
| Tuple[Any, Any] | (model, tokenizer) |
Example:

```python
model, tokenizer = QuantizedModelLoader.load_quantized_model("./tinyllama_gptq3")
```
load_quantized_model_pt classmethod ¶
```python
load_quantized_model_pt(save_directory: str, *, device_map: str = 'auto', local_files_only: bool = True) -> Tuple[Any, Any]
```
Load a quantized model and tokenizer saved as a PyTorch .pt file.
Use this method to load models saved by `Runner.save_quantized_model_pt`, which preserves custom module types (e.g. LoRAGPTQLinear from LoRA post-processing).
The directory must contain:
- model.pt (serialized with torch.save)
- tokenizer files
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| save_directory | str | Path to the saved model directory. | required |
| device_map | str | Device placement (default: "auto"). | 'auto' |
| local_files_only | bool | Passed to from_pretrained. | True |
Returns:

| Type | Description |
|---|---|
| Tuple[Any, Any] | (model, tokenizer) |
Example:

```python
model, tokenizer = QuantizedModelLoader.load_quantized_model_pt(
    "./quantized_model_lora"
)
```
Convenience Functions¶
The top-level aliases provide shortcuts for both formats:
```python
from onecomp import load_quantized_model, load_quantized_model_pt

# Load a safetensors model (standard quantized, no LoRA)
model, tokenizer = load_quantized_model("./saved_model")

# Load a PyTorch .pt model (post-processed, e.g. LoRA-applied)
model, tokenizer = load_quantized_model_pt("./saved_model_lora")
```
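Since the two formats are distinguished by the file they store (model.safetensors vs. model.pt), the right loader can be chosen from the directory contents alone. A small sketch, assuming the save directories follow the layouts described above (the helper `pick_loader` is hypothetical, not part of onecomp):

```python
from pathlib import Path


def pick_loader(save_directory: str) -> str:
    """Return which onecomp loader fits a saved model directory.

    "pt" for directories saved via save_quantized_model_pt (model.pt),
    "safetensors" for the standard quantized format (model.safetensors).
    """
    d = Path(save_directory)
    if (d / "model.pt").exists():
        return "pt"  # use load_quantized_model_pt
    if (d / "model.safetensors").exists():
        return "safetensors"  # use load_quantized_model
    raise FileNotFoundError(
        f"no model.pt or model.safetensors in {save_directory}"
    )
```

Checking model.pt first matters only if a directory somehow contains both files; in that case the .pt file carries the post-processed modules and should win.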