
QuantizedModelLoader

Loader for quantized models saved by onecomp.

QuantizedModelLoader

Loader for quantized models saved by onecomp (GPTQ, DBF, etc.).

load_quantized_model classmethod

load_quantized_model(save_directory: str, *, torch_dtype: Optional[dtype] = None, device_map: str = 'auto', trust_remote_code: bool = True, local_files_only: bool = True) -> Tuple[Any, Any]

Load a quantized model and tokenizer from a safetensors directory.

The directory must contain:

- config.json (with quantization_config)
- tokenizer files
- model.safetensors (quantized layers: qweight/scales for GPTQ, scaling0/bp for DBF)

Quantization parameters (quant_method, bits, group_size, etc.) are read from config.json and quantized layers are reconstructed directly from the safetensors state_dict. No quantization_results.pt is needed.

For models saved with post-processing modifications (e.g. LoRA adapters), use load_quantized_model_pt instead.
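The paragraph above says the loader takes its quantization parameters straight from config.json rather than from a separate quantization_results.pt. A minimal sketch of what that lookup amounts to, using a stand-in directory and hypothetical config values (the keys quant_method, bits, and group_size follow the description above; the specific values are illustrative only):

```python
import json
import os
import tempfile

# Stand-in save directory with a config.json carrying a quantization_config,
# as described above. The values here are hypothetical examples.
save_directory = tempfile.mkdtemp()
with open(os.path.join(save_directory, "config.json"), "w") as f:
    json.dump(
        {
            "model_type": "llama",
            "quantization_config": {
                "quant_method": "gptq",
                "bits": 3,
                "group_size": 128,
            },
        },
        f,
    )

# Read the quantization parameters the way the docs describe: directly
# from config.json, with no quantization_results.pt involved.
with open(os.path.join(save_directory, "config.json")) as f:
    qcfg = json.load(f)["quantization_config"]

print(qcfg["quant_method"], qcfg["bits"], qcfg["group_size"])
```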

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| save_directory | str | Path to the saved model directory. | required |
| torch_dtype | Optional[dtype] | Model dtype (default: torch.float16). | None |
| device_map | str | Device placement (default: "auto"). | 'auto' |
| trust_remote_code | bool | Passed to from_pretrained. | True |
| local_files_only | bool | Passed to from_pretrained. | True |

Returns:

| Type | Description |
| --- | --- |
| Tuple[Any, Any] | (model, tokenizer) |

Example

model, tokenizer = QuantizedModelLoader.load_quantized_model("./tinyllama_gptq3")

load_quantized_model_pt classmethod

load_quantized_model_pt(save_directory: str, *, device_map: str = 'auto', local_files_only: bool = True) -> Tuple[Any, Any]

Load a quantized model and tokenizer saved as a PyTorch .pt file.

Use this method to load models saved by Runner.save_quantized_model_pt, which preserves custom module types (e.g. LoRAGPTQLinear from LoRA post-processing).

The directory must contain:

- model.pt (serialized with torch.save)
- tokenizer files
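Since the two save formats are distinguished by which model file is on disk (model.pt for post-processed saves, model.safetensors for standard quantized saves), a small hypothetical helper can pick the right loader by inspecting the directory. The function name pick_loader is not part of onecomp; it is an illustration only:

```python
import os
import tempfile

def pick_loader(save_directory: str) -> str:
    """Hypothetical helper: name the loader that matches a save directory.

    model.pt          -> load_quantized_model_pt (post-processed, e.g. LoRA)
    model.safetensors -> load_quantized_model    (standard quantized save)
    """
    if os.path.exists(os.path.join(save_directory, "model.pt")):
        return "load_quantized_model_pt"
    if os.path.exists(os.path.join(save_directory, "model.safetensors")):
        return "load_quantized_model"
    raise FileNotFoundError(
        f"No model.pt or model.safetensors in {save_directory}"
    )

# Demo against a stand-in directory containing an empty model.pt.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "model.pt"), "wb").close()
print(pick_loader(demo_dir))
```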

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| save_directory | str | Path to the saved model directory. | required |
| device_map | str | Device placement (default: "auto"). Set to "" or None to skip device placement. | 'auto' |
| local_files_only | bool | Passed to AutoTokenizer.from_pretrained. | True |

Returns:

| Type | Description |
| --- | --- |
| Tuple[Any, Any] | (model, tokenizer) |

Example

model, tokenizer = QuantizedModelLoader.load_quantized_model_pt(
    "./quantized_model_lora"
)

Convenience Functions

The top-level aliases provide shortcuts for both formats:

from onecomp import load_quantized_model, load_quantized_model_pt

# Load a safetensors model (standard quantized, no LoRA)
model, tokenizer = load_quantized_model("./saved_model")

# Load a PyTorch .pt model (post-processed, e.g. LoRA-applied)
model, tokenizer = load_quantized_model_pt("./saved_model_lora")