OneBit
OneBit Quantizer
Onebit (dataclass)
Onebit(name: str = None, num_layers: int = None, calc_quant_error: bool = False, include_layer_names: list[str] = None, exclude_layer_names: list[str] = (lambda: ['lm_head'])(), include_layer_keywords: list[str] = None, exclude_layer_keywords: list[str] = (lambda: ['per_layer_model_projection'])(), target_layer_types: tuple = (lambda: (Linear,))(), hessian_dtype: dtype = torch.float32, module_to_name: dict = dict(), results: dict = dict(), flag_calibration: bool = True, flag_hessian: bool = True, flag_xtx: bool = False, iters: int = 10, use_importance_scaling: bool = True, use_balancing: bool = True, balance_iters: int = 40, balance_alpha: float = 1.0)
Bases: Quantizer
OneBit quantizer.
Runs OneBit quantization per layer.
Attributes:
| Name | Type | Description |
|---|---|---|
| iters | int | Optimization iterations. |
| use_importance_scaling | bool | Whether to use importance scaling. |
| use_balancing | bool | Whether to apply weight balancing. |
| balance_iters | int | Balancing iterations. |
| balance_alpha | float | Balancing alpha. |
Methods:
| Name | Description |
|---|---|
| quantize_layer | Quantizes a given layer and returns OnebitResult. |
validate_params
Validate OneBit parameters once in setup().
Validated ranges:
- iters: int >= 0
- balance_iters: int >= 1 (when use_balancing=True)
- balance_alpha: float > 0 (when use_balancing=True)
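The validated ranges above can be sketched as a standalone check. This is a minimal illustration, not the library's implementation: the function name `check_onebit_params` and the choice of `ValueError` are assumptions, and only the three documented ranges are encoded.

```python
def check_onebit_params(iters, use_balancing, balance_iters, balance_alpha):
    """Hypothetical stand-in for the documented range checks."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(iters, bool) or not isinstance(iters, int) or iters < 0:
        raise ValueError(f"iters must be an int >= 0, got {iters!r}")

    # The balance_* ranges only apply when balancing is enabled.
    if use_balancing:
        if (isinstance(balance_iters, bool)
                or not isinstance(balance_iters, int)
                or balance_iters < 1):
            raise ValueError(
                f"balance_iters must be an int >= 1, got {balance_iters!r}")
        if (isinstance(balance_alpha, bool)
                or not isinstance(balance_alpha, (int, float))
                or balance_alpha <= 0):
            raise ValueError(
                f"balance_alpha must be a float > 0, got {balance_alpha!r}")


# The documented defaults pass the checks.
check_onebit_params(iters=10, use_balancing=True,
                    balance_iters=40, balance_alpha=1.0)

# With balancing disabled, the balance_* values are not inspected.
check_onebit_params(iters=0, use_balancing=False,
                    balance_iters=0, balance_alpha=-1.0)
```

Note that `iters=0` is inside the documented range, so a caller can disable the optimization loop without triggering a validation error.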
quantize_layer
Quantize the layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| module | Module | The layer module. | required |
| input | tuple | The input to the layer (not used). | None |
| hessian | Tensor | The Hessian matrix. | None |
Returns:
| Type | Description |
|---|---|
| OnebitResult | OneBit quantization result object containing quantized weights and parameters. |
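To make the shape of the result concrete, here is a rough sketch of how a weight matrix could be split into a sign matrix and two scaling vectors. It is not the library's algorithm: it assumes the approximation `W[i][j] ≈ a[i] * sign[i][j] * b[j]`, uses plain alternating least squares on `|W|` as a swapped-in optimizer, and ignores the Hessian, importance scaling, and balancing. The name `onebit_fit` is hypothetical.

```python
def onebit_fit(W, iters=10):
    """Approximate W[i][j] by a[i] * sign[i][j] * b[j] (illustrative only)."""
    rows, cols = len(W), len(W[0])
    sign = [[1.0 if w >= 0 else -1.0 for w in row] for row in W]
    mag = [[abs(w) for w in row] for row in W]

    # Initial guess: a holds the row means of |W|, b is all ones.
    a = [sum(row) / cols for row in mag]
    b = [1.0] * cols

    for _ in range(iters):
        # With a fixed, the least-squares optimum for each b[j].
        aa = sum(x * x for x in a)
        if aa == 0.0:
            break  # all-zero weights: nothing to fit
        b = [sum(a[i] * mag[i][j] for i in range(rows)) / aa
             for j in range(cols)]

        # With b fixed, the least-squares optimum for each a[i].
        bb = sum(x * x for x in b)
        if bb == 0.0:
            break
        a = [sum(mag[i][j] * b[j] for j in range(cols)) / bb
             for i in range(rows)]
    return a, b, sign


W = [[3.0, -1.0, 2.0],
     [-6.0, 2.0, 4.0]]
a, b, sign = onebit_fit(W, iters=3)
```

In this sketch `iters=0` returns the initial guess unchanged, which is one plausible reading of why the documented range for `iters` includes zero.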
OnebitResult
OnebitResult (dataclass)
OnebitResult(dequantized_weight: Tensor = None, quantization_time: float = None, output_squared_error: float = None, mean_output_squared_error: float = None, weight_squared_error: float = None, mean_weight_squared_error: float = None, relative_output_squared_error: float = None, relative_weight_squared_error: float = None, iters: int = None, use_importance_scaling: bool = None, use_balancing: bool = None, balance_iters: int = None, balance_alpha: float = None, a: Optional[Tensor] = None, b: Optional[Tensor] = None, sign: Optional[Tensor] = None)
Bases: QuantizationResult
OneBit quantization result.
Attributes:
| Name | Type | Description |
|---|---|---|
| dequantized_weight | Tensor | Dequantized weight (FP16, CPU). |
| iters | int | Optimization iterations. |
| use_importance_scaling | bool | Whether to use importance scaling. |
| use_balancing | bool | Whether to apply weight balancing. |
| balance_iters | int | Balancing iterations. |
| balance_alpha | float | Balancing alpha. |
| a | Optional[Tensor] | Scaling vector a. |
| b | Optional[Tensor] | Scaling vector b. |
| sign | Optional[Tensor] | Sign matrix sign(W). |
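As a rough illustration of how `a`, `b`, and `sign` might relate to `dequantized_weight`, the sketch below rebuilds a weight matrix from the three pieces. It assumes that `a` has one entry per output row, `b` one entry per input column, and that the reconstruction is the elementwise product of `sign` with the outer product of `a` and `b`. This page does not state those shapes, so treat them as assumptions. Plain Python lists stand in for tensors, and `dequantize` is a hypothetical name.

```python
def dequantize(a, b, sign):
    """Rebuild W_hat[i][j] = a[i] * sign[i][j] * b[j] (assumed layout)."""
    return [[a[i] * s * b[j] for j, s in enumerate(row)]
            for i, row in enumerate(sign)]


a = [2.0, 4.0]                # assumed: one scale per output row
b = [1.5, 0.5, 1.0]           # assumed: one scale per input column
sign = [[1.0, -1.0, 1.0],
        [-1.0, 1.0, 1.0]]     # entries are +1 or -1

w_hat = dequantize(a, b, sign)
# w_hat == [[3.0, -1.0, 2.0], [-6.0, 2.0, 4.0]]
```

With PyTorch tensors, the same product under these assumptions would be `sign * torch.outer(a, b)`.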