OneBit

OneBit Quantizer

Onebit dataclass

Onebit(
    name: str = None,
    num_layers: int = None,
    calc_quant_error: bool = False,
    include_layer_names: list[str] = None,
    exclude_layer_names: list[str] = ['lm_head'],
    include_layer_keywords: list[str] = None,
    exclude_layer_keywords: list[str] = ['per_layer_model_projection'],
    target_layer_types: tuple = (Linear,),
    hessian_dtype: dtype = torch.float32,
    module_to_name: dict = dict(),
    results: dict = dict(),
    flag_calibration: bool = True,
    flag_hessian: bool = True,
    flag_xtx: bool = False,
    iters: int = 10,
    use_importance_scaling: bool = True,
    use_balancing: bool = True,
    balance_iters: int = 40,
    balance_alpha: float = 1.0,
)

Bases: Quantizer

OneBit quantizer.

Runs OneBit quantization per layer.

Attributes:

    iters (int): Optimization iterations.
    use_importance_scaling (bool): Whether to use importance scaling.
    use_balancing (bool): Whether to apply weight balancing.
    balance_iters (int): Balancing iterations.
    balance_alpha (float): Balancing alpha.

Methods:

    quantize_layer: Quantizes a given layer and returns an OnebitResult.

validate_params

validate_params()

Validate OneBit parameters once in setup().

Validated ranges:

    iters: int >= 0
    balance_iters: int >= 1 (when use_balancing=True)
    balance_alpha: float > 0 (when use_balancing=True)
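The range checks above can be sketched as a standalone helper. Note that `OnebitParams` and `validate` below are hypothetical stand-ins for illustration, not names from this API:

```python
from dataclasses import dataclass


@dataclass
class OnebitParams:
    # Hypothetical stand-in for the Onebit fields checked by validate_params().
    iters: int = 10
    use_balancing: bool = True
    balance_iters: int = 40
    balance_alpha: float = 1.0


def validate(p: OnebitParams) -> None:
    # iters must be a non-negative integer.
    if not (isinstance(p.iters, int) and p.iters >= 0):
        raise ValueError("iters must be an int >= 0")
    # Balancing parameters are only checked when balancing is enabled.
    if p.use_balancing:
        if not (isinstance(p.balance_iters, int) and p.balance_iters >= 1):
            raise ValueError("balance_iters must be an int >= 1")
        if not (isinstance(p.balance_alpha, float) and p.balance_alpha > 0):
            raise ValueError("balance_alpha must be a float > 0")
```

With the defaults above, `validate(OnebitParams())` passes; setting `iters=-1` or `balance_alpha=0.0` raises a `ValueError`.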

quantize_layer

quantize_layer(module, input=None, hessian=None)

Quantize the layer.

Parameters:

    module (Module, required): The layer module.
    input (tuple, default None): The input to the layer (not used).
    hessian (Tensor, default None): The Hessian matrix.

Returns:

    OnebitResult: OneBit quantization result object containing the quantized weights and parameters.
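The result object reports several weight-error metrics (see the OnebitResult fields below). A minimal numpy sketch of one conventional way to compute them, assuming squared-Frobenius-norm definitions rather than the library's exact formulas:

```python
import numpy as np


def weight_error_metrics(w: np.ndarray, w_hat: np.ndarray) -> dict:
    # Squared Frobenius error between original and dequantized weights.
    sq_err = float(np.sum((w - w_hat) ** 2))
    return {
        "weight_squared_error": sq_err,
        # Average over all weight entries.
        "mean_weight_squared_error": sq_err / w.size,
        # Error normalized by the squared norm of the original weight.
        "relative_weight_squared_error": sq_err / float(np.sum(w ** 2)),
    }
```

The output-space variants (`output_squared_error` and friends) would be computed analogously on layer outputs rather than weights.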

OnebitResult

OnebitResult dataclass

OnebitResult(
    dequantized_weight: Tensor = None,
    quantization_time: float = None,
    output_squared_error: float = None,
    mean_output_squared_error: float = None,
    weight_squared_error: float = None,
    mean_weight_squared_error: float = None,
    relative_output_squared_error: float = None,
    relative_weight_squared_error: float = None,
    iters: int = None,
    use_importance_scaling: bool = None,
    use_balancing: bool = None,
    balance_iters: int = None,
    balance_alpha: float = None,
    a: Optional[Tensor] = None,
    b: Optional[Tensor] = None,
    sign: Optional[Tensor] = None,
)

Bases: QuantizationResult

OneBit quantization result.

Attributes:

    dequantized_weight (Tensor): Dequantized weight (FP16, CPU).
    iters (int): Optimization iterations.
    use_importance_scaling (bool): Whether to use importance scaling.
    use_balancing (bool): Whether to apply weight balancing.
    balance_iters (int): Balancing iterations.
    balance_alpha (float): Balancing alpha.
    a (Optional[Tensor]): Scaling vector a.
    b (Optional[Tensor]): Scaling vector b.
    sign (Optional[Tensor]): Sign matrix sign(W).
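Given the `a`, `b`, and `sign` fields above, the dequantized weight can presumably be reconstructed as a rank-1 rescaling of the sign matrix, in the style of OneBit's sign-value decomposition. A minimal numpy sketch; the library's exact reconstruction is an assumption here:

```python
import numpy as np


def dequantize(a: np.ndarray, b: np.ndarray, sign: np.ndarray) -> np.ndarray:
    # a: per-output-row scales, b: per-input-column scales,
    # sign: sign(W) with entries in {-1, +1}.
    # Reconstruct W_hat = (a b^T) * sign(W), elementwise.
    return np.outer(a, b) * sign
```

Under this reading, each weight keeps only its 1-bit sign, while the two FP16 vectors `a` and `b` recover per-row and per-column magnitudes.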