
RTN

RTN Quantizer

RTN dataclass

RTN(name: str = None, num_layers: int = None, calc_quant_error: bool = False, include_layer_names: list[str] = None, exclude_layer_names: list[str] = (lambda: ['lm_head'])(), include_layer_keywords: list[str] = None, exclude_layer_keywords: list[str] = None, target_layer_types: tuple = (lambda: (Linear,))(), hessian_dtype: dtype = torch.float32, module_to_name: dict = dict(), results: dict = dict(), flag_calibration: bool = False, flag_hessian: bool = False, flag_xtx: bool = False, wbits: int = 4, groupsize: int = -1, sym: bool = False)

Bases: Quantizer

RTN (Round-To-Nearest) quantizer.

RTN is the simplest quantization method: it rounds each weight to the nearest quantization level. It requires no calibration data or Hessian matrices and performs quantization using only weight statistics.

Quantization method:

- Compute the minimum and maximum values of the weights
- Compute the scale and zero point
- Round each weight to the nearest quantization level (Round-To-Nearest)

Because it skips calibration entirely, RTN is the fastest method, but it may be less accurate than calibration-based methods.
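The steps above can be sketched in a few lines. The following is a minimal, hypothetical implementation of per-row asymmetric RTN for illustration only; it is not the library's actual code.

```python
import torch

def rtn_quantize(w: torch.Tensor, wbits: int = 4):
    """Per-row asymmetric RTN (minimal sketch, not the library's code)."""
    qmax = 2 ** wbits - 1
    # 1. weight statistics: per-row minimum and maximum
    wmin = w.min(dim=1, keepdim=True).values
    wmax = w.max(dim=1, keepdim=True).values
    # 2. scale and integer zero point
    scale = (wmax - wmin).clamp(min=1e-8) / qmax
    zero = torch.round(-wmin / scale)
    # 3. round to the nearest level and clamp to the wbits grid
    q = torch.clamp(torch.round(w / scale) + zero, 0, qmax)
    return q, scale, zero

def rtn_dequantize(q, scale, zero):
    # affine reconstruction: W_hat = scale * (Q - zero)
    return scale * (q - zero)
```

With 4 bits each weight maps to one of 16 integer levels, and the per-element reconstruction error is bounded by half the row's scale.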

Attributes:

- flag_calibration (bool): Whether to use calibration data (False for RTN).
- flag_hessian (bool): Whether to use the Hessian matrix (False for RTN).
- wbits (int): Number of quantization bits. Default is 4.
- groupsize (int): Group size; an independent scale and zero point are computed for each group. -1 means no grouping (a single scale and zero point for the entire row). Default is -1.
- sym (bool): Whether to use symmetric quantization. If True, the zero point is placed at the center of the range. Default is False.
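To illustrate how groupsize works, the hypothetical sketch below splits each row into groups of columns and quantizes each group with its own scale and zero point; with groupsize = -1 the whole row is a single group. This is illustrative only, not the library's implementation.

```python
import torch

def rtn_quantize_grouped(w: torch.Tensor, wbits: int = 4, groupsize: int = -1):
    """Asymmetric RTN with one scale/zero per group of columns (sketch)."""
    out_features, in_features = w.shape
    g = in_features if groupsize == -1 else groupsize
    if in_features % g != 0:
        raise ValueError("groupsize must divide in_features")
    # view each row as (num_groups, groupsize) blocks
    wg = w.reshape(out_features, in_features // g, g)
    qmax = 2 ** wbits - 1
    wmin = wg.min(dim=-1, keepdim=True).values
    wmax = wg.max(dim=-1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / qmax
    zero = torch.round(-wmin / scale)
    q = torch.clamp(torch.round(wg / scale) + zero, 0, qmax)
    dequant = (scale * (q - zero)).reshape(out_features, in_features)
    return dequant, scale
```

Smaller groups track local weight ranges, so each group's scale is never larger than the whole-row scale, which typically reduces quantization error at the cost of storing more scale/zero parameters.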

Methods:

- quantize_layer: Quantize a layer using RTN.

validate_params

validate_params()

Validate RTN parameters once in setup().

Validated ranges:

- wbits: int, 1 <= wbits <= 64
- groupsize: int, -1 or >= 1
- sym: bool (no constraint)
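The documented checks amount to something like the following standalone sketch (the actual method validates the instance's own fields; this hypothetical function takes them as arguments):

```python
def validate_params(wbits, groupsize, sym):
    """Hypothetical sketch of the documented parameter checks."""
    if not isinstance(wbits, int) or not (1 <= wbits <= 64):
        raise ValueError(f"wbits must be an int in [1, 64], got {wbits!r}")
    if not isinstance(groupsize, int) or (groupsize != -1 and groupsize < 1):
        raise ValueError(f"groupsize must be -1 or >= 1, got {groupsize!r}")
    if not isinstance(sym, bool):
        raise ValueError(f"sym must be a bool, got {type(sym).__name__}")
```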

quantize_layer

quantize_layer(module, input=None, hessian=None)

Quantize a layer using RTN.

Parameters:

- module (Module): The layer module to quantize. Required.
- input (tuple or Tensor): Input tensor (not used in RTN). Default is None.
- hessian (Tensor): Hessian matrix (not used in RTN). Default is None.

Returns:

- RTNResult: RTN quantization result object containing the quantized weights and parameters.

Raises:

- ValueError: If groupsize does not divide in_features.
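The divisibility precondition can be illustrated with a small hypothetical helper (check_groupsize is not part of the library; it only mirrors the documented ValueError condition):

```python
import torch.nn as nn

def check_groupsize(module: nn.Linear, groupsize: int) -> None:
    """Raise if groupsize (other than -1) does not divide in_features."""
    if groupsize != -1 and module.in_features % groupsize != 0:
        raise ValueError(
            f"groupsize={groupsize} does not divide "
            f"in_features={module.in_features}"
        )
```

For example, a Linear(128, 64) layer accepts groupsize 32 (128 / 32 = 4 groups) but rejects groupsize 48.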

RTNResult

RTNResult dataclass

RTNResult(dequantized_weight: Tensor = None, quantization_time: float = None, output_squared_error: float = None, mean_output_squared_error: float = None, weight_squared_error: float = None, mean_weight_squared_error: float = None, relative_output_squared_error: float = None, relative_weight_squared_error: float = None, wbits: int = None, groupsize: int = None, sym: bool = None, quantized_weight: Optional[Tensor] = None, scale: Optional[Tensor] = None, zero: Optional[Tensor] = None)

Bases: QuantizationResult

Result class for RTN quantization.

Inherits from QuantizationResult and adds RTN-specific parameters.

Attributes:

- dequantized_weight (Tensor): Dequantized weights (FP16, CPU); inherited from the parent class.
- wbits (int): Number of quantization bits used.
- groupsize (int): Group size used (-1 means no grouping).
- sym (bool): Whether symmetric quantization was used.
- quantized_weight (Tensor): Quantized weights (integer type, CPU).
- scale (Tensor): Scale coefficients (FP16, CPU).
- zero (Tensor): Zero points (FP16, CPU).
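The stored fields are related by the standard affine dequantization rule. The sketch below assumes that relationship holds for RTNResult; the exact shapes and dtypes of scale and zero in the library may differ.

```python
import torch

def dequantize(quantized_weight: torch.Tensor,
               scale: torch.Tensor,
               zero: torch.Tensor) -> torch.Tensor:
    """Recover FP16 weights from integer levels: W_hat = scale * (Q - zero)."""
    # compute in FP32 for accuracy, then cast back to FP16 for storage
    return (scale.float() * (quantized_weight.float() - zero.float())).half()
```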