
RTN

RTN Quantizer

RTN dataclass

RTN(name: str = None, num_layers: int = None, calc_quant_error: bool = False, include_layer_names: list[str] = None, exclude_layer_names: list[str] = (lambda: ['lm_head'])(), include_layer_keywords: list[str] = None, exclude_layer_keywords: list[str] = None, target_layer_types: tuple = (lambda: (Linear,))(), hessian_dtype: dtype = torch.float32, module_to_name: dict = dict(), results: dict = dict(), flag_calibration: bool = False, flag_hessian: bool = False, flag_xtx: bool = False, wbits: int = 4, groupsize: int = -1, sym: bool = False)

Bases: Quantizer

RTN (Round-To-Nearest) quantizer.

RTN is the simplest quantization method: it rounds each weight to the nearest quantization level. It requires no calibration data or Hessian matrices and performs quantization using only weight statistics.

Quantization method:

- Compute the minimum and maximum values of the weights
- Compute the scale and zero point
- Round each weight to the nearest quantization level (Round-To-Nearest)

Because it skips calibration entirely, RTN is the fastest method, but it may be less accurate than calibration-based methods.
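The steps above can be sketched in a few lines. The following is a minimal, hypothetical implementation of per-row asymmetric RTN for illustration only; it is not the library's actual code.

```python
import torch

def rtn_quantize(w: torch.Tensor, wbits: int = 4):
    """Per-row asymmetric RTN (minimal sketch, not the library's code)."""
    qmax = 2 ** wbits - 1
    # 1. weight statistics: per-row minimum and maximum
    wmin = w.min(dim=1, keepdim=True).values
    wmax = w.max(dim=1, keepdim=True).values
    # 2. scale and integer zero point
    scale = (wmax - wmin).clamp(min=1e-8) / qmax
    zero = torch.round(-wmin / scale)
    # 3. round to the nearest level and clamp to the wbits grid
    q = torch.clamp(torch.round(w / scale) + zero, 0, qmax)
    return q, scale, zero

def rtn_dequantize(q, scale, zero):
    # affine reconstruction: W_hat = scale * (Q - zero)
    return scale * (q - zero)
```

With 4 bits each weight maps to one of 16 integer levels, and the per-element reconstruction error is bounded by half the row's scale.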

Attributes:

- flag_calibration (bool): Whether to use calibration data (False for RTN).
- flag_hessian (bool): Whether to use the Hessian matrix (False for RTN).
- wbits (int): Number of quantization bits. Default is 4.
- groupsize (int): Group size; an independent scale and zero point are computed for each group. -1 means no grouping (a single scale and zero point for the entire row). Default is -1.
- sym (bool): Whether to use symmetric quantization. If True, the zero point is placed at the center of the range. Default is False.
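To illustrate how groupsize works, the hypothetical sketch below splits each row into groups of columns and quantizes each group with its own scale and zero point; with groupsize = -1 the whole row is a single group. This is illustrative only, not the library's implementation.

```python
import torch

def rtn_quantize_grouped(w: torch.Tensor, wbits: int = 4, groupsize: int = -1):
    """Asymmetric RTN with one scale/zero per group of columns (sketch)."""
    out_features, in_features = w.shape
    g = in_features if groupsize == -1 else groupsize
    if in_features % g != 0:
        raise ValueError("groupsize must divide in_features")
    # view each row as (num_groups, groupsize) blocks
    wg = w.reshape(out_features, in_features // g, g)
    qmax = 2 ** wbits - 1
    wmin = wg.min(dim=-1, keepdim=True).values
    wmax = wg.max(dim=-1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / qmax
    zero = torch.round(-wmin / scale)
    q = torch.clamp(torch.round(wg / scale) + zero, 0, qmax)
    dequant = (scale * (q - zero)).reshape(out_features, in_features)
    return dequant, scale
```

Smaller groups track local weight ranges, so each group's scale is never larger than the whole-row scale, which typically reduces quantization error at the cost of storing more scale/zero parameters.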

Methods:

- quantize_layer: Quantize a layer using RTN.

validate_params

validate_params()

Validate RTN parameters once in setup().

Validated ranges:

- wbits: int, 1 <= wbits <= 64
- groupsize: int, -1 or >= 1
- sym: bool (no constraint)
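The documented checks amount to something like the following standalone sketch (the actual method validates the instance's own fields; this hypothetical function takes them as arguments):

```python
def validate_params(wbits, groupsize, sym):
    """Hypothetical sketch of the documented parameter checks."""
    if not isinstance(wbits, int) or not (1 <= wbits <= 64):
        raise ValueError(f"wbits must be an int in [1, 64], got {wbits!r}")
    if not isinstance(groupsize, int) or (groupsize != -1 and groupsize < 1):
        raise ValueError(f"groupsize must be -1 or >= 1, got {groupsize!r}")
    if not isinstance(sym, bool):
        raise ValueError(f"sym must be a bool, got {type(sym).__name__}")
```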

quantize_layer

quantize_layer(module, input=None, hessian=None)

Quantize a layer using RTN.

Parameters:

- module (Module): The layer module to quantize. Required.
- input (tuple or Tensor): Input tensor (not used in RTN). Default is None.
- hessian (Tensor): Hessian matrix (not used in RTN). Default is None.

Returns:

- RTNResult: RTN quantization result object containing the quantized weights and parameters.

Raises:

- ValueError: If groupsize does not divide in_features.
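The divisibility precondition can be illustrated with a small hypothetical helper (check_groupsize is not part of the library; it only mirrors the documented ValueError condition):

```python
import torch.nn as nn

def check_groupsize(module: nn.Linear, groupsize: int) -> None:
    """Raise if groupsize (other than -1) does not divide in_features."""
    if groupsize != -1 and module.in_features % groupsize != 0:
        raise ValueError(
            f"groupsize={groupsize} does not divide "
            f"in_features={module.in_features}"
        )
```

For example, a Linear(128, 64) layer accepts groupsize 32 (128 / 32 = 4 groups) but rejects groupsize 48.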

RTNResult

RTNResult dataclass

RTNResult(dequantized_weight: Tensor = None, quantization_time: float = None, output_squared_error: float = None, mean_output_squared_error: float = None, weight_squared_error: float = None, mean_weight_squared_error: float = None, relative_output_squared_error: float = None, relative_weight_squared_error: float = None, wbits: int = None, groupsize: int = None, sym: bool = None, quantized_weight: Optional[Tensor] = None, scale: Optional[Tensor] = None, zero: Optional[Tensor] = None)

Bases: QuantizationResult

Result class for RTN quantization.

Inherits from QuantizationResult and adds RTN-specific parameters.

Attributes:

- dequantized_weight (Tensor): Dequantized weights (FP16, CPU); inherited from the parent class.
- wbits (int): Number of quantization bits used.
- groupsize (int): Group size used (-1 means no grouping).
- sym (bool): Whether symmetric quantization was used.
- quantized_weight (Tensor): Quantized weights (integer type, CPU).
- scale (Tensor): Scale coefficients (FP16, CPU).
- zero (Tensor): Zero points (FP16, CPU).
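The stored fields are related by the standard affine dequantization rule. The sketch below assumes that relationship holds for RTNResult; the exact shapes and dtypes of scale and zero in the library may differ.

```python
import torch

def dequantize(quantized_weight: torch.Tensor,
               scale: torch.Tensor,
               zero: torch.Tensor) -> torch.Tensor:
    """Recover FP16 weights from integer levels: W_hat = scale * (Q - zero)."""
    # compute in FP32 for accuracy, then cast back to FP16 for storage
    return (scale.float() * (quantized_weight.float() - zero.float())).half()
```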