# RTN

## RTN Quantizer

### `RTN` (dataclass)

`RTN(name: str = None, num_layers: int = None, calc_quant_error: bool = False, include_layer_names: list[str] = None, exclude_layer_names: list[str] = ['lm_head'], include_layer_keywords: list[str] = None, exclude_layer_keywords: list[str] = ['per_layer_model_projection'], target_layer_types: tuple = (Linear,), hessian_dtype: dtype = torch.float32, module_to_name: dict = {}, results: dict = {}, flag_calibration: bool = False, flag_hessian: bool = False, flag_xtx: bool = False, wbits: int = 4, groupsize: int = -1, sym: bool = False, mse: bool = False, norm: float = 2.4, grid: int = 100)`
Bases: Quantizer
RTN (Round-To-Nearest) quantizer.
RTN is the simplest quantization method that rounds weights to the nearest quantization level. It does not require calibration data or Hessian matrices, performing quantization using only weight statistics.
Quantization steps:

- Compute the minimum and maximum values of the weights.
- Compute the scale and zero point.
- Round each weight to the nearest quantization level (round-to-nearest).

It is the fastest method, but may yield lower accuracy than calibration-based methods.
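The steps above can be sketched in plain PyTorch. This is an illustrative asymmetric (`sym=False`) per-tensor example, not the library's implementation; `rtn_round` is a hypothetical helper name:

```python
import torch

def rtn_round(w: torch.Tensor, wbits: int = 4):
    """Minimal asymmetric round-to-nearest sketch (not the library code)."""
    qmax = 2 ** wbits - 1                          # number of levels minus one
    wmin, wmax = w.min(), w.max()
    scale = (wmax - wmin).clamp(min=1e-8) / qmax   # step size between levels
    zero = torch.round(-wmin / scale)              # integer zero point
    q = torch.clamp(torch.round(w / scale) + zero, 0, qmax)
    w_hat = (q - zero) * scale                     # dequantized approximation
    return q, scale, zero, w_hat

w = torch.tensor([0.1, -0.5, 0.3, 0.9])
q, scale, zero, w_hat = rtn_round(w, wbits=4)
```

Every reconstructed value `w_hat` lies within half a quantization step of the original weight, which is the defining property of round-to-nearest.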
Attributes:

| Name | Type | Description |
|---|---|---|
| `flag_calibration` | `bool` | Whether to use calibration data (`False` for RTN). |
| `flag_hessian` | `bool` | Whether to use a Hessian matrix (`False` for RTN). |
| `wbits` | `int` | Number of quantization bits. Default is 4. |
| `groupsize` | `int` | Group size; an independent scale and zero point are computed for each group. `-1` means no grouping (a single scale and zero point for the entire row). Default is -1. |
| `sym` | `bool` | Whether to use symmetric quantization. If `True`, the zero point is placed at the center. Default is False. |
| `mse` | `bool` | Enable MSE grid search for the optimal clipping range. Default is False. |
| `norm` | `float` | Lp-norm exponent for the MSE search. Default is 2.4. |
| `grid` | `int` | Number of candidate shrink levels for the MSE search. Default is 100. |
Methods:

| Name | Description |
|---|---|
| `quantize_layer` | Quantize a layer using RTN. |
### `validate_params`

Validate RTN parameters once in `setup()`.

Validated ranges:

- `wbits`: int, 1 <= wbits <= 64
- `groupsize`: int, -1 or >= 1
- `sym`: bool (no constraint)
- `grid`: int >= 1 (when `mse=True`)
- `norm`: float > 0 (when `mse=True`)
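A standalone sketch of these checks might look as follows. Only the documented ranges come from the source; the function body and error messages are assumptions, not the library's actual code:

```python
def validate_params(wbits, groupsize, sym, mse, grid, norm):
    """Illustrative validation of RTN parameters (hypothetical sketch)."""
    if not (isinstance(wbits, int) and 1 <= wbits <= 64):
        raise ValueError(f"wbits must be an int in [1, 64], got {wbits!r}")
    if not (isinstance(groupsize, int) and (groupsize == -1 or groupsize >= 1)):
        raise ValueError(f"groupsize must be -1 or >= 1, got {groupsize!r}")
    if not isinstance(sym, bool):
        raise ValueError(f"sym must be a bool, got {sym!r}")
    if mse:
        # grid and norm are only constrained when the MSE search is enabled.
        if not (isinstance(grid, int) and grid >= 1):
            raise ValueError(f"grid must be an int >= 1 when mse=True, got {grid!r}")
        if not (isinstance(norm, (int, float)) and norm > 0):
            raise ValueError(f"norm must be > 0 when mse=True, got {norm!r}")

# Defaults from the signature above pass; an out-of-range wbits raises.
validate_params(4, -1, False, False, 100, 2.4)
```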
### `quantize_layer`

Quantize a layer using RTN.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `module` | `Module` | The layer module to quantize. | *required* |
| `input` | `tuple` or `Tensor` | Input tensor (not used in RTN). | `None` |
| `hessian` | `Tensor` | Hessian matrix (not used in RTN). | `None` |
Returns:

| Type | Description |
|---|---|
| `RTNResult` | RTN quantization result object containing the quantized weights and parameters. |
Raises:

| Type | Description |
|---|---|
| `ValueError` | If `groupsize` does not evenly divide `in_features`. |
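The group-wise behavior and the `ValueError` condition can be illustrated with a self-contained sketch. This is a hypothetical helper, not the library's `quantize_layer`; it quantizes a weight matrix with one scale/zero pair per group of `groupsize` input features:

```python
import torch

def rtn_quantize_grouped(W: torch.Tensor, wbits: int = 4, groupsize: int = -1):
    """Asymmetric RTN with per-group scale and zero point (illustrative)."""
    out_features, in_features = W.shape
    g = in_features if groupsize == -1 else groupsize
    if in_features % g != 0:
        # Mirrors the documented failure mode of quantize_layer.
        raise ValueError(f"groupsize {g} does not divide in_features {in_features}")
    qmax = 2 ** wbits - 1
    # Split each row into groups along the input dimension.
    Wg = W.reshape(out_features, in_features // g, g)
    wmin = Wg.min(dim=-1, keepdim=True).values
    wmax = Wg.max(dim=-1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / qmax   # one scale per group
    zero = torch.round(-wmin / scale)              # one zero point per group
    Q = torch.clamp(torch.round(Wg / scale) + zero, 0, qmax)
    W_hat = ((Q - zero) * scale).reshape(out_features, in_features)
    return Q, scale, zero, W_hat

torch.manual_seed(0)
W = torch.randn(8, 16)
Q, scale, zero, W_hat = rtn_quantize_grouped(W, wbits=4, groupsize=8)
```

With `groupsize=-1` the whole row forms a single group, matching the documented default.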
## RTNResult

### `RTNResult` (dataclass)

`RTNResult(dequantized_weight: Tensor = None, quantization_time: float = None, output_squared_error: float = None, mean_output_squared_error: float = None, weight_squared_error: float = None, mean_weight_squared_error: float = None, relative_output_squared_error: float = None, relative_weight_squared_error: float = None, wbits: int = None, groupsize: int = None, sym: bool = None, quantized_weight: Optional[Tensor] = None, scale: Optional[Tensor] = None, zero: Optional[Tensor] = None)`
Bases: QuantizationResult
Result class for RTN quantization.
Inherits from QuantizationResult and adds RTN-specific parameters.
Attributes:

| Name | Type | Description |
|---|---|---|
| `dequantized_weight` | `Tensor` | Dequantized weights (FP16, CPU); inherited from the parent class. |
| `wbits` | `int` | Number of quantization bits used. |
| `groupsize` | `int` | Group size used (`-1` means no grouping). |
| `sym` | `bool` | Whether symmetric quantization was used. |
| `quantized_weight` | `Tensor` | Quantized weights (INT type, CPU). |
| `scale` | `Tensor` | Scale factors (FP16, CPU). |
| `zero` | `Tensor` | Zero points (FP16, CPU). |
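The relationship between `quantized_weight`, `scale`, `zero`, and `dequantized_weight` can be shown with a minimal stand-in dataclass. This is illustrative only; the real `RTNResult` lives in the library and carries the additional error-metric fields listed in its signature:

```python
from dataclasses import dataclass
from typing import Optional
import torch

@dataclass
class RTNResultSketch:
    """Stand-in mirroring a subset of the documented RTNResult fields."""
    quantized_weight: Optional[torch.Tensor] = None
    scale: Optional[torch.Tensor] = None
    zero: Optional[torch.Tensor] = None
    dequantized_weight: Optional[torch.Tensor] = None

res = RTNResultSketch(
    quantized_weight=torch.tensor([[6.0, 0.0, 8.0, 15.0]]),
    scale=torch.tensor(0.0933),
    zero=torch.tensor(5.0),
)
# The stored integer codes, scale, and zero point reconstruct the weights:
# w_hat = (q - zero) * scale
res.dequantized_weight = (res.quantized_weight - res.zero) * res.scale
```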