# RTN

## RTN Quantizer

### `RTN` dataclass
```python
RTN(
    name: str = None,
    num_layers: int = None,
    calc_quant_error: bool = False,
    include_layer_names: list[str] = None,
    exclude_layer_names: list[str] = ['lm_head'],
    include_layer_keywords: list[str] = None,
    exclude_layer_keywords: list[str] = None,
    target_layer_types: tuple = (Linear,),
    hessian_dtype: dtype = torch.float32,
    module_to_name: dict = dict(),
    results: dict = dict(),
    flag_calibration: bool = False,
    flag_hessian: bool = False,
    flag_xtx: bool = False,
    wbits: int = 4,
    groupsize: int = -1,
    sym: bool = False,
)
```
Bases: Quantizer
RTN (Round-To-Nearest) quantizer.
RTN is the simplest quantization method that rounds weights to the nearest quantization level. It does not require calibration data or Hessian matrices, performing quantization using only weight statistics.
Quantization method:

- Computes the minimum and maximum values of the weights
- Computes the scale and zero point
- Rounds weights to the nearest quantization level (Round-To-Nearest)
RTN requires neither calibration data nor a Hessian matrix. It is the fastest method, but may be less accurate than other methods.
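The steps above can be sketched as a standalone function. This is not the library's implementation, only a minimal illustration of per-row asymmetric RTN with `wbits` levels; the function name and the per-row reduction axis are assumptions:

```python
import torch

def rtn_quantize(weight: torch.Tensor, wbits: int = 4):
    """Round-To-Nearest sketch: one scale and zero point per output row.

    Assumes an asymmetric integer grid with levels 0 .. 2**wbits - 1.
    """
    qmax = 2 ** wbits - 1
    # Per-row weight statistics (step 1)
    wmin = weight.min(dim=1, keepdim=True).values
    wmax = weight.max(dim=1, keepdim=True).values
    # Scale and zero point (step 2)
    scale = (wmax - wmin).clamp(min=1e-8) / qmax
    zero = torch.round(-wmin / scale)
    # Round to the nearest level (step 3), then clamp to the valid range
    q = torch.clamp(torch.round(weight / scale) + zero, 0, qmax)
    dequant = (q - zero) * scale
    return q, scale, zero, dequant
```

For example, a row `[0, 1, 2, 3]` at `wbits=2` has scale 1 and zero point 0, so it round-trips exactly.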
Attributes:

| Name | Type | Description |
|---|---|---|
| `flag_calibration` | `bool` | Whether to use calibration data (False for RTN). |
| `flag_hessian` | `bool` | Whether to use a Hessian matrix (False for RTN). |
| `wbits` | `int` | Number of quantization bits. Default is 4. |
| `groupsize` | `int` | Group size. An independent scale and zero point are computed for each group; -1 means no grouping (a single scale and zero point for the entire row). Default is -1. |
| `sym` | `bool` | Whether to use symmetric quantization. If True, the zero point is placed at the center. Default is False. |
Methods:

| Name | Description |
|---|---|
| `quantize_layer` | Quantize a layer using RTN. |
### `validate_params`

Validate RTN parameters once in `setup()`.

Validated ranges:

- `wbits`: int, 1 <= wbits <= 64
- `groupsize`: int, -1 or >= 1
- `sym`: bool (no constraint)
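The validated ranges above can be expressed as a small checker. This is a hypothetical helper, not the library's `validate_params` itself; the function name and error messages are assumptions:

```python
def validate_rtn_params(wbits: int, groupsize: int, sym: bool) -> None:
    """Check RTN parameters against the documented ranges (illustrative only)."""
    if not isinstance(wbits, int) or not (1 <= wbits <= 64):
        raise ValueError(f"wbits must be an int in [1, 64], got {wbits!r}")
    if not isinstance(groupsize, int) or not (groupsize == -1 or groupsize >= 1):
        raise ValueError(f"groupsize must be -1 or >= 1, got {groupsize!r}")
    if not isinstance(sym, bool):
        raise ValueError(f"sym must be a bool, got {sym!r}")
```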
### `quantize_layer`
Quantize a layer using RTN.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `module` | `Module` | The layer module to quantize. | *required* |
| `input` | `tuple` or `Tensor` | Input tensor (not used in RTN). | `None` |
| `hessian` | `Tensor` | Hessian matrix (not used in RTN). | `None` |
Returns:

| Type | Description |
|---|---|
| `RTNResult` | RTN quantization result object containing the quantized weights and parameters. |
Raises:

| Type | Description |
|---|---|
| `ValueError` | If `groupsize` does not divide `in_features`. |
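The `groupsize` semantics and the `ValueError` condition can be illustrated with a group-wise variant of the sketch above. Again this is not the library's code; the reshape-based grouping and the function name are assumptions:

```python
import torch

def rtn_quantize_grouped(weight: torch.Tensor, wbits: int = 4, groupsize: int = -1):
    """Group-wise RTN sketch: independent scale/zero per group of columns."""
    out_features, in_features = weight.shape
    if groupsize == -1:
        # No grouping: one group spanning the entire row
        groupsize = in_features
    if in_features % groupsize != 0:
        raise ValueError(
            f"groupsize {groupsize} does not divide in_features {in_features}"
        )
    qmax = 2 ** wbits - 1
    # View each row as (num_groups, groupsize) and reduce within groups
    w = weight.reshape(out_features, in_features // groupsize, groupsize)
    wmin = w.min(dim=-1, keepdim=True).values
    wmax = w.max(dim=-1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / qmax
    zero = torch.round(-wmin / scale)
    q = torch.clamp(torch.round(w / scale) + zero, 0, qmax)
    return q.reshape_as(weight), scale, zero
```

Groups with very different magnitudes (e.g. `[0, 3]` next to `[10, 13]`) each get their own scale and zero point, which is the accuracy benefit grouping buys over a single per-row pair.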
## RTNResult

### `RTNResult` dataclass

```python
RTNResult(
    dequantized_weight: Tensor = None,
    quantization_time: float = None,
    output_squared_error: float = None,
    mean_output_squared_error: float = None,
    weight_squared_error: float = None,
    mean_weight_squared_error: float = None,
    relative_output_squared_error: float = None,
    relative_weight_squared_error: float = None,
    wbits: int = None,
    groupsize: int = None,
    sym: bool = None,
    quantized_weight: Optional[Tensor] = None,
    scale: Optional[Tensor] = None,
    zero: Optional[Tensor] = None,
)
```
Bases: QuantizationResult
Result class for RTN quantization.
Inherits from QuantizationResult and adds RTN-specific parameters.
Attributes:

| Name | Type | Description |
|---|---|---|
| `dequantized_weight` | `Tensor` | Dequantized weights (FP16, CPU); inherited from the parent class. |
| `wbits` | `int` | Number of quantization bits used. |
| `groupsize` | `int` | Group size used (-1 means no grouping). |
| `sym` | `bool` | Whether symmetric quantization was used. |
| `quantized_weight` | `Tensor` | Quantized weights (INT type, CPU). |
| `scale` | `Tensor` | Scale coefficients (FP16, CPU). |
| `zero` | `Tensor` | Zero points (FP16, CPU). |
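The relationship between the `quantized_weight`, `scale`, and `zero` fields and the error metrics on the result can be sketched as follows, assuming the per-row (`groupsize=-1`) layout; the helper name and the exact error definitions are assumptions, not the library's API:

```python
import torch

def dequantize_and_error(quantized_weight, scale, zero, original_weight):
    """Rebuild FP weights from RTN result fields and compute squared errors."""
    # Invert the quantization map: w_hat = (q - zero) * scale
    dequant = (quantized_weight.float() - zero) * scale
    err = original_weight - dequant
    weight_squared_error = err.pow(2).sum().item()
    mean_weight_squared_error = err.pow(2).mean().item()
    return dequant, weight_squared_error, mean_weight_squared_error
```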