RTN (Round-To-Nearest)

RTN is the simplest quantization method. It rounds each weight to the nearest quantization level without using calibration data or Hessian information.

Algorithm

For each weight element \(w\):

\[ \hat{w} = \text{clamp}\left(\left\lfloor \frac{w}{s} \right\rceil + z,\ 0,\ 2^b - 1\right) \cdot s - z \cdot s \]

where:

\(s\) is the scale factor
\(z\) is the zero point
\(b\) is the bit-width
\(\lfloor \cdot \rceil\) denotes rounding to the nearest integer
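The formula above can be sketched in plain Python as follows. This is an illustrative sketch rather than the library's implementation: the function names `rtn_quantize` and `rtn_dequantize` and the example scale and zero point are assumptions chosen for illustration.

```python
def rtn_quantize(w: float, scale: float, zero: int, bits: int) -> int:
    """Map a real weight to an integer level in [0, 2**bits - 1]."""
    max_level = 2 ** bits - 1
    # Python's round() rounds exact halves to the nearest even integer.
    level = round(w / scale) + zero
    return min(max(level, 0), max_level)


def rtn_dequantize(level: int, scale: float, zero: int) -> float:
    """Map an integer level back to a real value: s * (q - z)."""
    return scale * (level - zero)


# Hypothetical parameters for a 4-bit quantizer.
scale, zero, bits = 0.1, 8, 4

levels = [rtn_quantize(w, scale, zero, bits) for w in (0.23, -0.5, 5.0)]
restored = [rtn_dequantize(q, scale, zero) for q in levels]
print(levels)
print(restored)
```

The last input, 5.0, lies outside the representable range, so its level is clamped to the top of the range and the reconstructed value saturates instead of following the input.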

The integer level range is always \([0, 2^b - 1]\), regardless of `sym`.

Symmetric (`sym=True`): the range is made symmetric around zero by max-abs symmetrisation, \(x_{\max} = \max(|x_{\min}|, x_{\max})\), and the zero point is fixed at \((2^b - 1 + 1) / 2 = 2^{b-1}\). This aligns with GPTQExcecutor.
Asymmetric (`sym=False`): the range is extended to include zero (\(x_{\min} \le 0 \le x_{\max}\)), and the zero point is \(\lfloor -x_{\min} / s \rceil\).
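A minimal sketch of how the scale and zero point might be derived for one group of weights in each mode. The helper name `find_params` is hypothetical, and the scale formula \(s = (x_{\max} - x_{\min}) / (2^b - 1)\) is an assumption that follows the common GPTQ-style min/max convention; the text above does not define \(s\) explicitly. The sketch also always mirrors the range in symmetric mode, which is a simplification.

```python
def find_params(weights, bits, sym):
    """Return (scale, zero_point) for one group of weights."""
    max_level = 2 ** bits - 1
    # Extend the range so that it always contains zero.
    x_min = min(min(weights), 0.0)
    x_max = max(max(weights), 0.0)
    if sym:
        # Max-abs symmetrisation: mirror the larger side around zero.
        x_max = max(abs(x_min), x_max)
        x_min = -x_max
    if x_min == x_max:
        # All-zero group: pick an arbitrary range to avoid a zero scale.
        x_min, x_max = -1.0, 1.0
    scale = (x_max - x_min) / max_level  # assumed scale formula
    if sym:
        zero = (max_level + 1) // 2      # equals 2**(bits - 1)
    else:
        zero = round(-x_min / scale)
    return scale, zero


weights = [-0.4, 0.1, 0.9, 0.3]
scale_sym, zero_sym = find_params(weights, bits=4, sym=True)
scale_asym, zero_asym = find_params(weights, bits=4, sym=False)
print(scale_sym, zero_sym)
print(scale_asym, zero_asym)
```

For this example the asymmetric range is narrower than the mirrored one, so the asymmetric scale is smaller and the 16 levels are spaced more finely over the weights that actually occur.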

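When `mse=True`, a grid search over `grid` candidate shrink levels selects the clipping range that minimises the Lp-norm reconstruction error, where the exponent p is set by `norm`. Despite the name, the objective is a true squared error only when `norm=2`; the default is 2.4.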
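The clipping search can be sketched as below, using asymmetric quantization for brevity. This is a sketch under stated assumptions, not the library's code: the names `clip_error` and `search_clipping` are hypothetical, each candidate is assumed to shrink the min/max range by a factor `p = 1 - i / grid`, and `max_shrink` (the fraction of the grid that is explored) is a hypothetical parameter borrowed from common GPTQ-style implementations. It does not appear in the parameter table.

```python
def clip_error(weights, p, bits, norm):
    """Lp reconstruction error when the min/max range is shrunk by factor p.

    Assumes the weights are not all identical, so the scale is non-zero.
    """
    max_level = 2 ** bits - 1
    lo = p * min(min(weights), 0.0)
    hi = p * max(max(weights), 0.0)
    scale = (hi - lo) / max_level
    zero = round(-lo / scale)
    err = 0.0
    for w in weights:
        level = min(max(round(w / scale) + zero, 0), max_level)
        err += abs(scale * (level - zero) - w) ** norm
    return err, scale, zero


def search_clipping(weights, bits, norm=2.4, grid=100, max_shrink=0.8):
    """Try shrink factors p = 1, 1 - 1/grid, ... and keep the best one."""
    best = None
    for i in range(int(max_shrink * grid)):
        p = 1.0 - i / grid
        err, scale, zero = clip_error(weights, p, bits, norm)
        if best is None or err < best[0]:
            best = (err, scale, zero, p)
    return best


# Many small weights plus one outlier: clipping the outlier may pay off.
weights = [0.01 * k for k in range(-20, 21)] + [3.0]
best_err, best_scale, best_zero, best_p = search_clipping(weights, bits=3)
print(best_p, best_err)
```

Because `p = 1.0` (no clipping) is always the first candidate, the selected range is never worse than plain min/max quantization under the chosen norm.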

RTN serves as a baseline for comparing more sophisticated quantization algorithms.

Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| `wbits` | `int` | Quantization bit-width | `4` |
| `groupsize` | `int` | Group size for group-wise quantization (`-1` = none) | `-1` |
| `sym` | `bool` | Symmetric quantization | `False` |
| `mse` | `bool` | Enable MSE grid search for optimal clipping | `False` |
| `norm` | `float` | Lp norm exponent for MSE search | `2.4` |
| `grid` | `int` | Number of candidate shrink levels for MSE search | `100` |
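To illustrate what `groupsize` controls, the sketch below splits one weight row into consecutive groups and computes a separate scale for each. The helper names `split_groups` and `group_scales` are hypothetical, and both the grouping along a row and the requirement that the row length be a multiple of `groupsize` are assumptions made for this example.

```python
def split_groups(row, groupsize):
    """Split one weight row into consecutive groups; -1 keeps the row whole."""
    if groupsize == -1:
        return [row]
    if len(row) % groupsize != 0:
        raise ValueError("row length must be a multiple of groupsize")
    return [row[i:i + groupsize] for i in range(0, len(row), groupsize)]


def group_scales(row, groupsize, bits):
    """Compute one asymmetric min/max scale per group."""
    max_level = 2 ** bits - 1
    scales = []
    for group in split_groups(row, groupsize):
        lo = min(min(group), 0.0)
        hi = max(max(group), 0.0)
        scales.append((hi - lo) / max_level)
    return scales


# The first half of the row is small, the second half is large.
row = [0.1, -0.2, 0.05, 0.15, 2.0, -1.5, 1.0, 0.5]
per_row = group_scales(row, groupsize=-1, bits=4)
per_group = group_scales(row, groupsize=4, bits=4)
print(per_row)
print(per_group)
```

With a single scale for the whole row, the large weights dictate the step size for the small ones. With groups of four, the first group gets its own, much finer scale, at the cost of storing one scale and zero point per group.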

Usage

from onecomp import ModelConfig, Runner
from onecomp.quantizer.rtn import RTN

model_config = ModelConfig(
    model_id="meta-llama/Llama-2-7b-hf",
    device="cuda:0",
)

rtn = RTN(wbits=4, groupsize=128)

runner = Runner(model_config=model_config, quantizer=rtn)
runner.run()

Characteristics

No calibration data required -- quantization is performed directly on the model weights
Very fast -- no optimization or iterative processing
Lower quality -- RTN typically produces higher quantization error than GPTQ or other Hessian-based methods, particularly at low bit-widths
Useful as a baseline -- provides a lower bound on the quantization quality that more sophisticated methods are expected to exceed

When to Use RTN

Quick experiments where calibration data is not available
Comparing against more advanced methods as a baseline
High bit-width quantization (e.g., 8-bit) where the difference from optimal is small
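The effect of bit-width on RTN error can be checked with a small experiment. The sketch below is illustrative only: the helper name `rtn_error`, the synthetic Gaussian weights, and the asymmetric min/max scale are assumptions, and the numbers it prints say nothing about any particular model.

```python
import random


def rtn_error(weights, bits):
    """Mean squared reconstruction error of asymmetric min/max RTN."""
    max_level = 2 ** bits - 1
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / max_level
    zero = round(-lo / scale)
    total = 0.0
    for w in weights:
        level = min(max(round(w / scale) + zero, 0), max_level)
        total += (scale * (level - zero) - w) ** 2
    return total / len(weights)


# Synthetic weights drawn from a narrow Gaussian (an assumption).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

for bits in (2, 3, 4, 8):
    print(bits, rtn_error(weights, bits))
```

Each extra bit roughly halves the step size, so the error falls quickly as the bit-width grows. At 8 bits there is little left for a more advanced method to recover.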