Algorithm Overview¶

Fujitsu One Compression (OneComp) provides a collection of post-training quantization (PTQ) algorithms for LLMs. Each algorithm represents a different approach to compressing model weights while preserving model quality.

What is Post-Training Quantization?¶

Post-training quantization converts model weights from high-precision floating-point (e.g., FP16) to lower-precision representations (e.g., INT4, INT3) after training is complete. This reduces model size and can accelerate inference without requiring retraining.

The core problem is to find quantized weights \(\hat{W}\) that minimize the error:

\[ \min_{\hat{W}} \| W X - \hat{W} X \|_F^2 \]

where \(W\) is the original weight matrix and \(X\) is the input activation matrix.

Available Algorithms¶

Algorithm	Bit-width	Calibration	Description
GPTQ	Arbitrary (typically 2--4)	Required	Hessian-based optimal rounding with column-by-column processing
DBF	~1.5 (binary)	Required	Double Binary Factorization: \(W \approx A \cdot \text{diag}(d) \cdot B\)
RTN	Arbitrary	Not required	Round-To-Nearest baseline
JointQ	Arbitrary	Required	Joint optimization across groups
QuIP	Arbitrary	Required	Quantization with Incoherence Processing
ARB	Arbitrary	Required	Adaptive Rounding with Binary search
CQ	Arbitrary	Required	Combinatorial quantization
QBB	Arbitrary	Required	Quantization with Block-wise Balancing
Onebit	1-bit	Required	Extreme 1-bit quantization

Quantization Error Propagation (QEP)¶

QEP is not a standalone quantizer but a meta-algorithm that works on top of any layer-wise quantizer. It compensates for the error that propagates from one layer to the next during sequential quantization.

QEP can be combined with any quantizer:

runner = Runner(
    model_config=model_config,
    quantizer=GPTQ(wbits=3),
    qep=True,
)

Choosing an Algorithm¶

GPTQ is the recommended default for most use cases (4-bit or 3-bit quantization)
GPTQ + QEP provides the best quality at low bit-widths (3-bit or lower)
RTN is useful as a fast baseline or when calibration data is not available
DBF targets extreme compression (~1.5-bit) with binary factorization