
JointQ

JointQ Quantizer

JointQ dataclass

JointQ(name: str = None, num_layers: int = None, calc_quant_error: bool = False, include_layer_names: list[str] = None, exclude_layer_names: list[str] = (lambda: ['lm_head'])(), include_layer_keywords: list[str] = None, exclude_layer_keywords: list[str] = None, target_layer_types: tuple = (lambda: (Linear,))(), hessian_dtype: dtype = torch.float64, module_to_name: dict = dict(), results: dict = dict(), flag_calibration: bool = True, flag_hessian: bool = False, flag_xtx: bool = True, bits: int = 4, symmetric: bool = False, group_size: int = 128, batch_size: Optional[int] = None, log_level: int = 0, device: Optional[device] = None, regularization_lambda: Optional[float] = 0.2, actorder: bool = False, ils_enabled: bool = False, ils_num_iterations: int = 10, ils_num_clones: int = 8, ils_num_channels: Optional[int] = None)

Bases: Quantizer

JointQ quantizer class

JointQ is a quantization method that uses the jointq package.

Attributes:

    bits (int): Number of bits for quantization. Default is 4.

    symmetric (bool): Whether to use symmetric quantization. Default is False.

    group_size (int or None): Group size for quantization. Default is 128. If None, per-channel quantization is used.

    batch_size (int or None): Batch size for quantization. Default is None (solve all at once).

    log_level (int): Log level (0: none, 1: minimal, 2: detailed). Default is 0.

    device (torch.device or None): Device for quantization. Default is None.

    regularization_lambda (float): Tikhonov regularization strength. Default is 0.2. Replaces X^T X with X^T X + nλI, where n = dim_n. λ is relative to the normalized Hessian (1/n)X^T X, so its meaning is consistent across different calibration sample sizes. Recommended range: 0.1 to 1.0.

    actorder (bool): Whether to reorder columns by activation magnitude (Hessian diagonal) before quantization. Default is False. When enabled, columns with larger activations are grouped together, improving group quantization efficiency and the GPTQ initial solution quality.

    ils_enabled (bool): Whether to enable Iterated Local Search. Default is False.

    ils_num_iterations (int): Number of ILS iterations. Default is 10.

    ils_num_clones (int): Number of ILS clones. Default is 8.

    ils_num_channels (int or None): Number of ILS channels. Default is None.
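The X^T X + nλI update described for regularization_lambda can be sketched as follows. This is a NumPy illustration of the formula only, not the jointq implementation; n is the number of calibration samples, matching dim_n, and float64 matches the default hessian_dtype:

```python
import numpy as np

def regularized_hessian(X: np.ndarray, lam: float = 0.2) -> np.ndarray:
    """Return X^T X + n*lam*I, where n is the number of samples (rows of X).

    Because lam scales with n, it is effectively relative to the normalized
    Hessian (1/n) X^T X, so its meaning does not depend on how many
    calibration samples were collected.
    """
    X = np.asarray(X, dtype=np.float64)   # accumulate in float64
    n = X.shape[0]                        # dim_n: number of samples
    xtx = X.T @ X
    d = xtx.shape[0]                      # feature dimension
    return xtx + n * lam * np.eye(d)
```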

Example

Basic usage::

    import torch

    from onecomp.quantizer.jointq import JointQ

    quantizer = JointQ(
        bits=4,
        symmetric=False,
        group_size=128,
        device=torch.device(0),
    )

With batch_size::

    import torch

    from onecomp.quantizer.jointq import JointQ

    quantizer = JointQ(
        bits=4,
        symmetric=False,
        group_size=128,
        batch_size=4096,
        device=torch.device(0),
    )

Without Iterated Local Search (ILS)::

    import torch

    from onecomp.quantizer.jointq import JointQ

    quantizer = JointQ(
        bits=4,
        symmetric=False,
        group_size=128,
        device=torch.device(0),
        ils_enabled=False,
    )
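For the opposite case, ILS can be turned on via the constructor. This is a sketch mirroring the examples above; the ILS parameter values are simply the documented defaults, shown explicitly:

```python
import torch

from onecomp.quantizer.jointq import JointQ

quantizer = JointQ(
    bits=4,
    symmetric=False,
    group_size=128,
    device=torch.device(0),
    ils_enabled=True,        # turn on Iterated Local Search
    ils_num_iterations=10,   # documented default
    ils_num_clones=8,        # documented default
    # ils_num_channels left as None, its default
)
```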

validate_params

validate_params()

Validate JointQ parameters once in setup().

Validated ranges

bits: int >= 1
group_size: int >= 1
batch_size: int >= 1 or None
log_level: int in {0, 1, 2}
ils_num_iterations: int >= 1 (when ils_enabled=True)
ils_num_clones: int >= 1 (when ils_enabled=True)
ils_num_channels: int >= 1 or None (when ils_enabled=True)
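A hedged sketch of the checks above as a standalone function (the real validate_params is a method and may raise different exception types or messages):

```python
def validate_params(bits, group_size, batch_size, log_level,
                    ils_enabled, ils_num_iterations, ils_num_clones,
                    ils_num_channels):
    """Check JointQ parameters against the documented ranges."""
    if not (isinstance(bits, int) and bits >= 1):
        raise ValueError("bits must be an int >= 1")
    if not (isinstance(group_size, int) and group_size >= 1):
        raise ValueError("group_size must be an int >= 1")
    # batch_size may be None (solve all at once)
    if batch_size is not None and not (isinstance(batch_size, int) and batch_size >= 1):
        raise ValueError("batch_size must be an int >= 1 or None")
    if log_level not in (0, 1, 2):
        raise ValueError("log_level must be 0, 1, or 2")
    # ILS parameters are only validated when ILS is enabled
    if ils_enabled:
        if not (isinstance(ils_num_iterations, int) and ils_num_iterations >= 1):
            raise ValueError("ils_num_iterations must be an int >= 1")
        if not (isinstance(ils_num_clones, int) and ils_num_clones >= 1):
            raise ValueError("ils_num_clones must be an int >= 1")
        if ils_num_channels is not None and not (
            isinstance(ils_num_channels, int) and ils_num_channels >= 1
        ):
            raise ValueError("ils_num_channels must be an int >= 1 or None")
```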

quantize_layer

quantize_layer(module, input=None, hessian=None, matrix_XX=None, dim_n=None)

Quantize the layer.

If matrix_XX and dim_n are provided, the precomputed X^T X is used. Otherwise, X^T X is computed from input (legacy behavior).

Parameters:

    module (Module, required): The layer module.

    input (tuple or Tensor, default None): The input to the layer (input activations).

    hessian (Tensor, default None): The Hessian matrix (not used in JointQ).

    matrix_XX (Tensor, default None): Precomputed X^T X (FP64). If provided, this is used instead of input.

    dim_n (int, default None): Number of samples. Required when matrix_XX is provided.

Returns:

    JointQResult: JointQ quantization result object.
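The precomputed-Hessian path can be exercised roughly as follows. This is a sketch: the Linear layer and calibration tensor X are placeholders, and the call follows the parameter table above:

```python
import torch
from torch.nn import Linear

from onecomp.quantizer.jointq import JointQ

module = Linear(in_features=4096, out_features=4096)
X = torch.randn(1024, 4096)          # placeholder calibration activations

# Accumulate X^T X in float64, matching the default hessian_dtype
matrix_XX = X.double().T @ X.double()
dim_n = X.shape[0]                   # number of samples; required with matrix_XX

quantizer = JointQ(bits=4, group_size=128)
result = quantizer.quantize_layer(module, matrix_XX=matrix_XX, dim_n=dim_n)
```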