JointQ¶
JointQ Quantizer¶
JointQ dataclass ¶
JointQ(name: str = None, num_layers: int = None, calc_quant_error: bool = False, include_layer_names: list[str] = None, exclude_layer_names: list[str] = (lambda: ['lm_head'])(), include_layer_keywords: list[str] = None, exclude_layer_keywords: list[str] = None, target_layer_types: tuple = (lambda: (Linear,))(), hessian_dtype: dtype = torch.float64, module_to_name: dict = dict(), results: dict = dict(), flag_calibration: bool = True, flag_hessian: bool = False, flag_xtx: bool = True, bits: int = 4, symmetric: bool = False, group_size: int = 128, batch_size: Optional[int] = None, log_level: int = 0, device: Optional[device] = None, regularization_lambda: Optional[float] = 0.2, actorder: bool = False, ils_enabled: bool = False, ils_num_iterations: int = 10, ils_num_clones: int = 8, ils_num_channels: Optional[int] = None)
Bases: Quantizer
JointQ quantizer class
JointQ is a quantization method that uses the jointq package.
Attributes:

| Name | Type | Description |
|---|---|---|
| bits | int | Number of bits for quantization. Default is 4. |
| symmetric | bool | Whether to use symmetric quantization. Default is False. |
| group_size | int or None | Group size for quantization. Default is 128. If None, per-channel quantization is used. |
| batch_size | int or None | Batch size for quantization. Default is None (solve all at once). |
| log_level | int | Log level (0: none, 1: minimal, 2: detailed). Default is 0. |
| device | device | Device for quantization. |
| regularization_lambda | float | Tikhonov regularization strength. Default is 0.2. Replaces X^T X with X^T X + nλI, where n = dim_n. λ is relative to the normalized Hessian (1/n)X^T X, so its meaning is consistent across different calibration sample sizes. Recommended range: 0.1 to 1.0. |
| actorder | bool | Whether to reorder columns by activation magnitude (Hessian diagonal) before quantization. Default is False. When enabled, columns with larger activations are grouped together, improving group quantization efficiency and the quality of the GPTQ initial solution. |
| ils_enabled | bool | Whether to enable Iterated Local Search (ILS). Default is False. |
| ils_num_iterations | int | Number of ILS iterations. Default is 10. |
| ils_num_clones | int | Number of ILS clones. Default is 8. |
| ils_num_channels | int or None | Number of ILS channels. Default is None. |
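The regularization_lambda and actorder behavior described above can be sketched as follows. This is an illustrative NumPy sketch of the math, not the onecomp implementation; the function names here are assumptions.

```python
import numpy as np

def regularize_gram(X: np.ndarray, lam: float = 0.2) -> np.ndarray:
    """Tikhonov regularization as described above: replace X^T X with
    X^T X + n*lam*I, where n is the number of calibration samples, so
    lam is relative to the normalized Hessian (1/n) X^T X."""
    n = X.shape[0]
    XtX = X.T @ X
    return XtX + n * lam * np.eye(X.shape[1])

def actorder_perm(XtX: np.ndarray) -> np.ndarray:
    """actorder sketch: a permutation that sorts columns by the Hessian
    diagonal (activation magnitude), largest first."""
    return np.argsort(-np.diag(XtX))
```

Applying `actorder_perm` to the rows and columns of the Gram matrix (and the corresponding weight columns) groups high-activation columns together before quantization.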
Example

Basic usage:

```python
import torch
from onecomp.quantizer.jointq import JointQ

quantizer = JointQ(
    bits=4,
    symmetric=False,
    group_size=128,
    device=torch.device(0),
)
```

With batch_size:

```python
import torch
from onecomp.quantizer.jointq import JointQ

quantizer = JointQ(
    bits=4,
    symmetric=False,
    group_size=128,
    batch_size=4096,
    device=torch.device(0),
)
```

Without Iterated Local Search (ILS):

```python
import torch
from onecomp.quantizer.jointq import JointQ

quantizer = JointQ(
    bits=4,
    symmetric=False,
    group_size=128,
    device=torch.device(0),
    ils_enabled=False,
)
```
validate_params ¶
Validate JointQ parameters once in setup().
Validated ranges:

- bits: int >= 1
- group_size: int >= 1
- batch_size: int >= 1 or None
- log_level: int in {0, 1, 2}
- ils_num_iterations: int >= 1 (when ils_enabled=True)
- ils_num_clones: int >= 1 (when ils_enabled=True)
- ils_num_channels: int >= 1 or None (when ils_enabled=True)
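A minimal sketch of these checks, assuming a plain-function form; the actual validate_params in onecomp may be structured differently:

```python
def validate_params(bits, group_size, batch_size, log_level,
                    ils_enabled, ils_num_iterations, ils_num_clones,
                    ils_num_channels):
    # Checks mirror the validated ranges listed above.
    if not (isinstance(bits, int) and bits >= 1):
        raise ValueError("bits must be an int >= 1")
    if not (isinstance(group_size, int) and group_size >= 1):
        raise ValueError("group_size must be an int >= 1")
    if batch_size is not None and not (isinstance(batch_size, int) and batch_size >= 1):
        raise ValueError("batch_size must be an int >= 1 or None")
    if log_level not in (0, 1, 2):
        raise ValueError("log_level must be 0, 1, or 2")
    if ils_enabled:
        # ILS parameters are only validated when ILS is enabled.
        if not (isinstance(ils_num_iterations, int) and ils_num_iterations >= 1):
            raise ValueError("ils_num_iterations must be an int >= 1")
        if not (isinstance(ils_num_clones, int) and ils_num_clones >= 1):
            raise ValueError("ils_num_clones must be an int >= 1")
        if ils_num_channels is not None and not (
            isinstance(ils_num_channels, int) and ils_num_channels >= 1
        ):
            raise ValueError("ils_num_channels must be an int >= 1 or None")
```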
quantize_layer ¶
Quantize the layer.
If matrix_XX and dim_n are provided, the precomputed X^T X is used. Otherwise, matrix_X is computed from input (legacy behavior).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| module | Module | The layer module | required |
| input | tuple or Tensor | The input to the layer (input activations) | None |
| hessian | Tensor | The Hessian matrix (not used in JointQ) | None |
| matrix_XX | Tensor | Precomputed X^T X (FP64). If provided, this is used instead of input. | None |
| dim_n | int | Number of samples. Required when matrix_XX is provided. | None |
Returns:

| Name | Type | Description |
|---|---|---|
|  | JointQResult | JointQ quantization result object |
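Since matrix_XX is a precomputed X^T X accumulated in FP64, the calibration pass can build it batch by batch and hand quantize_layer the accumulated Gram matrix together with dim_n. A hedged NumPy sketch of that accumulation (the helper name is illustrative, not part of the onecomp API):

```python
import numpy as np

def accumulate_xtx(batches):
    """Accumulate X^T X in float64 across calibration batches, as expected
    for the matrix_XX / dim_n arguments: returns (XtX, n) where n is the
    total number of samples seen."""
    XtX, n = None, 0
    for X in batches:
        X64 = X.astype(np.float64)  # accumulate in FP64 for stability
        XtX = X64.T @ X64 if XtX is None else XtX + X64.T @ X64
        n += X64.shape[0]
    return XtX, n
```

The resulting pair would then be passed as quantize_layer(module, matrix_XX=XtX, dim_n=n) instead of raw input activations.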