AutoBit
AutoBitQuantizer (dataclass)
AutoBitQuantizer(name: str = None, num_layers: int = None, calc_quant_error: bool = False, include_layer_names: list[str] = None, exclude_layer_names: list[str] = (lambda: ['lm_head'])(), include_layer_keywords: list[str] = None, exclude_layer_keywords: list[str] = (lambda: ['per_layer_model_projection'])(), target_layer_types: tuple = (lambda: (Linear,))(), hessian_dtype: dtype = torch.float32, module_to_name: dict = dict(), results: dict = dict(), flag_calibration: bool = False, flag_hessian: bool = False, flag_xtx: bool = False, flag_qep_supported: bool = True, quantizers: list = list(), assignment_strategy: AssignmentStrategy = AssignmentStrategy.ACTIVATION_AWARE, ratios: list = None, target_bit: float = None, target_bit_is_effective: bool = False, calibration_config: CalibrationConfig = None, use_curvature_b: bool = True, save_path: str = None, auto_dbf: bool = True, dbf_threshold: float = 2.0, dbf_iters: int = None, fused_groups: list = (lambda: [['self_attn.q_proj', 'self_attn.k_proj', 'self_attn.v_proj'], ['mlp.gate_proj', 'mlp.up_proj']])(), enable_fused_groups: bool = True)
Bases: Quantizer
Mixed-precision quantizer that assigns each layer to a child quantizer.
Given a target_bit budget and a list of candidate quantizers,
this class solves the layer-to-quantizer assignment via ILP
(optionally activation-aware) or manual rules, with optional DBF
fallback for ultra-low-bit targets.
quantizers must be provided. Each candidate's groupsize is
respected by both the RTN error evaluation and the effective-bpw
budget, so mixing group sizes across candidates is fully supported.
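To see why group size matters for the budget, here is a minimal sketch of an effective bits-per-weight calculation. The storage layout is an assumption made for illustration (one 16-bit scale and one 16-bit zero point per group); `effective_bpw` is a hypothetical helper, not part of onecomp, and the library's own accounting may differ.

```python
def effective_bpw(wbits: int, groupsize: int,
                  scale_bits: int = 16, zero_bits: int = 16) -> float:
    """Bits per weight including per-group metadata.

    ASSUMPTION: each group of `groupsize` weights stores one scale and one
    zero point; the 16-bit defaults are illustrative, not onecomp's values.
    """
    if groupsize <= 0:
        raise ValueError("groupsize must be positive")
    # Metadata cost is amortized over the weights in one group.
    return wbits + (scale_bits + zero_bits) / groupsize


# Same (wbits, groupsize) pairs as the mixed-candidate example on this page.
candidates = [(2, 32), (4, 128), (4, 32)]
for wbits, groupsize in candidates:
    print(f"wbits={wbits} groupsize={groupsize} "
          f"effective_bpw={effective_bpw(wbits, groupsize):.2f}")
```

Under these assumptions a smaller group size raises the effective cost of a candidate, so two candidates with the same wbits can consume different amounts of the budget.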
To estimate target_bit from available VRAM before creating
this object, use onecomp.utils.estimate_wbits_from_vram::
    from onecomp.utils import estimate_wbits_from_vram

    result = estimate_wbits_from_vram(
        "meta-llama/Llama-2-7b-hf", total_vram_gb=24
    )
    autobit = AutoBitQuantizer(target_bit=result.target_bitwidth, ...)
The following assignment strategies are supported:
- ILP ("ilp")
- Activation-aware ILP ("activation_aware")
- Manual assignment ("manual")
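The budgeted assignment behind the two ILP strategies can be illustrated on a toy problem. This sketch uses brute-force enumeration in place of an ILP solver, and every name and number in it (`assign_layers`, the layer sizes, the error table) is hypothetical. It only shows the shape of the optimization: minimize total error subject to an average-bit budget. For an activation-aware variant, the error table would presumably be derived from calibration activations; that is an assumption, not a description of onecomp's implementation.

```python
from itertools import product


def assign_layers(layer_sizes, candidate_bits, errors, target_bit):
    """Pick one candidate per layer, minimizing total error under a bit budget.

    layer_sizes[i]    : parameter count of layer i
    candidate_bits[j] : effective bits per weight of candidate j
    errors[i][j]      : quantization error of layer i under candidate j
    Returns (choice, total_error); choice is None if nothing fits the budget.
    """
    budget = target_bit * sum(layer_sizes)  # total bits allowed
    best_choice, best_error = None, float("inf")
    # Exhaustive search: only viable for toy sizes; a solver replaces this.
    for choice in product(range(len(candidate_bits)), repeat=len(layer_sizes)):
        used = sum(layer_sizes[i] * candidate_bits[j]
                   for i, j in enumerate(choice))
        if used > budget + 1e-9:
            continue  # violates the average-bit budget
        total = sum(errors[i][j] for i, j in enumerate(choice))
        if total < best_error:
            best_choice, best_error = choice, total
    return best_choice, best_error


# Hypothetical data: three layers, two candidates (2-bit and 4-bit).
layer_sizes = [100, 100, 200]
candidate_bits = [2.0, 4.0]
errors = [[9.0, 1.0], [4.0, 1.0], [10.0, 2.0]]
result = assign_layers(layer_sizes, candidate_bits, errors, target_bit=3.0)
print(result)
```

In this toy case the two small layers get the 4-bit candidate and the large layer gets the 2-bit one, which uses the 3.0-bit average budget exactly.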
DBF fallback for ultra-low-bit targets is supported when
auto_dbf=True and target_bit falls below dbf_threshold.
When you specify save_path, the assignment will be visualized
as a heatmap.
Examples:
Activation-aware with explicit target::
    from onecomp.calibration import CalibrationConfig

    autobit = AutoBitQuantizer(
        assignment_strategy="activation_aware",
        target_bit=3.0,
        quantizers=[GPTQ(wbits=2), GPTQ(wbits=4)],
        calibration_config=CalibrationConfig(
            num_calibration_samples=64,
            max_length=256,
        ),
    )
Mixed bit-width and group size::
    autobit = AutoBitQuantizer(
        target_bit=3.0,
        quantizers=[
            GPTQ(wbits=2, groupsize=32),
            GPTQ(wbits=4, groupsize=128),
            GPTQ(wbits=4, groupsize=32),
        ],
    )
Ultra-low-bit with DBF fallback (target_bit <= dbf_threshold)::
    autobit = AutoBitQuantizer(
        target_bit=1.5,
        dbf_iters=10,  # fast testing
    )
Manual assignment::
    autobit = AutoBitQuantizer(
        assignment_strategy="manual",
        quantizers=[
            GPTQ(wbits=2, include_layer_keywords=["mlp"]),
            GPTQ(wbits=4, include_layer_keywords=["self_attn"]),
        ],
    )
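As a rough illustration of how keyword rules like the ones in the manual example could map layer names to quantizers, here is a standalone sketch. `pick_quantizer`, the labels, and the layer names are hypothetical. Substring matching with first-match-wins precedence is an assumption, as is treating the default exclude_layer_names entry lm_head as an exact-name exclusion; onecomp's actual matching rules may differ.

```python
def pick_quantizer(layer_name, rules, excluded=("lm_head",)):
    """Return the label of the first rule whose keyword occurs in layer_name.

    ASSUMPTIONS: substring matching, first matching rule wins, and names in
    `excluded` are skipped entirely. None means the layer is left unassigned.
    """
    if layer_name in excluded:
        return None
    for label, keywords in rules:
        if any(keyword in layer_name for keyword in keywords):
            return label
    return None


# Labels stand in for the two GPTQ candidates of the manual example.
rules = [
    ("gptq_w2", ["mlp"]),
    ("gptq_w4", ["self_attn"]),
]
layers = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.up_proj",
    "lm_head",
]
assignment = {name: pick_quantizer(name, rules) for name in layers}
print(assignment)
```

With these rules, attention projections map to the 4-bit candidate, MLP projections to the 2-bit candidate, and the excluded head stays unassigned.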