AutoBit

AutoBitQuantizer
dataclass
AutoBitQuantizer(
    name: str = None,
    num_layers: int = None,
    calc_quant_error: bool = False,
    include_layer_names: list[str] = None,
    exclude_layer_names: list[str] = ['lm_head'],
    include_layer_keywords: list[str] = None,
    exclude_layer_keywords: list[str] = None,
    target_layer_types: tuple = (Linear,),
    hessian_dtype: dtype = torch.float32,
    module_to_name: dict = {},
    results: dict = {},
    flag_calibration: bool = False,
    flag_hessian: bool = False,
    flag_xtx: bool = False,
    quantizers: list = [],
    assignment_strategy: AssignmentStrategy = AssignmentStrategy.ACTIVATION_AWARE,
    ratios: list = None,
    target_bit: float = None,
    num_calib_samples: int = 128,
    calib_seqlen: int = 256,
    use_curvature_b: bool = True,
    save_path: str = None,
    auto_dbf: bool = True,
    dbf_threshold: float = 2.0,
    dbf_iters: int = None,
    fused_groups: list = [['self_attn.q_proj', 'self_attn.k_proj', 'self_attn.v_proj'], ['mlp.gate_proj', 'mlp.up_proj']],
    enable_fused_groups: bool = False,
)
Bases: Quantizer
Mixed-precision quantizer that assigns each layer to a child quantizer.
Given a target_bit budget and a list of candidate quantizers,
this class solves the layer-to-quantizer assignment via ILP
(optionally activation-aware) or manual rules, with optional DBF
fallback for ultra-low-bit targets.
When quantizers is not provided, 2-, 3-, 4-, and 8-bit GPTQ
candidates are generated automatically.
To estimate target_bit from available VRAM before creating
this object, use onecomp.utils.estimate_wbits_from_vram::

    from onecomp.utils import estimate_wbits_from_vram

    result = estimate_wbits_from_vram("meta-llama/Llama-2-7b-hf",
                                      total_vram_gb=24)
    autobit = AutoBitQuantizer(target_bit=result.target_bitwidth, ...)
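Conceptually, the returned target bit-width is the weight-bit budget that fits the model into the VRAM left over after other allocations. A rough back-of-the-envelope sketch of that idea (the function name, parameter count, and overhead fraction below are illustrative assumptions, not the actual estimate_wbits_from_vram logic, which may also account for activations, KV-cache, and clamping):

```python
def estimate_target_bits(num_params: float, total_vram_gb: float,
                         overhead_frac: float = 0.3) -> float:
    """Sketch only: bits per weight that fit into the VRAM remaining
    after reserving overhead_frac for non-weight memory (assumed split)."""
    usable_bytes = total_vram_gb * 1e9 * (1.0 - overhead_frac)
    return usable_bytes * 8 / num_params  # bits available per parameter

# Llama-2-7B has roughly 6.74e9 parameters; on a 6 GB card this lands
# near 5 bits per weight, suggesting a mixed 4/8-bit assignment.
bits = estimate_target_bits(6.74e9, 6)
```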
The following assignment strategies are supported:

- ILP ("ilp")
- Activation-aware ILP ("activation_aware")
- Manual assignment ("manual")
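Conceptually, the ILP strategies pick one candidate quantizer per layer so that total (optionally activation-weighted) quantization error is minimized while the average bit-width stays within target_bit. A brute-force miniature of that objective (the layer names, error numbers, and unweighted average are made-up illustrations; the real solver scales to many layers and weights layers by size):

```python
from itertools import product

# Hypothetical per-layer quantization error for each candidate bit-width:
# fewer bits -> smaller model but larger error.
errors = {
    "q_proj":   {2: 0.90, 4: 0.20, 8: 0.05},
    "mlp.gate": {2: 0.40, 4: 0.10, 8: 0.02},
    "mlp.up":   {2: 0.35, 4: 0.12, 8: 0.03},
}
target_bit = 4.0  # average-bit budget across layers

layers = list(errors)
best = None
for bits in product([2, 4, 8], repeat=len(layers)):
    if sum(bits) / len(bits) > target_bit:  # budget constraint
        continue
    err = sum(errors[layer][b] for layer, b in zip(layers, bits))
    if best is None or err < best[0]:
        best = (err, dict(zip(layers, bits)))

assignment = best[1]  # layer -> chosen bit-width
```

An ILP solver reaches the same optimum without enumerating every combination, which is what makes the approach tractable for full models.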
DBF fallback for ultra-low-bit targets is supported when
auto_dbf=True and target_bit falls below dbf_threshold.
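The fallback rule itself is simple; a minimal sketch of the decision (the function name is hypothetical, and the defaults mirror auto_dbf=True and dbf_threshold=2.0 from the signature above):

```python
def pick_backend(target_bit: float, auto_dbf: bool = True,
                 dbf_threshold: float = 2.0) -> str:
    """Route ultra-low-bit targets to DBF; otherwise use the
    ILP-based mixed-precision assignment (sketch of the rule only)."""
    if auto_dbf and target_bit < dbf_threshold:
        return "dbf"
    return "ilp"

pick_backend(1.5)  # below the 2.0 threshold, so DBF is used
pick_backend(3.0)  # above the threshold, so ILP assignment is used
```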
When you specify save_path, the assignment will be visualized
as a heatmap.
Examples:

Activation-aware with explicit target::

    autobit = AutoBitQuantizer(
        assignment_strategy="activation_aware",
        target_bit=3.0,
        quantizers=[GPTQ(wbits=2), GPTQ(wbits=4)],
        num_calib_samples=64,
    )

Ultra-low-bit with DBF fallback (target_bit <= dbf_threshold)::

    autobit = AutoBitQuantizer(
        target_bit=1.5,
        dbf_iters=10,  # fast testing
    )

Manual assignment::

    autobit = AutoBitQuantizer(
        assignment_strategy="manual",
        quantizers=[
            GPTQ(wbits=2, include_layer_keywords=["mlp"]),
            GPTQ(wbits=4, include_layer_keywords=["self_attn"]),
        ],
    )