AutoBit

AutoBitQuantizer

AutoBitQuantizer `dataclass`

AutoBitQuantizer(name: str = None, num_layers: int = None, calc_quant_error: bool = False, include_layer_names: list[str] = None, exclude_layer_names: list[str] = (lambda: ['lm_head'])(), include_layer_keywords: list[str] = None, exclude_layer_keywords: list[str] = (lambda: ['per_layer_model_projection'])(), target_layer_types: tuple = (lambda: (Linear,))(), hessian_dtype: dtype = torch.float32, module_to_name: dict = dict(), results: dict = dict(), flag_calibration: bool = False, flag_hessian: bool = False, flag_xtx: bool = False, flag_qep_supported: bool = True, quantizers: list = list(), assignment_strategy: AssignmentStrategy = AssignmentStrategy.ACTIVATION_AWARE, ratios: list = None, target_bit: float = None, target_bit_is_effective: bool = False, calibration_config: CalibrationConfig = None, use_curvature_b: bool = True, save_path: str = None, auto_dbf: bool = True, dbf_threshold: float = 2.0, dbf_iters: int = None, fused_groups: list = (lambda: [['self_attn.q_proj', 'self_attn.k_proj', 'self_attn.v_proj'], ['mlp.gate_proj', 'mlp.up_proj']])(), enable_fused_groups: bool = True)

Bases: Quantizer

Mixed-precision quantizer that assigns each layer to a child quantizer.

Given a target_bit budget and a list of candidate quantizers, this class solves the layer-to-quantizer assignment via ILP (optionally activation-aware) or manual rules, with optional DBF fallback for ultra-low-bit targets.
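
The assignment problem can be illustrated on a toy scale. The sketch below is not the library's ILP solver: it uses exhaustive search over two layers, and the helper name, layer names, sizes, and error values are made-up inputs chosen only for illustration. It picks one candidate bit-width per layer so that the summed error is minimal while the parameter-weighted average bit-width stays within target_bit.

```python
from itertools import product


def assign_bits(layer_sizes, errors, candidate_bits, target_bit):
    """Toy exhaustive search; hypothetical helper, not part of onecomp.

    layer_sizes: {layer_name: number of weights}
    errors: {layer_name: {bits: quantization error at that bit-width}}
    """
    names = list(layer_sizes)
    total_params = sum(layer_sizes.values())
    best_choice, best_error = None, float("inf")
    for choice in product(candidate_bits, repeat=len(names)):
        # Budget: the parameter-weighted average bit-width must not exceed the target.
        avg_bits = sum(layer_sizes[n] * b for n, b in zip(names, choice)) / total_params
        if avg_bits > target_bit:
            continue
        # Objective: total quantization error over all layers.
        total_error = sum(errors[n][b] for n, b in zip(names, choice))
        if total_error < best_error:
            best_choice, best_error = dict(zip(names, choice)), total_error
    return best_choice, best_error


# Made-up sizes and errors for two layers.
layer_sizes = {"q_proj": 100, "up_proj": 300}
errors = {
    "q_proj": {2: 5.0, 4: 1.0},
    "up_proj": {2: 2.0, 4: 0.5},
}
assignment, error = assign_bits(layer_sizes, errors, [2, 4], target_bit=3.0)
```

With these inputs the small, error-sensitive layer gets 4 bits and the large layer gets 2 bits, for an average of 2.5 bits. Exhaustive search grows exponentially with the number of layers, which is why a real model calls for an ILP formulation.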

quantizers must be provided. Each candidate's groupsize is respected by both the RTN error evaluation and the effective-bpw budget, so mixing group sizes across candidates is fully supported.
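
As a rough illustration of why group size matters for the budget: group-wise quantization stores extra metadata per group, so a smaller groupsize raises the effective bits per weight. The sketch below assumes a 16-bit scale and a 16-bit zero point per group. That overhead, and the helper name, are assumptions for illustration and may differ from what onecomp actually counts.

```python
def effective_bpw(wbits, groupsize, scale_bits=16, zero_bits=16):
    """Effective bits per weight under an assumed per-group overhead.

    Hypothetical helper; scale_bits and zero_bits are illustrative defaults.
    """
    # Each group of `groupsize` weights shares one scale and one zero point.
    return wbits + (scale_bits + zero_bits) / groupsize


# Compare the candidate configurations used in the examples on this page.
for wbits, groupsize in [(2, 32), (4, 128), (4, 32)]:
    print(f"wbits={wbits} groupsize={groupsize} -> {effective_bpw(wbits, groupsize)} bpw")
```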

To estimate target_bit from available VRAM before creating this object, use onecomp.utils.estimate_wbits_from_vram::

    from onecomp.utils import estimate_wbits_from_vram

    result = estimate_wbits_from_vram(
        "meta-llama/Llama-2-7b-hf",
        total_vram_gb=24,
    )
    autobit = AutoBitQuantizer(target_bit=result.target_bitwidth, ...)

The following assignment strategies are supported:

ILP ("ilp")
Activation-aware ILP ("activation_aware")
Manual assignment ("manual")

DBF fallback for ultra-low-bit targets is supported when auto_dbf=True and target_bit falls below dbf_threshold.

When you specify save_path, the assignment will be visualized as a heatmap.

Examples:

Activation-aware with explicit target::

    from onecomp.calibration import CalibrationConfig

    autobit = AutoBitQuantizer(
        assignment_strategy="activation_aware",
        target_bit=3.0,
        quantizers=[GPTQ(wbits=2), GPTQ(wbits=4)],
        calibration_config=CalibrationConfig(
            num_calibration_samples=64,
            max_length=256,
        ),
    )

Mixed bit-width and group size::

    autobit = AutoBitQuantizer(
        target_bit=3.0,
        quantizers=[
            GPTQ(wbits=2, groupsize=32),
            GPTQ(wbits=4, groupsize=128),
            GPTQ(wbits=4, groupsize=32),
        ],
    )

Ultra-low-bit with DBF fallback (target_bit <= dbf_threshold)::

    autobit = AutoBitQuantizer(
        target_bit=1.5,
        dbf_iters=10,       # fast testing
    )

Manual assignment::

    autobit = AutoBitQuantizer(
        assignment_strategy="manual",
        quantizers=[
            GPTQ(wbits=2, include_layer_keywords=["mlp"]),
            GPTQ(wbits=4, include_layer_keywords=["self_attn"]),
        ],
    )
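
One way to read the manual example: each candidate claims the layers whose names contain one of its keywords. The sketch below illustrates that substring matching with plain tuples. The helper name, the first-match-wins rule, the labels, and the layer names are assumptions for illustration, not onecomp's implementation.

```python
def match_layers(layer_names, rules):
    """Map each layer name to the first rule label whose keyword it contains.

    rules: list of (label, keywords) pairs; hypothetical structure.
    Layers that match no rule are mapped to None.
    """
    assignment = {}
    for name in layer_names:
        assignment[name] = next(
            (label for label, keywords in rules if any(k in name for k in keywords)),
            None,
        )
    return assignment


# Labels stand in for the two GPTQ candidates of the manual example.
rules = [("gptq-2bit", ["mlp"]), ("gptq-4bit", ["self_attn"])]
layers = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.up_proj",
    "lm_head",
]
assignment = match_layers(layers, rules)
```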

validate_params

validate_params()

Validate AutoBitQuantizer parameters.

AssignmentStrategy

AssignmentStrategy

Bases: StrEnum

Layer-to-quantizer assignment strategies.

fn `property`

fn

Return the assignment function for this strategy.
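
A string-valued enum with a dispatch property can be sketched as follows. The member values match the strategy strings listed for AutoBitQuantizer, and ACTIVATION_AWARE appears in the constructor signature. The class name, the other member names, the placeholder functions, and the lookup table are assumptions. The sketch uses (str, Enum) instead of StrEnum so that it also runs on Python versions before 3.11.

```python
from enum import Enum


# Placeholder assignment functions; hypothetical stand-ins for the real solvers.
def assign_ilp():
    return "ilp"


def assign_activation_aware():
    return "activation_aware"


def assign_manual():
    return "manual"


class ToyAssignmentStrategy(str, Enum):
    """Illustrative stand-in, not the onecomp class."""

    ILP = "ilp"
    ACTIVATION_AWARE = "activation_aware"
    MANUAL = "manual"

    @property
    def fn(self):
        # Look up the function registered for this member.
        return _DISPATCH[self]


# Defined after the class so the members exist; resolved when fn is accessed.
_DISPATCH = {
    ToyAssignmentStrategy.ILP: assign_ilp,
    ToyAssignmentStrategy.ACTIVATION_AWARE: assign_activation_aware,
    ToyAssignmentStrategy.MANUAL: assign_manual,
}

# A plain string converts to the matching member, which then yields its function.
strategy = ToyAssignmentStrategy("activation_aware")
solver = strategy.fn
```

Because the members are strings, a call site can accept either the enum member or its string value, which is consistent with the examples passing assignment_strategy="manual".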