Configuration¶
This page describes all configurable components in Fujitsu One Compression (OneComp).
ModelConfig¶
ModelConfig wraps model loading and tokenizer initialization.
```python
from onecomp import ModelConfig

model_config = ModelConfig(
    model_id="meta-llama/Llama-2-7b-hf",
    dtype="float16",
    device="cuda:0",
)
```
| Parameter | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | Hugging Face Hub model ID | `None` |
| `path` | `str` | Local path to model directory | `None` |
| `dtype` | `str` | Model precision (`"float16"`, `"float32"`) | `"float16"` |
| `device` | `str` | Device placement (`"cpu"`, `"cuda"`, `"auto"`) | `"auto"` |
Note

Provide exactly one of `model_id` or `path`. A `ValueError` is raised if neither is specified.
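The mutual-exclusion rule can be sketched as a standalone check (a hypothetical helper for illustration, not part of OneComp's public API; following "exactly one", it also rejects the case where both are given):

```python
def validate_model_source(model_id=None, path=None):
    """Enforce that exactly one of model_id / path is provided,
    mirroring the ModelConfig rule described above."""
    if (model_id is None) == (path is None):
        raise ValueError("Provide exactly one of model_id or path.")
    # Return whichever source was given.
    return model_id if model_id is not None else path
```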
Runner¶
Runner is the main entry point for quantization. It manages the full pipeline: loading the model, preparing calibration data, executing quantization, and providing evaluation utilities.
```python
from onecomp import Runner

runner = Runner(
    model_config=model_config,
    quantizer=quantizer,
    max_length=2048,
    num_calibration_samples=512,
    qep=False,
)
```
Core Parameters¶
| Parameter | Type | Description | Default |
|---|---|---|---|
| `model_config` | `ModelConfig` | Model and tokenizer configuration | — |
| `quantizer` | `Quantizer` | Quantization method | `None` |
| `quantizers` | `list[Quantizer]` | Multiple quantizers (for benchmarking) | `None` |
| `qep` | `bool` | Enable QEP | `False` |
| `qep_config` | `QEPConfig` | QEP configuration | `None` |
Calibration Parameters¶
| Parameter | Type | Description | Default |
|---|---|---|---|
| `calibration_dataset` | `Dataset` | Custom calibration dataset | `None` |
| `max_length` | `int` | Maximum input sequence length | `2048` |
| `num_calibration_samples` | `int` | Number of calibration samples | `512` |
| `calibration_strategy` | `str` | Strategy for preparing calibration inputs | `"drop_rand"` |
| `calibration_seed` | `int` | Random seed for calibration | `0` |
| `calibration_batch_size` | `int` | Batch size for chunked calibration | `None` |
| `num_layers_per_group` | `int` | Layers processed simultaneously in chunked mode | `7` |
Advanced Parameters¶
| Parameter | Type | Description | Default |
|---|---|---|---|
| `multi_gpu` | `bool` | Enable multi-GPU layer-wise parallel quantization | `False` |
| `gpu_ids` | `list[int]` | Specific GPU IDs to use | `None` |
Calibration Strategies¶
| Strategy | Description |
|---|---|
| `"drop_rand"` | Tokenize each document independently; take a random window of `max_length` tokens. |
| `"drop_head"` | Tokenize each document independently; always take the first `max_length` tokens. |
| `"concat_chunk"` | Concatenate all texts, tokenize, and split into fixed-length chunks. |
| `"concat_chunk_align"` | Same as `concat_chunk`, but adjusts samples so the chunk count equals `num_calibration_samples`. |
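To illustrate the difference between the window-based and concatenation-based strategies, here is a minimal sketch operating on pre-tokenized documents (plain Python for illustration; not OneComp's internal implementation):

```python
import random

def drop_head(docs, max_length):
    """One sample per document: its first max_length tokens."""
    return [doc[:max_length] for doc in docs]

def drop_rand(docs, max_length, seed=0):
    """One sample per document: a random max_length-token window."""
    rng = random.Random(seed)
    samples = []
    for doc in docs:
        start = rng.randint(0, max(0, len(doc) - max_length))
        samples.append(doc[start:start + max_length])
    return samples

def concat_chunk(docs, max_length):
    """Concatenate all documents into one stream, then split it
    into full fixed-length chunks (any tail remainder is dropped)."""
    stream = [tok for doc in docs for tok in doc]
    return [stream[i:i + max_length]
            for i in range(0, len(stream) - max_length + 1, max_length)]
```

Note that the `drop_*` strategies yield at most one sample per document, while `concat_chunk` can yield more samples than documents when texts are long.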
Valid Parameter Combinations¶
| `quantizers` | `qep` | `multi_gpu` | `calibration_batch_size` |
|---|---|---|---|
| Specified | `False` | `False` | Specified |
| `None` | `True` | `False` | `None` |
| `None` | `False` | `True` | `None` |
| `None` | `False` | `False` | Specified |
| `None` | `False` | `False` | `None` |
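The table can be read as a set of mutual-exclusion rules; a hypothetical validation sketch (not OneComp's actual code) that accepts exactly the rows above might look like:

```python
def check_runner_args(quantizers=None, qep=False, multi_gpu=False,
                      calibration_batch_size=None):
    """Reject argument combinations outside the valid rows:
    qep and multi_gpu are mutually exclusive, and neither may be
    combined with multiple quantizers or chunked calibration."""
    if qep and multi_gpu:
        raise ValueError("qep and multi_gpu cannot both be enabled")
    if (qep or multi_gpu) and quantizers is not None:
        raise ValueError("quantizers requires qep=False and multi_gpu=False")
    if (qep or multi_gpu) and calibration_batch_size is not None:
        raise ValueError(
            "calibration_batch_size requires qep=False and multi_gpu=False")
```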
QEPConfig¶
QEPConfig controls Quantization Error Propagation behavior.
```python
from onecomp import QEPConfig

qep_config = QEPConfig(
    general=False,
    percdamp=0.01,
    perccorr=0.5,
    device="cuda:0",
    exclude_layer_keywords=["mlp.down_proj"],
)
```
| Parameter | Type | Description | Default |
|---|---|---|---|
| `general` | `bool` | Use generic (architecture-independent) QEP | `False` |
| `percdamp` | `float` | Damping percentage for Hessian regularization | `0.01` |
| `perccorr` | `float` | Correction percentage for error propagation | `0.5` |
| `device` | `str` | GPU device for QEP computations | `"cuda:0"` |
| `exclude_layer_keywords` | `list[str]` | Layer keywords excluded from error propagation | `["mlp.down_proj"]` |
Tip
The default `general=False` uses the architecture-aware implementation, which is faster because it exploits shared activations (e.g., QKV layers sharing the same input in Llama-like models).
Quantizer Common Parameters¶
All quantizers inherit from the Quantizer base class and share these parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Quantizer name (defaults to class name) | `None` |
| `num_layers` | `int` | Maximum layers to quantize | `None` |
| `calc_quant_error` | `bool` | Calculate quantization error per layer | `False` |
| `include_layer_names` | `list[str]` | Layers to quantize (exact match) | `None` |
| `exclude_layer_names` | `list[str]` | Layers to skip (exact match) | `["lm_head"]` |
| `include_layer_keywords` | `list[str]` | Quantize layers containing any keyword | `None` |
| `exclude_layer_keywords` | `list[str]` | Skip layers containing any keyword | `None` |
Layer Selection Priority¶
1. Filter by layer type (`target_layer_types`)
2. If `include_layer_names` is set, only include exact matches
3. If `include_layer_keywords` is set, only include layers containing any keyword
4. Exclude `exclude_layer_names` (exact match)
5. Exclude `exclude_layer_keywords` (keyword match)
6. Limit by `num_layers`
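The priority order above can be sketched as a plain filter chain over `(name, type)` pairs. This is an illustrative re-implementation, not OneComp's internal code:

```python
def select_layers(layers, target_layer_types=None,
                  include_layer_names=None, include_layer_keywords=None,
                  exclude_layer_names=("lm_head",),
                  exclude_layer_keywords=None, num_layers=None):
    """Apply the documented selection rules in priority order.
    `layers` is a list of (name, type_name) pairs."""
    # 1. Filter by layer type.
    sel = [(n, t) for n, t in layers
           if target_layer_types is None or t in target_layer_types]
    # 2. Include by exact name.
    if include_layer_names is not None:
        sel = [(n, t) for n, t in sel if n in include_layer_names]
    # 3. Include by keyword.
    if include_layer_keywords is not None:
        sel = [(n, t) for n, t in sel
               if any(k in n for k in include_layer_keywords)]
    # 4. Exclude by exact name.
    if exclude_layer_names:
        sel = [(n, t) for n, t in sel if n not in exclude_layer_names]
    # 5. Exclude by keyword.
    if exclude_layer_keywords:
        sel = [(n, t) for n, t in sel
               if not any(k in n for k in exclude_layer_keywords)]
    # 6. Limit the total count.
    return sel[:num_layers] if num_layers is not None else sel
```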