CalibrationConfig¶
The CalibrationConfig dataclass groups all calibration-related parameters
used by Runner and AutoBitQuantizer.
CalibrationConfig
dataclass
¶
CalibrationConfig(calibration_dataset: str = 'c4', max_length: int = 2048, num_calibration_samples: int = 512, strategy: str = 'drop_rand', seed: int = 0, batch_size: Optional[int] = None, num_layers_per_group: int = 7, text_key: str = 'text', use_quality_filter: bool = False, max_documents: int = 10000)
Configuration for calibration data preparation.
Groups all calibration-related parameters into a single object
used by :class:~onecomp.runner.Runner and
:class:~onecomp.quantizer.autobit.AutoBitQuantizer.
Attributes:
| Name | Type | Description |
|---|---|---|
calibration_dataset |
str
|
Dataset name ( |
max_length |
int
|
Maximum token length per calibration chunk. |
num_calibration_samples |
int
|
Target number of calibration samples. |
strategy |
str
|
Chunking strategy. One of |
seed |
int
|
Random seed used by stochastic strategies
(e.g. |
batch_size |
int or None
|
Batch size for chunked calibration forward passes.
|
num_layers_per_group |
int
|
Number of layers processed simultaneously in chunked
calibration mode.
Used only by chunked quantization
(:func: |
text_key |
str
|
Column name used when loading custom or Hub datasets. |
use_quality_filter |
bool
|
Apply C4 quality filtering (ignored for non-C4 sources). |
max_documents |
int
|
Cap on documents loaded from custom files or Hub datasets. |
Examples:
Default configuration (C4 dataset)::
config = CalibrationConfig()
WikiText-2 with longer sequences::
config = CalibrationConfig(
calibration_dataset="wikitext2",
max_length=2048,
num_calibration_samples=256,
)
Custom file with quality filtering::
config = CalibrationConfig(
calibration_dataset="./my_data.jsonl",
text_key="content",
use_quality_filter=True,
)