QEPConfig
Configuration dataclass for Quantization Error Propagation (QEP).
QEPConfig (dataclass)
QEPConfig(general: bool = False, percdamp: float = 0.01, perccorr: float = 0.5, device: str = 'cuda:0', exclude_layer_keywords: list[str] = (lambda: ['mlp.down_proj'])())
Configuration for Quantization Error Propagation (QEP).
Attributes:
| Name | Type | Description |
|---|---|---|
| `general` | `bool` | If True, use the generic (architecture-independent) implementation. If False, use the architecture-aware implementation that exploits shared activations (e.g., QKV layers in Llama sharing the same input activations). Default is `False`. |
| `percdamp` | `float` | Damping percentage for Hessian regularization. Default is `0.01`. |
| `perccorr` | `float` | Correction percentage for error propagation. Default is `0.5`. |
| `device` | `str` | Device to use for QEP computations (e.g., `"cuda"`). Default is `"cuda:0"`. |
| `exclude_layer_keywords` | `list[str]` | List of keywords identifying layers to exclude from error propagation. Any layer whose name contains at least one of these keywords is excluded. Default is `['mlp.down_proj']`. |

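
The `percdamp` description does not spell out how the damping is applied. The sketch below shows one common convention, borrowed from GPTQ-style quantizers: add `percdamp` times the mean of the Hessian diagonal to every diagonal entry. Whether QEP uses exactly this formula is an assumption, and `damp_hessian` is a hypothetical helper, not part of the library.

```python
def damp_hessian(hessian, percdamp=0.01):
    """Return a copy of a square matrix with a damping term added to its diagonal.

    Assumption: damping = percdamp * mean(diag(H)), as in GPTQ-style code.
    This is an illustration, not the library's implementation.
    """
    n = len(hessian)
    mean_diag = sum(hessian[i][i] for i in range(n)) / n
    damp = percdamp * mean_diag
    return [
        [hessian[i][j] + (damp if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Toy 2x2 "Hessian"; only the diagonal entries change.
H = [[4.0, 1.0], [1.0, 2.0]]
H_damped = damp_hessian(H, percdamp=0.01)
print(H_damped)
```

Under this convention a larger `percdamp` makes the matrix better conditioned for inversion, at the cost of biasing the solution.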
Examples:
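A minimal sketch of constructing the configuration with default and overridden values. The library's import path is not shown in this section, so the block defines a local stand-in dataclass that mirrors the documented signature. The stand-in is an assumption made for illustration; in real use, import `QEPConfig` from the library instead.

```python
from dataclasses import dataclass, field


@dataclass
class QEPConfig:
    """Local stand-in mirroring the documented signature (not the library class)."""

    general: bool = False
    percdamp: float = 0.01
    perccorr: float = 0.5
    device: str = "cuda:0"
    # A default factory gives every instance its own list.
    exclude_layer_keywords: list[str] = field(
        default_factory=lambda: ["mlp.down_proj"]
    )


# Defaults: architecture-aware implementation, Llama-style exclusion list.
default_cfg = QEPConfig()

# Overrides: generic implementation, CPU device, no excluded layers.
custom_cfg = QEPConfig(general=True, device="cpu", exclude_layer_keywords=[])

print(default_cfg)
print(custom_cfg)
```

Only the fields being changed need to be passed as keyword arguments; the rest keep the defaults listed under Attributes.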
Note: The default `exclude_layer_keywords` is designed for Llama-like architectures and may need to be adjusted for other model families.
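
To illustrate why the keywords may need adjusting, the sketch below applies a plain substring match to Llama-style and GPT-2-style module names. The helper `is_excluded` is hypothetical, and reading "contain" as Python's `in` substring test is an assumption. The keyword `mlp.c_proj` is used purely for illustration; which layers should be excluded for a given model family is not stated here.

```python
def is_excluded(layer_name: str, keywords: list[str]) -> bool:
    """Return True if the layer name contains any keyword (substring match)."""
    return any(keyword in layer_name for keyword in keywords)


# Llama-style module names: the default keyword matches the MLP down-projection.
llama_layers = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
]
llama_excluded = [n for n in llama_layers if is_excluded(n, ["mlp.down_proj"])]

# GPT-2-style module names contain no "down_proj", so the default matches nothing.
gpt2_layers = [
    "transformer.h.0.attn.c_attn",
    "transformer.h.0.mlp.c_fc",
    "transformer.h.0.mlp.c_proj",
]
gpt2_default = [n for n in gpt2_layers if is_excluded(n, ["mlp.down_proj"])]
gpt2_adjusted = [n for n in gpt2_layers if is_excluded(n, ["mlp.c_proj"])]

print(llama_excluded)
print(gpt2_default)
print(gpt2_adjusted)
```

Printing the model's module names and checking which ones a keyword list matches is a quick way to confirm the exclusion set before running QEP.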