# Pre-Process (Rotation Preprocessing)
Rotation preprocessing reduces quantization error by learning optimal rotation matrices (SpinQuant/OstQuant) and absorbing them into model weights before quantization.
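The reason a rotation can be absorbed without changing model outputs is the orthogonality identity `x @ W == (x @ R) @ (R.T @ W)`. A minimal NumPy sketch of the idea (illustrative only; the actual R1/R2 matrices are learned per-model rather than random):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((4, d))   # activations
W = rng.standard_normal((d, d))   # weight matrix

# Build a random orthogonal rotation R via QR decomposition.
R, _ = np.linalg.qr(rng.standard_normal((d, d)))

# Absorb the rotation into the weights ahead of time.
W_rot = R.T @ W

# Rotated activations times rotated weights reproduce the original output.
y_orig = x @ W
y_rot = (x @ R) @ W_rot
print(np.allclose(y_orig, y_rot))  # True
```

Quantization then operates on `W_rot`, whose value distribution is typically flatter than that of `W`.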
## prepare_rotated_model

```python
prepare_rotated_model(
    model_config: ModelConfig,
    save_directory: str,
    *,
    rotation: bool = True,
    scaling: bool = False,
    rotation_mode: str = 'random',
    scaling_mode: str = 'identity',
    seed: int = 0,
    enable_training: bool = True,
    calibration_dataset=None,
    max_length: int = 2048,
    num_calibration_samples: int = 128,
    calibration_strategy: str = 'drop_rand',
    wbits: int = 4,
    sym: bool = False,
    groupsize: int = -1,
    fp32_had: bool = False,
    use_sdpa: bool = False,
    training_args_override: dict | None = None,
) -> RotatedModelConfig
```
Train rotation matrices, apply them to model weights, and save.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_config` | `ModelConfig` | Original model configuration. | *required* |
| `save_directory` | `str` | Directory to save the rotated model. | *required* |
| `rotation` | `bool` | Whether to apply rotation matrices (R1, R2). | `True` |
| `scaling` | `bool` | Whether to apply scaling diagonals (S_*). | `False` |
| `rotation_mode` | `str` | Rotation matrix initialisation mode. | `'random'` |
| `scaling_mode` | `str` | Scaling diagonal initialisation mode. | `'identity'` |
| `seed` | `int` | Random seed for rotation matrix initialisation and calibration data preparation. Note that the Trainer uses a separate seed. | `0` |
| `enable_training` | `bool` | If `True`, train the rotation matrices before absorbing them into the weights. | `True` |
| `calibration_dataset` | | List of texts for calibration. | `None` |
| `max_length` | `int` | Sequence length for calibration data. | `2048` |
| `num_calibration_samples` | `int` | Number of calibration samples. | `128` |
| `calibration_strategy` | `str` | Strategy for preparing calibration inputs. | `'drop_rand'` |
| `wbits` | `int` | Weight quantisation bit-width for the RTN proxy during training. Should match the quantizer's `wbits`. | `4` |
| `sym` | `bool` | Symmetric quantisation for the RTN proxy. | `False` |
| `groupsize` | `int` | Group size for the RTN proxy. | `-1` |
| `fp32_had` | `bool` | Use FP32 for the online Hadamard transform. | `False` |
| `use_sdpa` | `bool` | Use the SDPA attention implementation during training. | `False` |
| `training_args_override` | `dict \| None` | Overrides for the default training arguments. | `None` |
Returns:

| Type | Description |
|---|---|
| `RotatedModelConfig` | A `RotatedModelConfig` pointing at the rotated model saved in `save_directory`. |
Examples:
Basic usage:
>>> from onecomp import ModelConfig, prepare_rotated_model, GPTQ
>>> model_config = ModelConfig(model_id="meta-llama/Llama-2-7b-hf")
>>> rotated_config = prepare_rotated_model(
... model_config=model_config,
... save_directory="./rotated_model",
... )
Without training (random rotation only):
>>> rotated_config = prepare_rotated_model(
...     model_config=model_config,
...     save_directory="./rotated_model",
...     enable_training=False,
... )
## RotatedModelConfig

```python
RotatedModelConfig(path: str = None, dtype: str = 'float16', device: str = 'auto', fp32_had: bool = None, **kwargs)
```

Bases: `ModelConfig`

`ModelConfig` subclass for loading rotation-preprocessed models. Inherits `ModelConfig` and automatically registers a deterministic Hadamard `forward_pre_hook` on `down_proj` layers when `load_model()` is called.

The saved model directory should contain:

- `config.json`: HuggingFace model config (includes the `fp32_had` field)
- `model.safetensors`: rotation-applied weights
- `tokenizer.json`
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the saved rotated model (required). | `None` |
| `dtype` | `str` | Data type. | `'float16'` |
| `device` | `str` | Device. | `'auto'` |
| `fp32_had` | `bool` or `None` | Use FP32 for the online Hadamard transform. If `None` (default), auto-detect from the saved `config.json`. | `None` |
Example:

```python
from onecomp import Runner, RotatedModelConfig, GPTQ

model_config = RotatedModelConfig(path="./rotated_model")
quantizer = GPTQ(wbits=4, groupsize=128)
runner = Runner(model_config=model_config, quantizer=quantizer)
runner.run()
```
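The online Hadamard transform that the registered hook applies is deterministic and orthogonal, which is what lets a load-time hook pair correctly with weights that already absorbed the matching inverse. A small NumPy sketch using the Sylvester construction (an assumed construction for illustration; the library's transform may differ in detail):

```python
import numpy as np

def sylvester_hadamard(n: int) -> np.ndarray:
    """Deterministic, normalised Hadamard matrix (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])   # Sylvester doubling step
    return H / np.sqrt(n)                 # normalise so H is orthogonal

n = 16
H = sylvester_hadamard(n)

# Orthogonality: H @ H.T == I, so applying H online to the down_proj input
# while H.T is absorbed into the weight leaves the layer output unchanged.
print(np.allclose(H @ H.T, np.eye(n)))  # True
```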
## Workflow
```
┌─────────────────────────────────────────────────────────────┐
│ Step 1: Rotation Preprocessing                              │
│                                                             │
│ ModelConfig ──► prepare_rotated_model() ──► RotatedModelConfig
│                 (train rotation matrices,                   │
│                  absorb into weights,                       │
│                  save rotated model)                        │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│ Step 2: Quantization                                        │
│                                                             │
│ RotatedModelConfig ──► Runner(quantizer=GPTQ/RTN/...) ──► run()
│ (auto-registers        ──► save_quantized_model()           │
│  Hadamard hooks)                                            │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│ Step 3: Load                                                │
│                                                             │
│ load_quantized_model()                                      │
│ (auto-detects "rotated: true" in config.json,               │
│  registers Hadamard hooks automatically)                    │
└─────────────────────────────────────────────────────────────┘
```
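Why Step 1 helps Step 2: an outlier weight forces a coarse quantization grid, and an orthogonal rotation spreads its magnitude across many coordinates. A self-contained sketch with a crude symmetric 4-bit round-to-nearest (illustrative numbers only, not a benchmark of onecomp):

```python
import numpy as np

def rtn4(w: np.ndarray) -> np.ndarray:
    """Symmetric 4-bit round-to-nearest with a single per-tensor scale."""
    scale = np.abs(w).max() / 7               # signed 4-bit range: -8..7
    return np.clip(np.round(w / scale), -8, 7) * scale

rng = np.random.default_rng(0)
d = 256
W = rng.standard_normal((d, d))
W[0, 0] = 50.0                                # one large outlier

R, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal rotation
W_rot = R.T @ W                               # rotation absorbed into weights

# The Frobenius norm is rotation-invariant, so the errors are comparable.
err_plain = np.linalg.norm(W - rtn4(W))
err_rot = np.linalg.norm(W_rot - rtn4(W_rot))
print(err_rot < err_plain)                    # rotation shrinks the error here
```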
Note
The `wbits`, `groupsize`, and `sym` parameters passed to `prepare_rotated_model()`
control the RTN proxy used during rotation training. These values must match
the quantizer parameters used in Step 2.
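For intuition about what those three parameters mean on the RTN side, here is a hypothetical standalone round-to-nearest fake-quantizer (an illustration, not onecomp's implementation): `groupsize=-1` uses one scale for the whole row, otherwise one scale (and, when asymmetric, one zero-point) per group of that many weights.

```python
import numpy as np

def rtn_quantize(w: np.ndarray, wbits: int = 4, sym: bool = False,
                 groupsize: int = -1) -> np.ndarray:
    """Fake-quantize a 1-D weight row group by group, round-to-nearest."""
    n = w.size
    gs = n if groupsize == -1 else groupsize
    out = np.empty_like(w)
    for start in range(0, n, gs):
        g = w[start:start + gs]
        if sym:
            # Symmetric: one scale, zero maps to zero.
            scale = np.abs(g).max() / (2 ** (wbits - 1) - 1)
            q = np.clip(np.round(g / scale), -(2 ** (wbits - 1)), 2 ** (wbits - 1) - 1)
            out[start:start + gs] = q * scale
        else:
            # Asymmetric: scale plus zero-point covering [min, max].
            lo, hi = g.min(), g.max()
            scale = (hi - lo) / (2 ** wbits - 1)
            zero = np.round(-lo / scale)
            q = np.clip(np.round(g / scale) + zero, 0, 2 ** wbits - 1)
            out[start:start + gs] = (q - zero) * scale
    return out

w = np.linspace(-1.0, 1.0, 16)
wq = rtn_quantize(w, wbits=4, sym=False, groupsize=8)
# With groupsize=8, each half of w gets its own scale and zero-point.
print(np.abs(w - wq).max())
```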