LPCDConfig
Configuration dataclass for Layer-Projected Coordinate Descent (LPCD).
LPCDConfig dataclass
LPCDConfig(enable_qk: bool = False, enable_vo: bool = False, enable_ud: bool = False, enable_residual: bool = True, alt_steps: int = 1, perccorr: float = 0.5, percdamp: float = 0.01, use_closed_form: bool = True, gd_steps: int = 20, gd_batch_size: int = 16, gd_base_lr: float = 0.0001, device: str = 'cuda:0')
Configuration for LPCD optimisation.
Attributes:
| Name | Type | Description |
|---|---|---|
| `enable_qk` | `bool` | Optimise Query/Key projections jointly. |
| `enable_vo` | `bool` | Optimise Value/Output projections jointly. |
| `enable_ud` | `bool` | Optimise Up/Down projections jointly. |
| `enable_residual` | `bool` | Optimise residual connections (o_proj, down_proj). |
| `alt_steps` | `int` | Number of alternating coordinate-descent steps. |
| `perccorr` | `float` | Correction percentage for weight relaxation. |
| `percdamp` | `float` | Damping percentage for Hessian regularisation. |
| `use_closed_form` | `bool` | Use closed-form solvers when available. |
| `gd_steps` | `int` | Number of gradient-descent epochs per sub-problem. |
| `gd_batch_size` | `int` | Effective batch size for gradient accumulation. |
| `gd_base_lr` | `float` | Base learning rate for gradient-descent solver. |
| `device` | `str` | Device to perform LPCD optimisation on. |
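To make the `percdamp` entry more concrete, the sketch below shows one common damping convention from GPTQ-style quantisers: a fraction of the mean Hessian diagonal is added to every diagonal entry. Whether LPCD applies exactly this rule is an assumption, and `damp_hessian` is a hypothetical helper, not part of the library.

```python
def damp_hessian(hessian, percdamp=0.01):
    """Return a damped copy of a square matrix (list of lists).

    Assumed convention (not confirmed for LPCD): add
    percdamp * mean(diagonal) to each diagonal entry.
    """
    n = len(hessian)
    mean_diag = sum(hessian[i][i] for i in range(n)) / n
    damp = percdamp * mean_diag
    return [
        [hessian[i][j] + (damp if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Toy 2x2 Hessian; the diagonal mean is 3.0, so the damping term is 0.03.
H = [[4.0, 1.0], [1.0, 2.0]]
H_damped = damp_hessian(H, percdamp=0.01)
```

Under this convention a larger `percdamp` makes the regularised matrix better conditioned at the cost of a less exact solution.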
Examples:
Minimal (residual correction only, fast)::

    LPCDConfig()

All sub-modules enabled (best quality, slower)::

    LPCDConfig(
        enable_qk=True,
        enable_vo=True,
        enable_ud=True,
    )
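The examples above only toggle the sub-module flags. As a further sketch, the snippet below derives a gradient-descent variant from an existing configuration. Because the import path of the real class is not shown here, it defines a stand-in dataclass that copies the documented signature. The stand-in, the chosen values, and the idea that the `gd_*` fields matter mainly when `use_closed_form=False` are all assumptions.

```python
from dataclasses import asdict, dataclass, replace


@dataclass
class LPCDConfig:
    """Stand-in mirroring the documented signature (not the library class)."""

    enable_qk: bool = False
    enable_vo: bool = False
    enable_ud: bool = False
    enable_residual: bool = True
    alt_steps: int = 1
    perccorr: float = 0.5
    percdamp: float = 0.01
    use_closed_form: bool = True
    gd_steps: int = 20
    gd_batch_size: int = 16
    gd_base_lr: float = 0.0001
    device: str = "cuda:0"


# Start from the "all sub-modules enabled" example.
full = LPCDConfig(enable_qk=True, enable_vo=True, enable_ud=True)

# Derive a variant that uses the gradient-descent solver on CPU.
# The specific values are illustrative, not recommendations.
gd_cpu = replace(full, use_closed_form=False, gd_steps=50, device="cpu")

# Collect the fields that differ from the defaults, e.g. for logging.
defaults = asdict(LPCDConfig())
changed = {k: v for k, v in asdict(gd_cpu).items() if v != defaults[k]}
```

`dataclasses.replace` returns a new object, so `full` is left unchanged and both configurations can be compared side by side.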