LPCDConfig

Configuration dataclass for Layer-Projected Coordinate Descent (LPCD).

LPCDConfig `dataclass`

LPCDConfig(enable_qk: bool = False, enable_vo: bool = False, enable_ud: bool = False, enable_residual: bool = True, alt_steps: int = 1, perccorr: float = 0.5, percdamp: float = 0.01, use_closed_form: bool = True, gd_steps: int = 20, gd_batch_size: int = 16, gd_base_lr: float = 0.0001, device: str = 'cuda:0')

Configuration for LPCD optimisation.

Attributes:

Name	Type	Description
`enable_qk`	`bool`	Optimise Query/Key projections jointly.
`enable_vo`	`bool`	Optimise Value/Output projections jointly.
`enable_ud`	`bool`	Optimise Up/Down projections jointly.
`enable_residual`	`bool`	Optimise residual connections (o_proj, down_proj).
`alt_steps`	`int`	Number of alternating coordinate-descent steps.
`perccorr`	`float`	Correction percentage for weight relaxation.
`percdamp`	`float`	Damping percentage for Hessian regularisation.
`use_closed_form`	`bool`	Use closed-form solvers when available.
`gd_steps`	`int`	Number of gradient-descent epochs per sub-problem.
`gd_batch_size`	`int`	Effective batch size for gradient accumulation.
`gd_base_lr`	`float`	Base learning rate for gradient-descent solver.
`device`	`str`	Device to perform LPCD optimisation on.

Examples:

Minimal (residual correction only, fast)::

    LPCDConfig()

All sub-modules enabled (best quality, slower)::

    LPCDConfig(
        enable_qk=True,
        enable_vo=True,
        enable_ud=True,
    )
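
To see how the defaults and overrides interact, the sketch below uses a local stand-in dataclass that copies the field names and defaults from the signature above. It is not the library class, since the real import path is not given here. It also assumes that setting `use_closed_form=False` is how the gradient-descent solver (the `gd_*` fields) would be selected, which this page suggests but does not state outright.

```python
from dataclasses import asdict, dataclass, replace


@dataclass
class LPCDConfig:
    """Local stand-in mirroring the documented signature (not the library class)."""

    enable_qk: bool = False
    enable_vo: bool = False
    enable_ud: bool = False
    enable_residual: bool = True
    alt_steps: int = 1
    perccorr: float = 0.5
    percdamp: float = 0.01
    use_closed_form: bool = True
    gd_steps: int = 20
    gd_batch_size: int = 16
    gd_base_lr: float = 0.0001
    device: str = "cuda:0"


# Default configuration: list which enable_* switches are on.
cfg = LPCDConfig()
enabled = [
    name
    for name, value in asdict(cfg).items()
    if name.startswith("enable_") and value
]
print(enabled)

# Derive a variant without mutating the original. The chosen values are
# illustrative only, not recommendations from the library.
gd_cfg = replace(cfg, use_closed_form=False, gd_steps=50, device="cpu")
print(gd_cfg.use_closed_form, gd_cfg.gd_steps, gd_cfg.device)
```

Using `dataclasses.replace` keeps a baseline configuration intact while trying variants, which is convenient when comparing runs that differ in a single field.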
