LPCDConfig

Configuration dataclass for Layer-Projected Coordinate Descent (LPCD).

LPCDConfig dataclass

LPCDConfig(enable_qk: bool = False, enable_vo: bool = False, enable_ud: bool = False, enable_residual: bool = True, alt_steps: int = 1, perccorr: float = 0.5, percdamp: float = 0.01, use_closed_form: bool = True, gd_steps: int = 20, gd_batch_size: int = 16, gd_base_lr: float = 0.0001, device: str = 'cuda:0')

Configuration for LPCD optimisation.

Attributes:

Name             Type   Description
enable_qk        bool   Optimise Query/Key projections jointly.
enable_vo        bool   Optimise Value/Output projections jointly.
enable_ud        bool   Optimise Up/Down projections jointly.
enable_residual  bool   Optimise residual connections (o_proj, down_proj).
alt_steps        int    Number of alternating coordinate-descent steps.
perccorr         float  Correction percentage for weight relaxation.
percdamp         float  Damping percentage for Hessian regularisation.
use_closed_form  bool   Use closed-form solvers when available.
gd_steps         int    Number of gradient-descent epochs per sub-problem.
gd_batch_size    int    Effective batch size for gradient accumulation.
gd_base_lr       float  Base learning rate for gradient-descent solver.
device           str    Device to perform LPCD optimisation on.
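
The percdamp description is brief, so here is a minimal sketch of one common reading: the GPTQ-style convention, where a fraction of the mean Hessian diagonal is added to every diagonal entry before the Hessian is inverted. Whether LPCD applies percdamp in exactly this way is an assumption, not something this page confirms. The helper name damp_hessian is hypothetical.

```python
def damp_hessian(hessian, percdamp=0.01):
    """Return a copy of a square matrix with GPTQ-style diagonal damping.

    ASSUMPTION: damping = percdamp * mean(diag(H)), added to the diagonal.
    This mirrors the GPTQ convention; LPCD's actual formula may differ.
    """
    n = len(hessian)
    mean_diag = sum(hessian[i][i] for i in range(n)) / n
    damp = percdamp * mean_diag
    return [
        [hessian[i][j] + (damp if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# A singular 2x2 "Hessian" (second row is half the first).
H = [[4.0, 2.0], [2.0, 1.0]]
damped = damp_hessian(H, percdamp=0.01)

# Determinant before and after damping: damping makes the matrix invertible.
det_before = H[0][0] * H[1][1] - H[0][1] * H[1][0]
det_after = damped[0][0] * damped[1][1] - damped[0][1] * damped[1][0]
```

Under this reading, the default percdamp of 0.01 leaves the off-diagonal entries untouched and shifts each diagonal entry by 1% of the average diagonal value, which is enough to turn a singular matrix into an invertible one.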

Examples:

Minimal (residual correction only, fast):

LPCDConfig()

All sub-modules enabled (best quality, slower):

LPCDConfig(
    enable_qk=True,
    enable_vo=True,
    enable_ud=True,
)
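
The examples above only toggle the sub-module flags. As a rough sketch of how the solver fields might be overridden too, the block below defines a local stand-in dataclass that copies the documented signature (the real import path is not given on this page, so none is assumed) and derives a variant with dataclasses.replace. That setting use_closed_form=False routes work to the gradient-descent solver is inferred from the field descriptions, not confirmed here.

```python
from dataclasses import asdict, dataclass, replace


@dataclass
class LPCDConfig:
    """Local stand-in mirroring the documented signature (not the real class)."""
    enable_qk: bool = False
    enable_vo: bool = False
    enable_ud: bool = False
    enable_residual: bool = True
    alt_steps: int = 1
    perccorr: float = 0.5
    percdamp: float = 0.01
    use_closed_form: bool = True
    gd_steps: int = 20
    gd_batch_size: int = 16
    gd_base_lr: float = 0.0001
    device: str = "cuda:0"


# Start from the "all sub-modules enabled" configuration.
full = LPCDConfig(enable_qk=True, enable_vo=True, enable_ud=True)

# Derive a variant without mutating `full`: disable closed-form solvers,
# run more gradient-descent epochs, and place the work on the CPU.
gd_cpu = replace(full, use_closed_form=False, gd_steps=40, device="cpu")

# Collect only the fields that differ from the documented defaults.
defaults = asdict(LPCDConfig())
overrides = {k: v for k, v in asdict(gd_cpu).items() if defaults[k] != v}
print(overrides)
```

Using replace keeps the base configuration intact, which is convenient when comparing several settings side by side.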