CLI Reference
OneComp provides the onecomp command for quantizing models directly from the terminal.
Installation
The onecomp command is installed automatically with the package.
To verify the installation, run onecomp --version, which prints the version and exits.
You can also invoke the CLI as python -m onecomp.
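As a quick check of both entry points, the sketch below tries the onecomp command first and falls back to the module form. It assumes that python -m onecomp accepts --version in the same way as the command itself, which this page does not state explicitly; check_onecomp is a helper name introduced here for illustration.

```shell
# check_onecomp is a hypothetical helper, not part of OneComp.
check_onecomp() {
    if command -v onecomp >/dev/null 2>&1; then
        # The console script is on PATH: print its version.
        onecomp --version
    elif python -m onecomp --version 2>/dev/null; then
        # Assumption: the module entry point accepts the same --version flag.
        :
    else
        echo "onecomp is not installed in this environment"
        return 1
    fi
}

check_onecomp || true
```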
Usage
onecomp [-h] [--wbits WBITS] [--total-vram-gb GB] [--groupsize GROUPSIZE]
        [--device DEVICE] [--no-qep] [--no-eval] [--eval-original]
        [--save-dir SAVE_DIR] [--version]
        model_id
Positional Arguments
| Argument | Description |
|---|---|
| model_id | Hugging Face model ID or local path |
Options
| Option | Default | Description |
|---|---|---|
| --wbits WBITS | None (auto) | Target bitwidth. When omitted, estimated from VRAM |
| --total-vram-gb GB | None (auto) | VRAM budget in GB for bitwidth estimation. When omitted, detected from GPU |
| --groupsize GROUPSIZE | 128 | GPTQ group size (-1 to disable grouping) |
| --device DEVICE | cuda:0 | Device to place the model on |
| --no-qep | | Disable QEP (enabled by default) |
| --no-eval | | Skip perplexity and accuracy evaluation |
| --eval-original | | Also evaluate the original (unquantized) model |
| --save-dir SAVE_DIR | auto | Save directory (auto = derived from model name, none to skip) |
| --version | | Show version and exit |
Examples
Basic usage (AutoBit with VRAM auto-estimation)
Quantize with defaults (AutoBit mixed-precision + QEP, evaluate, auto-save):
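A minimal sketch of the default invocation, in which only the positional model_id is passed. MODEL_ID below is a hypothetical placeholder, not a model named anywhere on this page.

```shell
# Hypothetical placeholder; substitute a real Hugging Face model ID or local path.
MODEL_ID="your-org/your-model"

# Dry run: prints the command. Remove "echo" to execute it.
echo onecomp "$MODEL_ID"
```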
Specify VRAM budget
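To have the bitwidth estimated from a stated budget rather than from the detected GPU, pass --total-vram-gb. The sketch below uses 24 GB purely as an illustrative value, and MODEL_ID is a hypothetical placeholder.

```shell
# Hypothetical placeholder; substitute a real model ID or local path.
MODEL_ID="your-org/your-model"

# Dry run: prints the command. Remove "echo" to execute it.
# 24 is an example budget in GB, not a recommended value.
echo onecomp "$MODEL_ID" --total-vram-gb 24
```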
Fixed bitwidth (skip VRAM estimation)
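Passing --wbits sets the target bitwidth directly, so no estimate from VRAM is needed. A sketch with 4 bits as an example value and a hypothetical MODEL_ID:

```shell
# Hypothetical placeholder; substitute a real model ID or local path.
MODEL_ID="your-org/your-model"

# Dry run: prints the command. Remove "echo" to execute it.
echo onecomp "$MODEL_ID" --wbits 4
```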
3-bit quantization
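The same --wbits option with a value of 3 requests 3-bit quantization. MODEL_ID is again a hypothetical placeholder.

```shell
# Hypothetical placeholder; substitute a real model ID or local path.
MODEL_ID="your-org/your-model"

# Dry run: prints the command. Remove "echo" to execute it.
echo onecomp "$MODEL_ID" --wbits 3
```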
Custom group size
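The GPTQ group size defaults to 128 and can be overridden with --groupsize; -1 disables grouping. The value 64 below is only an example, and MODEL_ID is a hypothetical placeholder.

```shell
# Hypothetical placeholder; substitute a real model ID or local path.
MODEL_ID="your-org/your-model"

# Dry run: prints the commands. Remove "echo" to execute them.
echo onecomp "$MODEL_ID" --groupsize 64   # example non-default group size
echo onecomp "$MODEL_ID" --groupsize -1   # disable grouping
```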
Without QEP
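QEP is enabled by default; the --no-qep switch turns it off. A sketch with a hypothetical MODEL_ID:

```shell
# Hypothetical placeholder; substitute a real model ID or local path.
MODEL_ID="your-org/your-model"

# Dry run: prints the command. Remove "echo" to execute it.
echo onecomp "$MODEL_ID" --no-qep
```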
Skip evaluation (quantize and save only)
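With --no-eval, the perplexity and accuracy evaluation is skipped, leaving only quantization and saving. MODEL_ID is a hypothetical placeholder.

```shell
# Hypothetical placeholder; substitute a real model ID or local path.
MODEL_ID="your-org/your-model"

# Dry run: prints the command. Remove "echo" to execute it.
echo onecomp "$MODEL_ID" --no-eval
```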
Custom save directory
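By default the save directory is derived from the model name; --save-dir overrides it. Both MODEL_ID and the path ./quantized-model are hypothetical values chosen for this sketch.

```shell
# Hypothetical placeholders; substitute your own model ID and output path.
MODEL_ID="your-org/your-model"
SAVE_DIR="./quantized-model"

# Dry run: prints the command. Remove "echo" to execute it.
echo onecomp "$MODEL_ID" --save-dir "$SAVE_DIR"
```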
Skip saving
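The options table lists none as the --save-dir value that skips saving. A sketch with a hypothetical MODEL_ID:

```shell
# Hypothetical placeholder; substitute a real model ID or local path.
MODEL_ID="your-org/your-model"

# Dry run: prints the command. Remove "echo" to execute it.
# "none" is the documented value for skipping the save step.
echo onecomp "$MODEL_ID" --save-dir none
```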
Evaluate original model too
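Adding --eval-original also evaluates the unquantized model, which gives a baseline to compare the quantized results against. MODEL_ID is a hypothetical placeholder.

```shell
# Hypothetical placeholder; substitute a real model ID or local path.
MODEL_ID="your-org/your-model"

# Dry run: prints the command. Remove "echo" to execute it.
echo onecomp "$MODEL_ID" --eval-original
```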
Use a specific GPU
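The model is placed on cuda:0 unless --device says otherwise. The sketch below selects the second GPU, cuda:1, as an example; it assumes such a device exists on the machine, and MODEL_ID is a hypothetical placeholder.

```shell
# Hypothetical placeholder; substitute a real model ID or local path.
MODEL_ID="your-org/your-model"

# Dry run: prints the command. Remove "echo" to execute it.
# cuda:1 is an example; pick an index that exists on your machine.
echo onecomp "$MODEL_ID" --device cuda:1
```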
Default Behavior
When run with no options, the onecomp command:
- Loads the model and tokenizer from Hugging Face Hub
- Estimates the target bitwidth from available VRAM
- Quantizes with AutoBit (ILP-based mixed-precision) + QEP
- Evaluates perplexity (wikitext-2) and zero-shot accuracy
- Saves the quantized model to <model_name>-autobit-<X>bit/
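For reference, the defaults listed in the options table can also be written out explicitly. The sketch assumes that passing a default value behaves the same as omitting the flag, which this page does not state; MODEL_ID is a hypothetical placeholder.

```shell
# Hypothetical placeholder; substitute a real model ID or local path.
MODEL_ID="your-org/your-model"

# Dry run: prints the command. Remove "echo" to execute it.
# Each value below is the default from the options table.
# --wbits and --total-vram-gb are left out so both are determined automatically.
echo onecomp "$MODEL_ID" --groupsize 128 --device cuda:0 --save-dir auto
```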
Equivalent Python API
The CLI is a thin wrapper around Runner.auto_run, so every CLI invocation maps directly
to an equivalent call in the Python API.