SingleMoM
Exploratory Parameter-Efficient Adaptation
SingleMoM is an exploratory parameter-efficient adaptation approach that competes with LoRA on GLUE benchmarks at a fraction of the trainable-parameter cost while enabling zero-overhead expert switching at inference. Early experiments are encouraging, and there is room to better understand its expressiveness, its behavior across domains, and potential extensions (e.g. image adapters). SFT experiments on RoBERTa reproduce the LoRA paper's evaluation setup. An RLHF track on LLaMA-3 explored per-expert datasets covering language, conversational style, formatting, text-to-SQL, and structured output, with experts combinable at inference (e.g. the German × humanlike experts produced the fblgit/german-humanlike-clean-1k dataset).
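The key property is that the active expert (or combination of experts) can be folded into the model before inference. Below is a minimal sketch of that switching mechanic, assuming experts are stored as additive weight deltas merged into a base linear layer (LoRA-style merging); the actual SingleMoM parameterization is not described here, and `SwitchableLinear`, `add_expert`, and `activate` are illustrative names only.

```python
# Hypothetical sketch: zero-overhead expert switching by merging small
# per-expert weight deltas into a base weight before inference.
# Assumes additive deltas (LoRA-style merging); SingleMoM's actual
# parameterization may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableLinear(nn.Module):
    """Linear layer whose per-expert deltas can be merged before inference."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.experts = {}  # expert name -> delta with the same shape as base.weight
        # Effective weight used by forward(); starts as the plain base weight.
        self.register_buffer("merged", self.base.weight.detach().clone())

    def add_expert(self, name: str, delta: torch.Tensor) -> None:
        # Each delta would come from a small trainable parameterization
        # (low-rank, masked, etc.); here it is just a dense tensor.
        self.experts[name] = delta.detach()

    def activate(self, *names: str) -> None:
        # Merge the selected experts into one effective weight. Combining
        # experts (e.g. "german" and "humanlike") is a sum of deltas, and the
        # forward pass stays a single matmul, so switching or combining
        # experts adds no inference-time overhead.
        merged = self.base.weight.detach().clone()
        for name in names:
            merged = merged + self.experts[name]
        self.merged = merged

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.merged)


layer = SwitchableLinear(16, 16)
layer.add_expert("german", 0.01 * torch.randn(16, 16))
layer.add_expert("humanlike", 0.01 * torch.randn(16, 16))
layer.activate("german", "humanlike")   # combined experts, same inference cost
out = layer(torch.randn(2, 16))
```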
- Client: Independent (author-attested)
- Role: Sole author
- Duration: 2024
- Team: Solo
| Method | Trainable Params | CoLA (MCC) | SST-2 (Acc) | QQP (Acc) | QNLI (Acc) | MRPC (Acc) |
|---|---|---|---|---|---|---|
| Full Fine-Tuning | 355M | 68.0 | 96.4 | 92.2 | 94.7 | 90.9 |
| LoRA | 0.8M | 68.2 ± 1.9 | 96.2 ± 0.5 | 91.6 ± 0.1 | 94.9 ± 0.3 | 90.9 ± 1.2 |
| SingleMoM | <0.25M | 67.5–68.8 | 96.0–96.3 | 90.7–91.0 | 94.5–94.7 | 89.5–90.4 |
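For context on the CoLA column, here is a minimal scoring sketch for Matthews correlation on the GLUE validation split using the standard Hugging Face stack; the checkpoint path is a placeholder and this is not the project's actual evaluation harness.

```python
# Hypothetical scoring sketch for the CoLA (MCC) column: score a
# sequence-classification checkpoint on the GLUE CoLA validation split.
# "path/to/adapted-roberta-large" is a placeholder, not a released model.
import torch
from datasets import load_dataset
from sklearn.metrics import matthews_corrcoef
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "path/to/adapted-roberta-large"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

val = load_dataset("glue", "cola", split="validation")
preds, labels = [], []
with torch.no_grad():
    for start in range(0, len(val), 32):
        batch = val[start:start + 32]
        enc = tokenizer(batch["sentence"], padding=True, truncation=True,
                        return_tensors="pt")
        logits = model(**enc).logits
        preds.extend(logits.argmax(dim=-1).tolist())
        labels.extend(batch["label"])

print("CoLA MCC:", matthews_corrcoef(labels, preds))
```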
- Competes with LoRA on GLUE at a fraction of the trainable parameters
- Zero-overhead expert switching at inference
- Experts can be combined (e.g. German × humanlike experts yielded a released dataset)
- Tested under SFT, DPO, and PPO setups (DPO loss sketch after this list)
- Open directions: expressiveness, cross-domain behavior, image adapters
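For the preference-tuning bullet above, the following is a minimal sketch of the standard DPO objective (Rafailov et al.) on per-sequence log-probabilities, shown for illustration rather than as the project's exact training code.

```python
# Standard DPO loss on per-sequence log-probs of chosen vs. rejected
# completions under the policy and a frozen reference model. Illustrative
# only; the project's actual DPO/PPO training code may differ.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit reward margins of the policy relative to the reference model.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Maximize the probability that the chosen completion is preferred.
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()
```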