
SingleMoM

Exploratory Parameter-Efficient Adaptation

Promising Early Results — More Research Underway

SingleMoM is an exploratory parameter-efficient adaptation approach that competes with LoRA on GLUE benchmarks at a fraction of the trainable-parameter cost, while enabling zero-overhead expert switching at inference. Early experiments are encouraging, and there is room to better understand its expressiveness, its behavior across domains, and potential extensions (e.g. image adapters). SFT experiments on RoBERTa reproduce the LoRA paper's evaluation setup. An RLHF track on LLaMA-3 explored per-expert datasets covering language, conversational style, formatting, text-to-SQL, and structured output, with experts that can be combined at inference (e.g. composing the German and humanlike experts produced the fblgit/german-humanlike-clean-1k dataset).
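The zero-overhead property is the same one LoRA obtains by merging adapter weights into the base model before inference. SingleMoM's parameterization is not spelled out here, so the sketch below only illustrates the general idea under an assumption: each expert is stored as small per-layer weight deltas that can be folded into, and removed from, the frozen base weights. The names merge_expert, unmerge_expert, and expert_deltas are hypothetical, not the project's actual API.

```python
# Minimal sketch of zero-overhead expert switching via weight folding.
# ASSUMPTION: each expert is a dict of small per-layer weight deltas keyed by
# parameter name, in the same spirit as merging a LoRA adapter. This is an
# illustration, not the actual SingleMoM mechanism.

from typing import Dict

import torch
import torch.nn as nn


def merge_expert(base: nn.Module, expert_deltas: Dict[str, torch.Tensor],
                 scale: float = 1.0) -> None:
    """Fold an expert's weight deltas into the base model in place.

    After merging, inference runs on the unmodified module graph, so the
    expert adds no extra compute or latency at inference time.
    """
    params = dict(base.named_parameters())
    with torch.no_grad():
        for name, delta in expert_deltas.items():
            params[name].add_(scale * delta)


def unmerge_expert(base: nn.Module, expert_deltas: Dict[str, torch.Tensor],
                   scale: float = 1.0) -> None:
    """Undo a previous merge so a different expert can be swapped in."""
    merge_expert(base, expert_deltas, scale=-scale)


# Combining experts (e.g. a German expert with a humanlike-style expert)
# then just means folding both delta sets into the same base weights:
#   merge_expert(model, german_deltas)
#   merge_expert(model, humanlike_deltas)
```

Because the deltas are folded in ahead of time, the forward pass is just the plain base model, which is what makes switching or composing experts free at inference.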

Client: Independent (author-attested)
Role: Sole author
Duration: 2024
Team: Solo
GLUE Benchmark — RoBERTa-large
Method             Trainable Params  CoLA (MCC)   SST-2        QQP          QNLI         MRPC
Full Fine-Tuning   355M              68.0         96.4         92.2         94.7         90.9
LoRA               0.8M              68.2 ± 1.9   96.2 ± 0.5   91.6 ± 0.1   94.9 ± 0.3   90.9 ± 1.2
SingleMoM          <0.25M            67.5–68.8    96.0–96.3    90.7–91.0    94.5–94.7    89.5–90.4

Baselines from the LoRA paper (arXiv:2106.09685, Table 2). SingleMoM scores are reported as ranges across runs.
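For reference, the GLUE evaluation scaffolding that reproduces the LoRA paper's setup can be sketched with the standard transformers/datasets/evaluate stack, as below. The adapter itself is omitted since its implementation is not described here; the hyperparameters and the CoLA-specific column handling are illustrative placeholders, not the exact configuration used.

```python
# Sketch of a GLUE evaluation run on RoBERTa-large (CoLA shown). The SingleMoM
# adapter is omitted; this is plain fine-tuning scaffolding with assumed,
# illustrative hyperparameters.

import numpy as np
from datasets import load_dataset
from evaluate import load as load_metric
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

task = "cola"  # also run: sst2, qqp, qnli, mrpc (column names differ per task)
raw = load_dataset("glue", task)
tokenizer = AutoTokenizer.from_pretrained("roberta-large")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True)

encoded = raw.map(tokenize, batched=True)
metric = load_metric("glue", task)  # CoLA reports Matthews correlation (MCC)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return metric.compute(predictions=np.argmax(logits, axis=-1), references=labels)

model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="glue-cola", per_device_train_batch_size=16,
                           num_train_epochs=10, learning_rate=2e-5),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```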
Highlights
  • Competes with LoRA on GLUE at a fraction of trainable params
  • Zero-overhead expert switching at inference
  • Experts can be combined (e.g. German × humanlike → the fblgit/german-humanlike-clean-1k dataset)
  • Tested under SFT, DPO, and PPO setups (see the DPO sketch after this list)
  • Open directions: expressiveness, cross-domain behavior, image adapters
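One of the preference-tuning setups mentioned above, DPO, can be sketched with TRL roughly as follows. This assumes a recent TRL version and a prompt/chosen/rejected preference dataset; the public trl-lib/ultrafeedback_binarized dataset and the exact LLaMA-3 checkpoint are stand-ins, since the per-expert datasets and model variant are not reproduced here.

```python
# Rough DPO sketch with TRL, standing in for one of the SFT/DPO/PPO setups.
# ASSUMPTIONS: recent TRL (DPOConfig/processing_class API) and a preference
# dataset with prompt/chosen/rejected columns; the dataset below is a public
# stand-in, not one of the per-expert datasets.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed LLaMA-3 variant
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-expert", beta=0.1),
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```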
transformers · torch · wandb · sft · rlhf · lora-alternative · parameter-efficient