SingleMoM
Exploratory Parameter-Efficient Adaptation
SingleMoM is an exploratory parameter-efficient adaptation approach that competes with LoRA on GLUE benchmarks at a fraction of the trainable-parameter cost while enabling zero-overhead expert switching at inference. Early experiments are encouraging, and there is room to better understand its expressiveness, its behavior across domains, and potential extensions (e.g. image adapters). SFT experiments on RoBERTa reproduce the LoRA paper's evaluation setup. An RLHF track on LLaMA-3 explored per-expert datasets covering language, conversational style, formatting, text-to-SQL, and structured output, with experts combinable at inference (e.g. the German × humanlike experts produced the fblgit/german-humanlike-clean-1k dataset).
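The key property is that the active expert (or combination of experts) can be folded into the model before inference. Below is a minimal sketch of that switching mechanic, assuming experts are stored as additive weight deltas merged into a base linear layer (LoRA-style merging); the actual SingleMoM parameterization is not described here, and `SwitchableLinear`, `add_expert`, and `activate` are illustrative names only.

```python
# Hypothetical sketch: zero-overhead expert switching by merging small
# per-expert weight deltas into a base weight before inference.
# Assumes additive deltas (LoRA-style merging); SingleMoM's actual
# parameterization may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableLinear(nn.Module):
    """Linear layer whose per-expert deltas can be merged before inference."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.experts = {}  # expert name -> delta with the same shape as base.weight
        # Effective weight used by forward(); starts as the plain base weight.
        self.register_buffer("merged", self.base.weight.detach().clone())

    def add_expert(self, name: str, delta: torch.Tensor) -> None:
        # Each delta would come from a small trainable parameterization
        # (low-rank, masked, etc.); here it is just a dense tensor.
        self.experts[name] = delta.detach()

    def activate(self, *names: str) -> None:
        # Merge the selected experts into one effective weight. Combining
        # experts (e.g. "german" and "humanlike") is a sum of deltas, and the
        # forward pass stays a single matmul, so switching or combining
        # experts adds no inference-time overhead.
        merged = self.base.weight.detach().clone()
        for name in names:
            merged = merged + self.experts[name]
        self.merged = merged

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.merged)


layer = SwitchableLinear(16, 16)
layer.add_expert("german", 0.01 * torch.randn(16, 16))
layer.add_expert("humanlike", 0.01 * torch.randn(16, 16))
layer.activate("german", "humanlike")   # combined experts, same inference cost
out = layer(torch.randn(2, 16))
```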
- Client: Independent (author-attested)
- Role: Sole author
- Duration: 2024
- Team: Solo
| Method | Trainable Params | CoLA (MCC) | SST-2 (Acc) | QQP (Acc) | QNLI (Acc) | MRPC (Acc) |
|---|---|---|---|---|---|---|
| Full Fine-Tuning | 355M | 68.0 | 96.4 | 92.2 | 94.7 | 90.9 |
| LoRA | 0.8M | 68.2 ± 1.9 | 96.2 ± 0.5 | 91.6 ± 0.1 | 94.9 ± 0.3 | 90.9 ± 1.2 |
| SingleMoM | <0.25M | 67.5–68.8 | 96.0–96.3 | 90.7–91.0 | 94.5–94.7 | 89.5–90.4 |
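For context on the CoLA column, here is a minimal scoring sketch for Matthews correlation on the GLUE validation split using the standard Hugging Face stack; the checkpoint path is a placeholder and this is not the project's actual evaluation harness.

```python
# Hypothetical scoring sketch for the CoLA (MCC) column: score a
# sequence-classification checkpoint on the GLUE CoLA validation split.
# "path/to/adapted-roberta-large" is a placeholder, not a released model.
import torch
from datasets import load_dataset
from sklearn.metrics import matthews_corrcoef
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "path/to/adapted-roberta-large"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

val = load_dataset("glue", "cola", split="validation")
preds, labels = [], []
with torch.no_grad():
    for start in range(0, len(val), 32):
        batch = val[start:start + 32]
        enc = tokenizer(batch["sentence"], padding=True, truncation=True,
                        return_tensors="pt")
        logits = model(**enc).logits
        preds.extend(logits.argmax(dim=-1).tolist())
        labels.extend(batch["label"])

print("CoLA MCC:", matthews_corrcoef(labels, preds))
```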
- Competes with LoRA on GLUE at a fraction of the trainable parameters
- Zero-overhead expert switching at inference
- Experts can be combined (e.g. German × humanlike experts yielded a released dataset)
- Tested under SFT, DPO, and PPO setups (DPO loss sketch after this list)
- Open directions: expressiveness, cross-domain behavior, image adapters
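For the preference-tuning bullet above, the following is a minimal sketch of the standard DPO objective (Rafailov et al.) on per-sequence log-probabilities, shown for illustration rather than as the project's exact training code.

```python
# Standard DPO loss on per-sequence log-probs of chosen vs. rejected
# completions under the policy and a frozen reference model. Illustrative
# only; the project's actual DPO/PPO training code may differ.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit reward margins of the policy relative to the reference model.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Maximize the probability that the chosen completion is preferred.
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()
```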