Built a stability adapter on top of TRL's SFTTrainer (CRMA + ZClip) — sharing ablation results
#1
by Fourwheels2512 - opened
Hi TRL community,
I've been using TRL's SFTTrainer as the backbone for a fine-tuning SaaS and wanted to share something I built on top of it: CRMA (Constrained Residual Mixing Adapter), a stability adapter that runs alongside LoRA/QLoRA.
What CRMA adds to SFTTrainer
CRMA hooks into the training loop via a custom optimizer group and per-step callback. It adds:
- A Sinkhorn-constrained doubly stochastic mixing matrix at each transformer block
- ZClip adaptive gradient clipping (replaces max_grad_norm=1.0 with a z-score based threshold, arXiv:2504.02507)
- PiSSA initialization for low-rank projections (NeurIPS 2024, arXiv:2404.02948)
- Per-step logging of baseline vs CRMA gradient norms and spectral norm
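For readers unfamiliar with the doubly stochastic constraint in the first bullet: Sinkhorn-Knopp iteration alternately normalizes rows and columns until both sum to 1. Here is a minimal dependency-free sketch of that projection — illustrative only, not the Space's actual implementation:

```python
def sinkhorn(M, iters=50):
    """Project a positive square matrix toward the doubly stochastic set
    by alternately normalizing rows and columns (Sinkhorn-Knopp)."""
    n = len(M)
    M = [row[:] for row in M]  # work on a copy
    for _ in range(iters):
        # Row normalization: each row sums to 1
        for i in range(n):
            s = sum(M[i])
            M[i] = [x / s for x in M[i]]
        # Column normalization: each column sums to 1
        for j in range(n):
            s = sum(M[i][j] for i in range(n))
            for i in range(n):
                M[i][j] /= s
    return M
```

Because the mixing matrix is doubly stochastic, its spectral norm is bounded by 1, which is where the "guaranteed <= 1" row in the table below comes from.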
All of this runs cleanly inside SFTTrainer by passing a custom optimizers=(optimizer, lr_scheduler) tuple plus a per-step callback.
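To make the z-score clipping idea concrete, here is a minimal sketch in the spirit of ZClip (arXiv:2504.02507): track an EMA of the gradient-norm mean/variance and clip only when the current norm's z-score exceeds a threshold. Hyperparameter values and names here are illustrative assumptions, not the paper's or the Space's:

```python
class ZScoreClipper:
    """Adaptive gradient-norm clipping: clip when the observed norm is a
    statistical outlier relative to its own running history."""

    def __init__(self, z_max=2.5, beta=0.97, eps=1e-8):
        self.z_max, self.beta, self.eps = z_max, beta, eps
        self.mean, self.var = None, 0.0  # EMA statistics of the grad norm

    def threshold(self, grad_norm):
        """Return the norm to rescale gradients to (unchanged if no spike)."""
        if self.mean is None:  # first step: just initialize the statistics
            self.mean = grad_norm
            return grad_norm
        std = self.var ** 0.5
        z = (grad_norm - self.mean) / (std + self.eps)
        clipped = min(grad_norm, self.mean + self.z_max * std) if z > self.z_max else grad_norm
        # Update EMA stats with the *clipped* value so spikes don't poison them
        delta = clipped - self.mean
        self.mean += (1 - self.beta) * delta
        self.var = self.beta * (self.var + (1 - self.beta) * delta * delta)
        return clipped
```

In a per-step callback one would compute the global grad norm over model parameters and scale gradients by `threshold(norm) / norm`, replacing the fixed max_grad_norm=1.0 cutoff.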
Ablation results (TinyLlama 1.1B, 200-row Alpaca, seed=42)
| Metric | LoRA only | LoRA + CRMA | Delta |
|---|---|---|---|
| Final loss | 0.1658 | 0.1651 | -0.4% |
| Peak grad norm | 12.15 | 5.75 | -52.7% |
| Mean grad norm | 2.34 | 2.07 | -11.5% |
| Spectral norm | - | 1.000000 | guaranteed <= 1 |
Mistral-7B: plain LoRA hit a catastrophic gradient spike at step 43 (grad norm ≈ 263). CRMA held it at ~3.0, a 98.9% reduction.
HF Space
https://huggingface.co/spaces/Fourwheels2512/crma-fine-tuner
Would love any feedback from TRL users, especially on cleaner ways to hook into the trainer for per-step stability metrics.