think beyond lora for fine-tuning

source: hugging face blog: beyond lora: can you beat the most popular fine-tuning technique?

level: technical

parameter-efficient fine-tuning, or peft, cuts memory use when adapting open models. lora is the most popular method and appears in over 98% of peft model cards on the hugging face hub. its dominance may stem from early adoption and wide tooling support rather than superior performance. researchers often claim their new techniques beat lora, but such results can be biased by uneven hyperparameter tuning or cherry-picked benchmarks.
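to make the memory claim concrete, here is a minimal sketch of the trainable-parameter arithmetic for one linear layer, comparing a full update of the weight matrix with a lora adapter (a d_out x r matrix b times an r x d_in matrix a). the layer shape (3072 x 3072) and rank (16) are hypothetical values chosen for illustration. most of the savings in practice come from gradients and optimizer state being kept only for the adapter weights.

```python
# minimal sketch: trainable parameters for full fine-tuning of one linear
# layer vs. a lora adapter of rank r on the same layer.
# the layer shape and rank below are hypothetical, for illustration only.


def full_params(d_in: int, d_out: int) -> int:
    """trainable weights when the whole matrix w (d_out x d_in) is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """trainable weights for lora: b (d_out x r) plus a (r x d_in)."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    d_in = d_out = 3072  # hypothetical hidden size
    r = 16               # hypothetical lora rank
    full = full_params(d_in, d_out)
    lora = lora_params(d_in, d_out, r)
    # prints the two counts and the fraction of weights lora trains
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4f}")
```

with these shapes lora trains 98,304 weights instead of 9,437,184, i.e. 1/96 of the layer.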

hugging face built benchmarks to compare peft methods fairly, running every method on the same hardware, data, and code. on a math reasoning task with llama-3.2-3b, lora variants reached 53.2% accuracy using 22.6 gb of vram; beft used less memory (20.2 gb) but reached only 32.9% accuracy, while lily reached 54.9% at 25.6 gb. on an image generation task with flux.2-klein-base-4b, oft strictly dominated lora: it scored higher similarity (0.708 vs 0.697) while using less memory (9.01 gb vs 9.97 gb).
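"strictly dominating" is a pareto notion: one method is at least as good on every axis and better on at least one. a minimal sketch of that check, plus a simple "best score under a memory budget" helper, applied to the numbers reported above. the `Result` type and the helper names are hypothetical, and treating the lora variants as a single "lora" entry is a simplification.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    name: str
    score: float      # higher is better (accuracy or similarity)
    memory_gb: float  # lower is better


def dominates(a: Result, b: Result) -> bool:
    """true if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.score >= b.score and a.memory_gb <= b.memory_gb
    better = a.score > b.score or a.memory_gb < b.memory_gb
    return no_worse and better


def pareto_front(results: list[Result]) -> list[Result]:
    """keep the results that no other result dominates."""
    return [r for r in results
            if not any(dominates(o, r) for o in results if o is not r)]


def best_under_budget(results: list[Result], max_gb: float) -> Optional[Result]:
    """highest-scoring result that fits in max_gb, or None if nothing fits."""
    fitting = [r for r in results if r.memory_gb <= max_gb]
    return max(fitting, key=lambda r: r.score) if fitting else None


# numbers as reported in the summary above
math_task = [
    Result("lora", 0.532, 22.6),
    Result("beft", 0.329, 20.2),
    Result("lily", 0.549, 25.6),
]
image_task = [
    Result("lora", 0.697, 9.97),
    Result("oft", 0.708, 9.01),
]

if __name__ == "__main__":
    print([r.name for r in pareto_front(math_task)])   # ['lora', 'beft', 'lily']
    print([r.name for r in pareto_front(image_task)])  # ['oft']
    print(best_under_budget(math_task, 23.0).name)     # lora
```

on the math task no method dominates another, so the right pick depends on the memory budget; on the image task oft removes lora from the front.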

choosing a peft method involves tradeoffs beyond accuracy and memory, such as runtime, checkpoint size, and quantization support. the peft library now lets you convert non-lora adapters to lora format so they can be served in tools such as vllm, with minimal quality loss. users can also contribute new experiments or benchmarks to the project, making it easier to find the right technique for their own data and constraints.
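the summary does not say how peft performs the conversion. one generic way to express an arbitrary adapter update in lora form is to materialize its weight delta and take a truncated svd, which gives the best rank-r approximation in the frobenius norm. the sketch below assumes exactly that; it is not the peft implementation, and `delta_to_lora` and the random "adapter delta" are hypothetical.

```python
import numpy as np


def delta_to_lora(delta_w: np.ndarray, r: int) -> tuple[np.ndarray, np.ndarray]:
    """approximate delta_w (d_out x d_in) as b @ a, b: d_out x r, a: r x d_in.

    hypothetical helper: uses a truncated svd and splits the singular
    values evenly between the two factors.
    """
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    sqrt_s = np.sqrt(s[:r])
    b = u[:, :r] * sqrt_s          # scale the first r columns of u
    a = sqrt_s[:, None] * vt[:r]   # scale the first r rows of v^t
    return b, a


def relative_error(delta_w: np.ndarray, b: np.ndarray, a: np.ndarray) -> float:
    """frobenius-norm error of b @ a relative to delta_w."""
    return float(np.linalg.norm(delta_w - b @ a) / np.linalg.norm(delta_w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_out, d_in = 256, 128
    # hypothetical adapter delta: rank 8 plus a little noise
    delta = rng.standard_normal((d_out, 8)) @ rng.standard_normal((8, d_in))
    delta += 0.01 * rng.standard_normal((d_out, d_in))
    for r in (4, 8, 16):
        b, a = delta_to_lora(delta, r)
        print(r, b.shape, a.shape, round(relative_error(delta, b, a), 4))
```

the error drops sharply once r reaches the effective rank of the delta, which is the intuition behind "minimal quality loss": if an adapter's update is close to low rank, a lora of that rank can represent it almost exactly.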

why it matters: testing several peft methods instead of defaulting to lora can yield better accuracy or lower memory use, making fine-tuning more efficient.