Pulling your hair out because LLM fine-tuning is taking forever? In this post, we'll show you a lightweight library developed by the community that makes LLM fine-tuning blazingly fast!
Before diving into Unsloth, it may be helpful to read our QLoRA blog post, or to be familiar with LLM fine-tuning using the Hugging Face PEFT library.
Unsloth: 2x faster, 40% less memory, 0% accuracy degradation
Unsloth is a lightweight library for faster LLM fine-tuning, fully compatible with the Hugging Face ecosystem (Hub, Transformers, PEFT, TRL). The library is actively developed by the Unsloth team (Daniel and Michael) and the open-source community. It supports most NVIDIA GPUs, from the GTX 1070 all the way up to H100s, and can be used with the whole trainer suite from the TRL library (SFTTrainer, DPOTrainer, PPOTrainer). At the time of writing, Unsloth supports the Llama (CodeLlama, Yi, etc.) and Mistral architectures.
Unsloth works by overwriting some parts of the modeling code with optimized operations. By manually deriving backpropagation steps and rewriting all PyTorch modules into Triton kernels, Unsloth both reduces memory usage and makes fine-tuning faster. Importantly, accuracy degradation is 0% with respect to normal QLoRA, because no approximations are made in the optimized code.
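To give an intuition for what "manually deriving a backpropagation step" means, below is a toy sketch (not Unsloth's actual code) of a SiLU activation whose backward pass is written by hand as a custom torch.autograd.Function instead of being traced by autograd. Unsloth applies the same idea to entire transformer blocks and then implements the math as Triton kernels.

import torch

# Toy sketch: SiLU with a hand-derived backward pass (illustration only).
class HandwrittenSiLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        s = torch.sigmoid(x)
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        return grad_output * s * (1 + x * (1 - s))

x = torch.randn(4, requires_grad=True)
HandwrittenSiLU.apply(x).sum().backward()
print(x.grad)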
Benchmarking
1 A100 40GB          Dataset      Hugging Face   + Flash Attention 2   Unsloth   VRAM saved
Tiny Llama 1.1b      Alpaca       1x             1.55x                 2.74x     -57.8%
DPO with Zephyr      Ultra Chat   1x             1.24x                 1.88x     -11.6%

Free Colab T4        Dataset      Hugging Face   + PyTorch 2.1.1       Unsloth   VRAM saved
DPO with Zephyr      Ultra Chat   1x             1.09x                 1.55x     -18.6%
Unsloth was benchmarked across 59 runs using 4 datasets on Tesla T4 and A100 Google Colab instances. QLoRA was applied to all linear layers (attention and MLP) with a rank of 16, and gradient checkpointing was turned on. Testing against the latest Transformers release (4.36), which natively integrates SDPA, with PyTorch 2.1.1, Unsloth is up to 2.7x faster and uses up to 74% less memory. We also tested Unsloth on a free Google Colab instance (low RAM, 1 T4 GPU, PyTorch 2.1.0, CUDA 12.1). All 59 notebooks are provided for full reproducibility, and more details are listed in Unsloth's benchmarking details.
How do I use Unsloth?
Simply load the model with FastLanguageModel.from_pretrained! Currently, Unsloth supports Llama and Mistral type architectures (Yi, DeepSeek, TinyLlama, Llamafied Qwen); please open a GitHub issue if you want support for others! Also, on the latest Transformers main branch, you can now directly load pre-quantized 4-bit models! This makes downloading models 4x faster and reduces memory fragmentation by about 500MB, which lets you fit larger batches. We have a few pre-quantized models for your convenience, including unsloth/llama-2-7b-bnb-4bit, unsloth/llama-2-13b-bnb-4bit, unsloth/mistral-7b-bnb-4bit, and unsloth/codellama-34b-bnb-4bit.
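For example, with a recent Transformers release, loading one of these pre-quantized checkpoints with plain Transformers is just a regular from_pretrained call. A minimal sketch, assuming the bitsandbytes quantization settings shipped in the checkpoint's config are picked up automatically:

from transformers import AutoModelForCausalLM, AutoTokenizer

# The checkpoint already stores 4-bit weights and its quantization config,
# so no separate BitsAndBytesConfig is needed here.
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/mistral-7b-bnb-4bit",
    device_map = "auto",
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/mistral-7b-bnb-4bit")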
You will need to provide your intended maximum sequence length to FastLanguageModel.from_pretrained. Unsloth performs RoPE scaling internally, so larger maximum sequence lengths are automatically supported. Otherwise the API is pretty much the same as Transformers' from_pretrained, except that FastLanguageModel.from_pretrained also returns the model tokenizer for convenience.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)
Once the model is loaded, use FastLanguageModel.get_peft_model to attach the adapter and perform QLoRA fine-tuning.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
)
Once the adapter is attached, you can use the model directly within any class from the HF ecosystem, such as the SFTTrainer from TRL!
Unsloth + TRL Integration
To use Unsloth with the TRL library, simply pass the Unsloth model into SFTTrainer or DPOTrainer! The trained model is fully compatible with the Hugging Face ecosystem, so you can push the final model to the Hub and use Transformers for inference out of the box!
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset
from unsloth import FastLanguageModel

max_seq_length = 2048 # Supports automatic RoPE scaling, so choose any number

# Get the dataset
dataset = load_dataset("imdb", split = "train")

# Load the model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None, # None for auto detection
    load_in_4bit = True,
)

# Patch the model and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 3407,
    max_seq_length = max_seq_length,
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        output_dir = "output",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()
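To illustrate what "out of the box" looks like in practice, here is a hedged sketch (the repository name is a placeholder, not part of the original example) of pushing the trained adapter to the Hub and loading it back for inference with the standard PEFT and Transformers APIs.

# Hypothetical follow-up: push the trained LoRA adapter to the Hub
# ("your-username/mistral-7b-imdb-qlora" is a placeholder repo id).
model.push_to_hub("your-username/mistral-7b-imdb-qlora")
tokenizer.push_to_hub("your-username/mistral-7b-imdb-qlora")

# Load it back with plain PEFT/Transformers for inference.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "your-username/mistral-7b-imdb-qlora",
    load_in_4bit = True,
    device_map = "auto",
)
tokenizer = AutoTokenizer.from_pretrained("your-username/mistral-7b-imdb-qlora")

inputs = tokenizer("The movie was", return_tensors = "pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens = 32)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))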
Reproducible notebooks
For those of you who want to try out Unsloth with the SFTTrainer on a free-tier Google Colab instance, we share fully reproducible notebooks below.
Here's an example of Llama 7b on a free Tesla T4 Colab.
Here's an example of Mistral 7b on a free Tesla T4 Colab.
Here's an example of CodeLlama 34b on an A100 Colab.
Here's an example of the Zephyr DPO replication on a T4 Colab.