Fine-tune LLMs 2x faster with Unsloth and 🤗 TRL

By versatileai | August 11, 2025

Pulling your hair out because LLM fine-tuning is taking forever? In this post, we’ll show you a lightweight library developed by the community that makes LLM fine-tuning blazingly fast!

Before diving into Unsloth, it may help to read our QLoRA blog post, or to be familiar with LLM fine-tuning using the 🤗 PEFT library.

Unsloth: 2x faster, -40% memory usage, 0% accuracy degradation

Unsloth is a lightweight library for faster LLM fine-tuning, fully compatible with the Hugging Face ecosystem (Hub, transformers, PEFT, TRL). The library is actively developed by the Unsloth team (Daniel and Michael) and the open-source community. It supports most NVIDIA GPUs, from the GTX 1070 up to H100s, and can be used with the whole trainer suite from the TRL library (SFTTrainer, DPOTrainer, PPOTrainer). At the time of writing, Unsloth supports the Llama (CodeLlama, Yi, etc.) and Mistral architectures.

Unsloth works by overwriting some parts of the modeling code with optimized operations. By manually deriving the backpropagation steps and rewriting all PyTorch modules as Triton kernels, Unsloth reduces memory usage and makes fine-tuning faster. Importantly, accuracy degradation relative to normal QLoRA is 0%, because the optimized code performs no approximations.
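To make the "rewrite PyTorch modules as Triton kernels" idea concrete, here is a toy sketch (not Unsloth's actual code) of a fused SiLU-gate multiply, the elementwise part of a Llama/Mistral MLP, written as a single Triton kernel. A real implementation would also hand-derive and fuse the backward pass.

import torch
import triton
import triton.language as tl

@triton.jit
def _silu_mul_kernel(gate_ptr, up_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    # each program instance handles one contiguous block of elements
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    g = tl.load(gate_ptr + offs, mask=mask)
    u = tl.load(up_ptr + offs, mask=mask)
    silu_g = g / (1.0 + tl.exp(-g))          # SiLU(g) = g * sigmoid(g)
    tl.store(out_ptr + offs, silu_g * u, mask=mask)

def fused_silu_mul(gate: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
    # computes SiLU(gate) * up in a single pass over GPU memory
    out = torch.empty_like(gate)
    n = gate.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK"]),)
    _silu_mul_kernel[grid](gate, up, out, n, BLOCK=1024)
    return out

Fusing the activation and the multiply into one kernel means the intermediate SiLU output never has to be written to and re-read from GPU memory, which is where much of the speed and memory savings of this approach come from.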

Benchmarks

1 A100 40GB     | Dataset    | Hugging Face | + Flash Attention 2 | Unsloth | VRAM saved
Tiny Llama 1.1b | Alpaca     | 1x           | 1.55x               | 2.74x   | -57.8%
DPO with Zephyr | Ultra Chat | 1x           | 1.24x               | 1.88x   | -11.6%

Free Colab T4   | Dataset    | Hugging Face | + Pytorch 2.1.1     | Unsloth | VRAM saved
DPO with Zephyr | Ultra Chat | 1x           | 1.09x               | 1.55x   | -18.6%

Unsloth was benchmarked across 59 runs using four datasets on Tesla T4 and A100 Google Colab instances. QLoRA was applied to all linear layers (attention and MLP) with a rank of 16, and gradient checkpointing was turned on. Tested against the latest transformers release (4.36), which natively integrates SDPA, on PyTorch 2.1.1, Unsloth is up to 2.7x faster and uses up to 74% less memory. We also tested Unsloth on a free Google Colab instance (low RAM, 1 T4 GPU, PyTorch 2.1.0, CUDA 12.1). All 59 notebooks are provided for full reproducibility; further details are listed in Unsloth's benchmarking documentation.

How do I use Unsloth?

Simply load the model with FastLanguageModel.from_pretrained! Currently, Unsloth supports Llama- and Mistral-type architectures (Yi, Deepseek, TinyLlama, Llamafied Qwen); open a GitHub issue if you would like support for others. The latest transformers main branch also allows you to load pre-quantized 4-bit models directly. This makes model downloads 4x faster and reduces memory fragmentation by about 500MB, which lets you fit larger batches. We have a few pre-quantized models for your convenience, including unsloth/llama-2-7b-bnb-4bit, unsloth/llama-2-13b-bnb-4bit, unsloth/mistral-7b-bnb-4bit and unsloth/codellama-34b-bnb-4bit.

You will need to provide the intended maximum sequence length to from_pretrained. Unsloth performs RoPE scaling internally, so larger maximum sequence lengths are automatically supported. Otherwise the API is pretty much the same as transformers' from_pretrained, except that FastLanguageModel.from_pretrained also returns the model tokenizer for convenience.

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

Once the model is loaded, use FastLanguageModel.get_peft_model to attach the adapters and perform QLoRA fine-tuning.

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
)

Once the adapters are attached, you can use the model directly within any class from the 🤗 ecosystem, such as the SFTTrainer from TRL.
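Since get_peft_model returns what is essentially a standard 🤗 PEFT model, the usual PEFT methods are available on it. A small sketch, continuing from the snippet above (the directory name lora_adapters is an arbitrary placeholder):

# the Unsloth model behaves like a regular PEFT model from the 🤗 ecosystem
model.print_trainable_parameters()       # only the LoRA adapter weights are trainable
model.save_pretrained("lora_adapters")   # placeholder path; saves just the adapter weights, PEFT-style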

Unsloth + TRL Integration

To use Unsloth with the TRL library, simply pass the Unsloth model to SFTTrainer or DPOTrainer! The trained model is fully compatible with the Hugging Face ecosystem, so you can push the final model to the Hub and use it with transformers out of the box (a small sketch of that follows the training example below).

import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset
from unsloth import FastLanguageModel

max_seq_length = 2048

dataset = load_dataset("imdb", split = "train")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 3407,
    max_seq_length = max_seq_length,
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        output_dir = "output",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()
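As a rough sketch of the "out of the box" part, and assuming the model and tokenizer from the example above, the trained adapters can be pushed to the Hub and reloaded later with plain PEFT/transformers; the repository id your-username/mistral-7b-imdb-qlora is a placeholder:

# after trainer.train(), the adapters are a regular PEFT artifact
model.push_to_hub("your-username/mistral-7b-imdb-qlora")       # placeholder repo id
tokenizer.push_to_hub("your-username/mistral-7b-imdb-qlora")

# reload later without Unsloth, using plain PEFT/transformers
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "your-username/mistral-7b-imdb-qlora", load_in_4bit = True
)
tokenizer = AutoTokenizer.from_pretrained("your-username/mistral-7b-imdb-qlora")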

Reproducible notebooks

For those who want to try Unsloth with SFTTrainer on a free-tier Google Colab instance, we share fully reproducible notebooks below.

  • Llama 7b example on a free Tesla T4 Colab
  • Mistral 7b example on a free Tesla T4 Colab
  • CodeLlama 34b example on an A100 Colab
  • Zephyr DPO replication example on a T4 Colab
