Versa AI hub
Diffusers welcomes Stable Diffusion 3

By versatileai | April 26, 2025 | 7 Mins Read

Stable Diffusion 3 (SD3), the latest iteration of the Stable Diffusion family of models from Stability AI, is now available on the Hugging Face Hub and can be used with Diffusers.

The model released today is Stable Diffusion 3 Medium, with 2B parameters.

As part of this release, we have provided:

  • Models on the Hub
  • Diffusers integration
  • SD3 DreamBooth and LoRA training scripts


What’s new in SD3?

Model

SD3 is a latent diffusion model that consists of three different text encoders (CLIP L/14, OpenCLIP bigG/14, and T5-v1.1-XXL), a new Multimodal Diffusion Transformer (MMDiT) model, and a 16-channel autoencoder model similar to the one used in Stable Diffusion XL.

SD3 processes text inputs and pixel latents as a sequence of embeddings. Positional encodings are added to 2×2 patches of the latents, which are then flattened into a patch encoding sequence. This sequence is fed into the MMDiT blocks along with the text encoding sequence, where both are embedded to a common dimensionality, concatenated, and passed through a sequence of modulated attention and MLP layers.
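The patching step described above can be sketched in plain Python. This is only an illustration: the nested-list layout, the helper names `patchify` and `sequence_shape`, and the 8x autoencoder downsampling factor are assumptions not stated in the post, and the positional encodings and the learned projection are left out.

```python
# Sketch: turn a latent grid into a sequence of flattened 2x2 patches.
# Assumptions (not from the post): latents are stored as [channel][row][col]
# nested lists, and the autoencoder downsamples each image side by 8.

def patchify(latent, patch=2):
    """Return a list of flattened patches, scanning rows then columns."""
    channels = len(latent)
    height, width = len(latent[0]), len(latent[0][0])
    assert height % patch == 0 and width % patch == 0
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [
                latent[c][top + dy][left + dx]
                for c in range(channels)
                for dy in range(patch)
                for dx in range(patch)
            ]
            sequence.append(token)
    return sequence


def sequence_shape(image_size, channels=16, downsample=8, patch=2):
    """Number of image tokens and values per token for a square image."""
    side = image_size // downsample // patch
    return side * side, channels * patch * patch


# Tiny 16-channel, 4x4 latent filled with a running counter.
latent = [[[c * 16 + y * 4 + x for x in range(4)] for y in range(4)] for c in range(16)]
tokens = patchify(latent)
print(len(tokens), len(tokens[0]))   # 4 tokens of 64 values each
print(sequence_shape(1024))          # (4096, 64)
```

Under these assumptions a 1024×1024 image becomes 4096 image tokens, which gives a sense of the sequence length the transformer attends over.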

To account for the differences between the two modalities, the MMDiT blocks use two separate sets of weights to embed the text and image sequences to a common dimensionality. These sequences are joined before the attention operation. This allows both representations to work in their own space while taking the other one into account during attention. This two-way flow of information between text and image data differs from previous approaches to text-to-image synthesis, where text information is incorporated into the latents via cross-attention with a fixed text representation.
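The idea of separate weights per modality with attention over the joined sequence can be illustrated with a toy single-head example. This is a sketch under stated assumptions, not the Diffusers implementation: one projection per modality stands in for queries, keys, and values, the weights are random, and all function names are hypothetical.

```python
import math
import random

# Sketch of the joint-attention idea: each modality has its OWN projection
# weights, but attention runs over the concatenated sequence.
# Assumptions: single head, one shared projection per modality for
# queries/keys/values, tiny random weights.

def matvec(weights, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def joint_attention(text_tokens, image_tokens, w_text, w_image):
    # 1. Embed each modality to the common dimension with its own weights.
    text = [matvec(w_text, t) for t in text_tokens]
    image = [matvec(w_image, i) for i in image_tokens]
    # 2. Join the two sequences before attention.
    joined = text + image
    dim = len(joined[0])
    outputs, weights = [], []
    for query in joined:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in joined]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append([sum(a * value[d] for a, value in zip(attn, joined)) for d in range(dim)])
    # 3. Split back so each modality continues in its own stream.
    return outputs[:len(text)], outputs[len(text):], weights

rng = random.Random(0)
text_tokens = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(3)]    # 3 text tokens, dim 6
image_tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]   # 5 image tokens, dim 4
w_text = random_matrix(8, 6, rng)    # text  -> common dim 8
w_image = random_matrix(8, 4, rng)   # image -> common dim 8

text_out, image_out, attn = joint_attention(text_tokens, image_tokens, w_text, w_image)
print(len(text_out), len(image_out), len(attn[0]))  # 3 5 8
```

Every row of `attn` spans all eight tokens, so each text token attends to image tokens and vice versa, which is the two-way flow the paragraph describes.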

SD3 also uses the pooled text embeddings from both of its CLIP models as part of its timestep conditioning. These embeddings are first concatenated and added to the timestep embedding before being passed to each of the MMDiT blocks.

Training with Rectified Flow Matching

In addition to architectural changes, SD3 trains the model with a conditional flow-matching objective. In this approach, the forward noising process is defined as a rectified flow that connects the data and noise distributions on a straight line.
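The straight-line noising process described above can be written out in a few lines. The following is a minimal sketch, assuming the convention that t = 0 is clean data and t = 1 is pure noise; the function names are illustrative and not taken from any library.

```python
import random

# Sketch of the straight-line (rectified flow) forward process on plain lists.
# Assumption: t = 0 is clean data and t = 1 is pure noise; the regression
# target is the constant velocity along the line (noise - data).

def interpolate(data, noise, t):
    """Point on the straight line between data (t=0) and noise (t=1)."""
    return [(1.0 - t) * x + t * n for x, n in zip(data, noise)]

def velocity_target(data, noise):
    """Derivative of interpolate() with respect to t; independent of t."""
    return [n - x for x, n in zip(data, noise)]

rng = random.Random(0)
data = [0.5, -1.0, 2.0]
noise = [rng.gauss(0.0, 1.0) for _ in data]

z_half = interpolate(data, noise, 0.5)   # halfway between data and noise
target = velocity_target(data, noise)    # what the model would be trained to predict
print(z_half, target)
```

Because the path is a straight line, the target velocity does not depend on t, which is one reason sampling can work with relatively few steps.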

The rectified flow-matching sampling process is simpler and performs well with a reduced number of sampling steps. To support inference with SD3, we have introduced a new scheduler (FlowMatchEulerDiscreteScheduler) with a rectified flow-matching formulation and Euler method steps. It also implements resolution-dependent shifting of the timestep schedule via a shift parameter. Increasing the shift value handles noise scaling better for higher resolutions. We recommend using shift=3.0 for the 2B model.
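To make the shift parameter and the Euler steps concrete, here is a small standalone sketch. It assumes a linear pre-shift schedule from 1 (noise) to 0 (data) and uses a hypothetical `velocity_fn` in place of the trained transformer; the real scheduler has more options than shown here.

```python
# Sketch of a shifted sigma schedule and Euler sampling for a flow model.
# Assumptions: sigmas run linearly from 1 (noise) to 0 (data) before shifting,
# and `velocity_fn` stands in for the trained transformer (hypothetical name).

def shift_sigma(sigma, shift):
    """Resolution-dependent shift; shift=1 leaves sigma unchanged."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

def make_sigmas(num_steps, shift=3.0):
    linear = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift_sigma(s, shift) for s in linear]

def euler_sample(noise, velocity_fn, sigmas):
    sample = list(noise)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        velocity = velocity_fn(sample, sigma)
        # One Euler step along the predicted velocity.
        sample = [x + (sigma_next - sigma) * v for x, v in zip(sample, velocity)]
    return sample

# Toy check: with the exact velocity (noise - data), sampling recovers the data.
data = [0.5, -1.0, 2.0]
noise = [1.2, 0.3, -0.7]
exact_velocity = lambda sample, sigma: [n - d for n, d in zip(noise, data)]

sigmas = make_sigmas(num_steps=28, shift=3.0)
result = euler_sample(noise, exact_velocity, sigmas)
print([round(x, 6) for x in result])  # [0.5, -1.0, 2.0]
```

With shift=3.0 a mid-schedule value of 0.5 becomes 0.75, so more of the steps are spent at higher noise levels.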

To quickly try out SD3, see the following applications:

Use SD3 with Diffusers

To use SD3 with Diffusers, upgrade to the latest Diffusers release.

    pip install --upgrade diffusers

Because the model is gated, before using it with Diffusers you first need to go to the Stable Diffusion 3 Medium page on Hugging Face, fill in the form, and accept the gate. Once you are in, you need to log in so that your system knows you have accepted the gate. Log in using the following command:

    huggingface-cli login

The following snippet downloads the 2B parameter version of SD3 in fp16 precision. This is the format used by the original checkpoint published by Stability AI, and is the recommended way to run inference.

Text-to-image

    import torch
    from diffusers import StableDiffusion3Pipeline

    # Load the fp16 weights and move the whole pipeline to the GPU.
    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe(
        "A cat holding a sign that says hello world",
        negative_prompt="",
        num_inference_steps=28,
        guidance_scale=7.0,
    ).images[0]
    image

Image-to-image

    import torch
    from diffusers import StableDiffusion3Img2ImgPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusion3Img2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
    ).to("cuda")

    # Starting image that the prompt will transform.
    init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
    prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
    image = pipe(prompt, image=init_image).images[0]
    image


See the SD3 documentation here.

Memory optimizations for SD3

SD3 uses three text encoders, one of which is the very large T5-XXL model. This makes it challenging to run the model on GPUs with less than 24GB of VRAM, even when using fp16 precision.

To account for this, the Diffusers integration ships with memory optimizations that allow SD3 to run on a wider range of devices.

Running inference with model offloading

The most basic memory optimization available in Diffusers lets you offload the components of the model to the CPU during inference, saving memory at the cost of a slight increase in inference latency. Model offloading moves a model component to the GPU only when it needs to be executed, while keeping the remaining components on the CPU.

    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
    )
    # Do not call .to("cuda"); offloading moves components to the GPU on demand.
    pipe.enable_model_cpu_offload()

    prompt = "smiling cartoon dog sits at a table, coffee mug on hand, as a room goes up in flames. “This is fine,” the dog assures himself."
    image = pipe(prompt).images[0]

Dropping the T5 text encoder during inference

Removing the memory-intensive 4.7B parameter T5-XXL text encoder during inference can significantly reduce the memory requirements of SD3, with only a slight loss in performance.

    import torch
    from diffusers import StableDiffusion3Pipeline

    # Passing None for the third text encoder and tokenizer skips loading T5-XXL.
    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",
        text_encoder_3=None,
        tokenizer_3=None,
        torch_dtype=torch.float16,
    ).to("cuda")

    prompt = "smiling cartoon dog sits at a table, coffee mug on hand, as a room goes up in flames. “This is fine,” the dog assures himself."
    image = pipe(prompt).images[0]

Using a quantized version of the T5-XXL model

You can use the bitsandbytes library to load the T5-XXL model in 8 bits and reduce the memory requirements further.

    import torch
    from diffusers import StableDiffusion3Pipeline
    from transformers import T5EncoderModel, BitsAndBytesConfig

    quantization_config = BitsAndBytesConfig(load_in_8bit=True)

    model_id = "stabilityai/stable-diffusion-3-medium-diffusers"
    # Load only the T5-XXL encoder in 8-bit precision.
    text_encoder = T5EncoderModel.from_pretrained(
        model_id,
        subfolder="text_encoder_3",
        quantization_config=quantization_config,
    )
    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_id,
        text_encoder_3=text_encoder,
        device_map="balanced",
        torch_dtype=torch.float16,
    )

You can find the complete code snippet here.
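A rough back-of-envelope estimate helps explain why the T5-XXL encoder is the main target of these optimizations. The sketch below counts weights only; the 4.7B parameter figure comes from the text above, while the bytes-per-parameter values and the 1 GB = 1e9 bytes convention are assumptions, and activations and other overheads are ignored.

```python
# Back-of-envelope weight memory for the T5-XXL encoder.
# Assumptions: 4.7B parameters (from the text), 2 bytes per fp16 weight,
# 1 byte per 8-bit weight, 1 GB = 1e9 bytes; activations are ignored.

def weight_memory_gb(num_params, bytes_per_param):
    return num_params * bytes_per_param / 1e9

T5_XXL_PARAMS = 4.7e9

fp16_gb = weight_memory_gb(T5_XXL_PARAMS, 2)
int8_gb = weight_memory_gb(T5_XXL_PARAMS, 1)
print(f"T5-XXL in fp16: ~{fp16_gb:.1f} GB, in 8-bit: ~{int8_gb:.1f} GB")
```

Measured savings will differ from this estimate, since quantization libraries keep some tensors in higher precision and the pipeline holds other components in memory as well.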

Summary of memory optimizations

All benchmark runs were conducted using the 2B version of the SD3 model on an A100 GPU with 80GB of VRAM, using fp16 precision and PyTorch 2.3.

For the memory benchmarks, we use 3 iterations of pipeline calls for warming up and report the average inference time over 10 iterations of pipeline calls. We use the default arguments of the StableDiffusion3Pipeline __call__() method.

Technique          Inference Time (secs)    Memory (GB)
Default            4.762                    18.765
Offload            32.765 (~6.8x 🔼)         12.0645 (~1.55x 🔽)
Offload + no T5    19.110 (~4.013x 🔼)       4.266 (~4.398x 🔽)
8-bit T5           4.932 (~1.036x 🔼)        (~1.77x 🔽)
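The relative factors in the table can be recomputed from the raw numbers, which also shows how they are defined: slowdown is time divided by the default time, and memory saving is the default memory divided by the technique's memory. The sketch below uses only figures from the table; the 8-bit memory value is not listed, so the last line merely derives what the stated ~1.77x factor would imply.

```python
# Recompute the relative factors from the benchmark numbers in the table.
# Only values that appear in the table are used.

baseline_time, baseline_mem = 4.762, 18.765

runs = {
    "Offload": (32.765, 12.0645),
    "Offload + no T5": (19.110, 4.266),
}

# name -> (slowdown factor, memory saving factor)
factors = {
    name: (seconds / baseline_time, baseline_mem / memory_gb)
    for name, (seconds, memory_gb) in runs.items()
}
for name, (slowdown, saving) in factors.items():
    print(f"{name}: {slowdown:.3f}x slower, {saving:.3f}x less memory")

eight_bit_slowdown = 4.932 / baseline_time
# Derived, not measured: memory implied by the stated ~1.77x saving.
implied_8bit_mem = baseline_mem / 1.77
print(f"8-bit T5: {eight_bit_slowdown:.3f}x slower, roughly {implied_8bit_mem:.1f} GB implied")
```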

Performance optimizations for SD3

To reduce inference latency, torch.compile() can be used to obtain an optimized compute graph of the VAE and transformer components.

    import torch
    from diffusers import StableDiffusion3Pipeline

    torch.set_float32_matmul_precision("high")

    torch._inductor.config.conv_1x1_as_mm = True
    torch._inductor.config.coordinate_descent_tuning = True
    torch._inductor.config.epilogue_fusion = False
    torch._inductor.config.coordinate_descent_check_all_directions = True

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
    ).to("cuda")
    pipe.set_progress_bar_config(disable=True)

    pipe.transformer.to(memory_format=torch.channels_last)
    pipe.vae.to(memory_format=torch.channels_last)

    pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)
    pipe.vae.decode = torch.compile(pipe.vae.decode, mode="max-autotune", fullgraph=True)

    # Warm up: the first calls trigger compilation and autotuning.
    prompt = "a photo of a cat holding a sign that says hello world"
    for _ in range(3):
        _ = pipe(prompt=prompt, generator=torch.manual_seed(1))

    # Run inference with the compiled components.
    image = pipe(prompt=prompt, generator=torch.manual_seed(1)).images[0]
    image.save("sd3_hello_world.png")

For the complete script, see here.

We benchmarked the performance of torch.compile() on SD3 on a single 80GB A100 machine using fp16 precision and PyTorch 2.3. We ran 10 iterations of pipeline inference calls with 20 diffusion steps. We found that the average inference time using the compiled version of the model was 0.585 seconds, a 4x speedup over eager execution.
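The measurement procedure (a few untimed warm-up calls followed by an average over timed calls) can be sketched as a small helper. This is an illustrative harness, not the script used for the numbers above; `benchmark` and `run_once` are hypothetical names, and a cheap stand-in workload replaces the pipeline call so the example runs anywhere.

```python
import time

# Sketch of a warm-up-then-average timing helper.
# Assumption: `run_once` is any zero-argument callable, e.g. a lambda that
# calls the pipeline; the name is hypothetical and not part of Diffusers.

def benchmark(run_once, warmup=3, iterations=10):
    for _ in range(warmup):          # warm-up calls are not timed
        run_once()
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    return (time.perf_counter() - start) / iterations

calls = []
average = benchmark(lambda: calls.append(sum(range(1000))))
print(f"{len(calls)} calls, {average:.6f} s per timed call")
```

Warm-up matters especially with torch.compile(), because the first calls include compilation time. When timing GPU work directly, it is also common to synchronize the device before reading the clock, since CUDA kernels launch asynchronously.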

DreamBooth and LoRA fine-tuning

Additionally, we provide a DreamBooth fine-tuning script for SD3 that leverages LoRA. The script can be used to fine-tune SD3 efficiently and serves as a reference for implementing rectified flow-based training pipelines. Other popular implementations of rectified flow include minRF.

To get started with the script, first make sure you have the right setup and a demo dataset available (such as this one). For more information, see here. Install peft and bitsandbytes before launching the script:

    export MODEL_NAME="stabilityai/stable-diffusion-3-medium-diffusers"
    export INSTANCE_DIR="dog"
    export OUTPUT_DIR="dreambooth-sd3-lora"

    accelerate launch train_dreambooth_lora_sd3.py \
      --pretrained_model_name_or_path=${MODEL_NAME} \
      --instance_data_dir=${INSTANCE_DIR} \
      --output_dir=/raid/.cache/${OUTPUT_DIR} \
      --mixed_precision="fp16" \
      --instance_prompt="a photo of sks dog" \
      --resolution=1024 \
      --train_batch_size=1 \
      --gradient_accumulation_steps=4 \
      --learning_rate=1e-5 \
      --report_to="wandb" \
      --lr_scheduler="constant" \
      --lr_warmup_steps=0 \
      --max_train_steps=500 \
      --weighting_scheme="logit_normal" \
      --validation_prompt="A photo of sks dog in a bucket" \
      --validation_epochs=25 \
      --seed="0" \
      --push_to_hub
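The logit_normal weighting scheme named in the command refers to how training timesteps are drawn. The idea can be sketched as sampling from a normal distribution and squashing the result with a sigmoid, which concentrates samples around the middle of the noise range. The mean of 0 and standard deviation of 1 used below are assumptions for illustration; the training script's actual defaults and implementation may differ.

```python
import math
import random

# Sketch of logit-normal timestep sampling, the idea behind a
# "logit_normal" weighting scheme.
# Assumptions: mean 0 and std 1 for the underlying normal; the training
# script's actual defaults and implementation may differ.

def sample_logit_normal(rng, mean=0.0, std=1.0):
    """Draw u ~ Normal(mean, std) and squash it to (0, 1) with a sigmoid."""
    u = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-u))

rng = random.Random(0)
timesteps = [sample_logit_normal(rng) for _ in range(10_000)]

middle = sum(0.25 < t < 0.75 for t in timesteps) / len(timesteps)
print(f"share of samples in (0.25, 0.75): {middle:.2f}")

# Effective batch size from the flags above: 1 sample x 4 accumulation steps.
effective_batch_size = 1 * 4
```

A uniform sampler would place half of the samples in that middle interval, so the printed share shows how much more often intermediate noise levels are visited under these assumptions.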

Acknowledgments

We would like to thank the Stability AI team for making Stable Diffusion 3 happen and for providing early access. Thanks to Linoy for helping with the blog post thumbnail.
