
Fast diffusion for image generation

By versatileai | October 12, 2025

What is Würstchen?

Würstchen is a diffusion model whose text-conditional component works in a highly compressed latent space of images. Why is this important? Compressing data can reduce computational costs for both training and inference by orders of magnitude: training on 1024×1024 images is much more expensive than training on 32×32. Other works typically use a relatively small spatial compression, in the range of 4x to 8x. Würstchen takes this to an extreme: its novel design achieves a 42x spatial compression! This was unseen before, because common methods fail to faithfully reconstruct detailed images after a 16x spatial compression. Würstchen employs two stages of compression, called Stage A and Stage B. Stage A is a VQGAN and Stage B is a diffusion autoencoder (more details are provided in the paper). Together, Stages A and B are called the decoder, because they decode the compressed images back into pixel space. A third model, Stage C, is learned in that highly compressed latent space. This training requires only a fraction of the compute used by today's best-performing models, while also enabling cheaper and faster inference. Stage C is referred to as the Prior.
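
As a concrete illustration of this split, diffusers also exposes the two stages as separate pipelines (WuerstchenPriorPipeline and WuerstchenDecoderPipeline). The following is a minimal sketch, assuming the warp-ai checkpoints used in the inference example later in this post:

import torch
from diffusers import WuerstchenDecoderPipeline, WuerstchenPriorPipeline
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS

device = "cuda"
dtype = torch.float16

# Stage C (the Prior): text-conditional diffusion in the highly compressed latent space
prior_pipeline = WuerstchenPriorPipeline.from_pretrained("warp-ai/wuerstchen-prior", torch_dtype=dtype).to(device)

# Stages A + B (the decoder): reconstruct full-resolution pixels from the compressed latents
decoder_pipeline = WuerstchenDecoderPipeline.from_pretrained("warp-ai/wuerstchen", torch_dtype=dtype).to(device)

caption = "Anthropomorphic cat dressed as a firefighter"
prior_output = prior_pipeline(prompt=caption, height=1024, width=1536, timesteps=DEFAULT_STAGE_C_TIMESTEPS, guidance_scale=4.0)
images = decoder_pipeline(image_embeddings=prior_output.image_embeddings.half(), prompt=caption, output_type="pil").images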

[Figure: Würstchen sample images with their prompts]

Why another text-to-image model?

Well, this one is pretty fast and efficient. Würstchen's biggest advantage is that it can generate images much faster than models such as Stable Diffusion XL, while also using significantly less memory. So for those who don't have an A100 lying around, this will come in handy. Below is a comparison with SDXL at various batch sizes.

[Figure: Inference speed comparison with SDXL at various batch sizes]
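
If you want to reproduce such a comparison on your own hardware, a rough timing harness along these lines works (a sketch; it reuses the pipeline created in the AutoPipeline example below, and the batch size is arbitrary):

import time
import torch

# Assumes `pipeline` was created as in the AutoPipeline example below
torch.cuda.synchronize()  # make sure pending GPU work doesn't skew the measurement
start = time.perf_counter()
images = pipeline("Anthropomorphic cat dressed as a firefighter", height=1024, width=1536, num_images_per_prompt=4).images
torch.cuda.synchronize()
print(f"Generated {len(images)} images in {time.perf_counter() - start:.1f}s")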

In addition to that, another big advantage of Würstchen is reduced training cost. Würstchen v1, trained at 512×512, required only 9,000 GPU hours. Comparing this to the 150,000 GPU hours spent on Stable Diffusion 1.4, this 16x cost reduction not only benefits researchers conducting new experiments, but also opens the door for more organizations to train such models. Würstchen v2 used 24,602 GPU hours. Even with the resolution going up to 1536, this is still 6x cheaper than SD1.4, which was only trained at 512×512.

[Figure: Training cost comparison]

A detailed instructional video is also available here.

How to use Würstchen?

You can try it out using the demo here.

Otherwise, the models are available through the Diffusers library, so you can use the interface you are already familiar with. For example, here is how to run inference using the AutoPipeline:

import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS

pipeline = AutoPipelineForText2Image.from_pretrained("warp-ai/wuerstchen", torch_dtype=torch.float16).to("cuda")

caption = "Anthropomorphic cat dressed as a firefighter"
images = pipeline(
    caption,
    height=1024,
    width=1536,
    prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,
    prior_guidance_scale=4.0,
    num_images_per_prompt=4,
).images

[Figure: Anthropomorphic cat dressed as a firefighter]

What image sizes does Würstchen support?

Würstchen was trained on image resolutions between 1024×1024 and 1536×1536. Resolutions such as 1024×2048 can also produce good output; feel free to try them. We also observed that the Prior (Stage C) adapts to new resolutions very quickly, so fine-tuning at 2048×2048 should be computationally cheap.
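
For example, generating at a non-square resolution only requires changing the height and width arguments (a sketch reusing the pipeline and caption from the AutoPipeline example above):

images = pipeline(
    caption,
    height=1024,
    width=2048,  # outside the 1024-1536 training range, but often still produces good output
    prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,
    prior_guidance_scale=4.0,
).images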

Models on the Hub

All checkpoints can also be found on the Hugging Face Hub, which will host multiple checkpoints as well as future demos and model weights. Currently, there are three checkpoints available for the Prior and one checkpoint for the Decoder. Check out the documentation where the checkpoints are explained, as well as the documentation covering the overview and availability of the different Prior models.
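
If you want to enumerate what is currently published, the huggingface_hub client can list the checkpoints programmatically (a minimal sketch, assuming they live under the warp-ai organization used in the inference example above):

from huggingface_hub import list_models

# Print every model published under the warp-ai organization
for model in list_models(author="warp-ai"):
    print(model.id)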

Diffusers integration

Since Würstchen is fully integrated in diffusers, it automatically comes with a wide range of out-of-the-box functionality and optimizations. These include:

1. Automatic use of PyTorch 2's accelerated attention (SDPA), as explained below.
2. Support for the xFormers flash attention implementation, if you need to use PyTorch 1.x instead.
3. Model offload, to move unused components to the CPU while they are not in use; this saves memory with negligible performance impact (see the sketch after this list).
4. Sequential CPU offload, for situations where memory is really at a premium; memory use is minimized at the cost of slower inference.
5. Prompt weighting with the Compel library.
6. Support for mps devices on Apple Silicon Macs.
7. Use of generators for reproducibility.
8. Sensible defaults for inference that produce high-quality results in most situations. Of course, you can tweak all parameters to your needs.
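
As an example, the offloading and reproducibility options from the list above are one-liners on the pipeline (a minimal sketch; the seed value is arbitrary):

import torch
from diffusers import AutoPipelineForText2Image

pipeline = AutoPipelineForText2Image.from_pretrained("warp-ai/wuerstchen", torch_dtype=torch.float16)

# Model offload: each component is moved to the GPU only while it runs
pipeline.enable_model_cpu_offload()
# Or, when memory is really at a premium (slower inference):
# pipeline.enable_sequential_cpu_offload()

# A seeded generator makes runs reproducible
generator = torch.Generator(device="cuda").manual_seed(42)
images = pipeline("Anthropomorphic cat dressed as a firefighter", generator=generator).images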

Optimization Technique 1: Flash Attention

Starting with version 2.0, PyTorch has integrated a highly optimized and resource-friendly version of the attention mechanism called torch.nn.functional.scaled_dot_product_attention (SDPA). Depending on the nature of the input, this function leverages several underlying optimizations, and its performance and memory efficiency outstrip those of the traditional attention implementation. Notably, SDPA mirrors the characteristics of the flash attention technique, as highlighted in the research paper "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness" authored by Dao and team.
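
In isolation, SDPA is a drop-in replacement for a hand-written attention computation; the following sketch (with arbitrary example shapes) shows the call on its own:

import torch
import torch.nn.functional as F

# (batch, heads, sequence length, head dim) -- arbitrary example shapes
query = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
key = torch.randn_like(query)
value = torch.randn_like(query)

# Dispatches to a flash / memory-efficient kernel when the inputs allow it
out = F.scaled_dot_product_attention(query, key, value)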

If you are using diffusers with PyTorch 2.0 or a later version and have access to SDPA, these optimizations are applied automatically. Get started by setting up torch 2.0 or a newer version using the official guidelines, then run inference as usual:

images = pipeline(caption, height=1024, width=1536, prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS, prior_guidance_scale=4.0, num_images_per_prompt=4).images

To learn more about how diffusers utilize SDPA, check out our documentation.

If you are using a version of PyTorch earlier than 2.0, you can still achieve memory-efficient attention using the xFormers library:

pipeline.enable_xformers_memory_efficient_attention()

Optimization Technique 2: torch.compile

If you want even more performance, you can use torch.compile. For best results, apply it to both the Prior's main model and the decoder:

pipeline.prior_prior = torch.compile(pipeline.prior_prior, mode="reduce-overhead", fullgraph=True)
pipeline.decoder = torch.compile(pipeline.decoder, mode="reduce-overhead", fullgraph=True)

Note that during model compilation, the first inference step takes a long time (up to 2 minutes). You can then perform inference as usual.

images = pipeline(caption, height=1024, width=1536, prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS, prior_guidance_scale=4.0, num_images_per_prompt=4).images

And the good news is that this compilation is a one-time cost. Afterwards, you will experience consistently fast inference at the same image resolution; the initial investment in compilation is quickly offset by the subsequent speed gains. If you want to learn more about torch.compile and its nuances, check out the official documentation.

How was the model trained?

Training this model was made possible only through the compute resources provided by Stability AI. We would like to extend special thanks to Stability for giving us the opportunity to make this kind of research available to even more people.

Resources

For more information about this model, please refer to the official diffusers documentation. All checkpoints can be found on the Hub. You can try out the demo here. If you'd like to discuss upcoming projects or contribute your own ideas, join us on Discord. Training code and more can be found in the official GitHub repository.
