T2I-Adapter is an efficient plug-and-play model that provides additional guidance to a pre-trained text-to-image model while keeping the original large text-to-image model frozen. T2I-Adapter aligns the internal knowledge of the T2I model with external control signals. You can train different adapters for different conditions to achieve rich control and editing effects.
ControlNet, a contemporaneous work, provides similar functionality and is widely used. However, it can be computationally expensive to run. This is because both the ControlNet and the UNet need to be executed during each denoising step of the reverse diffusion process. In addition, ControlNet stresses the importance of copying the UNet encoder as a control model, resulting in a larger parameter count. Generation is thus bottlenecked by the size of the ControlNet (the larger it is, the slower the process becomes).
T2I-Adapters provide a competitive advantage over ControlNets in this regard. T2I-Adapters are smaller in size, and unlike ControlNets, a T2I-Adapter runs only once for the entire denoising process.
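To make this efficiency argument concrete, here is an illustrative sketch. This is not the actual diffusers internals; unet, controlnet, and adapter are stand-in callables for the real modules:

```python
# Illustrative sketch contrasting the two designs (not real diffusers code).

def denoise_with_controlnet(latents, timesteps, cond, unet, controlnet):
    # The ControlNet runs alongside the UNet at *every* denoising step,
    # so each step pays for two forward passes.
    for t in timesteps:
        residuals = controlnet(latents, t, cond)  # extra forward pass per step
        latents = unet(latents, t, residuals)
    return latents

def denoise_with_t2i_adapter(latents, timesteps, cond, unet, adapter):
    # The T2I-Adapter encodes the condition *once*; only the cached
    # features are injected into the UNet at each step.
    features = adapter(cond)  # single forward pass for the whole generation
    for t in timesteps:
        latents = unet(latents, t, features)
    return latents
```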
| Model Type | Model Parameters | Storage (fp16) |
|---|---|---|
| ControlNet-SDXL | 1251 M | 2.5 GB |
| ControlLoRA (rank 128) | 197.78 M (84.19% reduction) | 396 MB (84.53% reduction) |
| T2I-Adapter-SDXL | 79 M (93.69% reduction) | 158 MB (94% reduction) |
Over the past few weeks, the Diffusers team and the T2I-Adapter authors have been collaborating to bring T2I-Adapter support to Stable Diffusion XL (SDXL) in diffusers. In this blog post, we share some fascinating results from training T2I-Adapters on SDXL from scratch, along with T2I-Adapter checkpoints for various conditionings (sketch, canny, lineart, depth, and OpenPose).
Compared to the previous versions of T2I-Adapter (SD-1.4/1.5), T2I-Adapter-SDXL keeps the original recipe, driving the 2.6B-parameter SDXL with a 79M-parameter adapter. T2I-Adapter-SDXL inherits the high-quality generation of SDXL while maintaining powerful control capabilities!
Training T2I-Adapter-SDXL with diffusers
We built our training script based on the official example provided by diffusers.
Most of the T2I-Adapter models mentioned in this blog post were trained on 3M high-resolution image-text pairs from LAION-Aesthetics V2 with the following settings:
- Training steps: 20000-35000
- Batch size: data parallel with a single-GPU batch size of 16, for a total batch size of 128
- Learning rate: constant learning rate of 1e-5
- Mixed precision: fp16
We encourage the community to use our scripts to train custom, powerful T2I adapters to achieve competitive tradeoffs between speed, memory, and quality.
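As a starting point, a launch command along the following lines matches the hyperparameters above. The script lives in the diffusers repository under examples/t2i_adapter; the dataset name below is a placeholder, and flag names should be verified against the script's --help for your diffusers version:

```bash
accelerate launch train_t2i_adapter_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --output_dir="t2i-adapter-sdxl-custom" \
  --dataset_name="<your-dataset>" \
  --resolution=1024 \
  --train_batch_size=16 \
  --learning_rate=1e-5 \
  --max_train_steps=35000 \
  --mixed_precision="fp16"
```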
Using T2I-Adapter-SDXL in diffusers
Here, we take the lineart condition as an example to demonstrate the usage of T2I-Adapter-SDXL. To get started, first install the required dependencies:
```bash
pip install -U git+https://github.com/huggingface/diffusers.git
pip install -U controlnet_aux==0.0.7
pip install transformers accelerate
```
The T2I-Adapter-SDXL generation process consists of two main steps:
1. Condition images are first prepared into the appropriate control image format.
2. The control image and prompt are passed to the StableDiffusionXLAdapterPipeline.
Let's look at a simple example using the lineart adapter. First, initialize the T2I-Adapter pipeline for SDXL and the lineart detector:
```python
import torch
from controlnet_aux.lineart import LineartDetector
from diffusers import (AutoencoderKL, EulerAncestralDiscreteScheduler,
                       StableDiffusionXLAdapterPipeline, T2IAdapter)
from diffusers.utils import load_image, make_image_grid

# Load the lineart adapter
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Load the SDXL pipeline with the fp16-fixed VAE and an Euler Ancestral scheduler
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a,
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# Load the lineart detector
line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
```
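If you are constrained on VRAM, you can rely on diffusers' generic model offloading instead of moving the whole pipeline to the GPU. This is standard DiffusionPipeline functionality rather than anything specific to T2I-Adapter:

```python
# Optional: call this instead of pipe.to("cuda") above. Submodules stay on the
# CPU and are moved to the GPU only when needed; slower, but lighter on VRAM.
pipe.enable_model_cpu_offload()
```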
Next, load the image and detect the lineart:
```python
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(image, detect_resolution=384, image_resolution=1024)
```
Then generate:
```python
prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    guidance_scale=7.5,
).images[0]
gen_images.save("out_lin.png")
```
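Since make_image_grid is already imported above, you can place the lineart condition next to the generated output for a quick visual check (the two images must share the same resolution):

```python
# Put the control image and the generated image side by side (1 row, 2 columns).
grid = make_image_grid([image, gen_images], rows=1, cols=2)
grid.save("lin_comparison.png")
```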
There are two important arguments to understand that will help you control the amount of conditioning.
adapter_conditioning_scale
This argument controls how much influence the conditioning has on the input. High values mean a stronger conditioning effect, and vice versa.
adapter_conditioning_factor
This argument controls the number of initial generation steps during which the conditioning is applied. The value must be between 0 and 1 (the default is 1). adapter_conditioning_factor=1 means the adapter is applied at all timesteps, while adapter_conditioning_factor=0.5 means it is applied only for the first 50% of the steps.
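As a minimal illustration (the values are arbitrary examples), here is how both arguments would be passed to the pipeline call from the example above:

```python
# Strong conditioning (0.9), but only applied during the first half of the steps.
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.9,   # how strongly the control signal is applied
    adapter_conditioning_factor=0.5,  # adapter active for the first 50% of steps
    guidance_scale=7.5,
).images[0]
```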
Please check the official documentation for more information.
Try the demo
You can easily try out T2I-Adapter-SDXL in this space or in the playground embedded below.
You can also try out Doodly, which is built using the sketch model and turns your doodles into realistic images (with language supervision).
More results
Below, we show results obtained using different kinds of conditions. We also supplement the results with links to the corresponding pre-trained checkpoints; their model cards contain usage examples as well as details about the training method. A sketch showing how to swap in a different checkpoint follows the list.
Lineart guided
Model: TencentARC/t2i-adapter-lineart-sdxl-1.0
Sketch guided
Model: TencentARC/t2i-adapter-sketch-sdxl-1.0
Canny guided
Model: TencentARC/t2i-adapter-canny-sdxl-1.0
Depth guided
Models: TencentARC/t2i-adapter-depth-midas-sdxl-1.0 and TencentARC/t2i-adapter-depth-zoe-sdxl-1.0 (MiDaS- and Zoe-based depth estimation, respectively)
OpenPose guided
Model: TencentARC/t2i-adapter-openpose-sdxl-1.0
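All of the checkpoints above are drop-in replacements in the pipeline shown earlier: swap the adapter checkpoint and use the matching controlnet_aux preprocessor. Below is a minimal sketch for the canny condition, assuming the CannyDetector interface from controlnet_aux 0.0.7; double-check the preprocessor arguments against your installed version.

```python
from controlnet_aux.canny import CannyDetector

# Load the canny variant of the adapter; everything else stays the same.
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a,
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# Canny is a classical edge detector, so no pretrained weights are needed.
canny_detector = CannyDetector()
control_image = canny_detector(image, detect_resolution=384, image_resolution=1024)
```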
Acknowledgments: Many thanks to William Berman for helping train the model and sharing his insights.