T2I-Adapter is an efficient plug-and-play model that provides additional guidance to a pre-trained text-to-image model while keeping the original large text-to-image model frozen. T2I-Adapter aligns the internal knowledge of the T2I model with external control signals. You can train different adapters for different conditions to achieve rich control and editing effects.
ControlNet, a contemporaneous work, provides similar functionality and is widely used. However, it can be computationally expensive to run. This is because both the ControlNet and the UNet need to be executed during each denoising step of the reverse diffusion process. In addition, ControlNet stresses the importance of copying the UNet encoder as a control model, resulting in a larger parameter count. Generation is thus bottlenecked by the size of the ControlNet (the larger it is, the slower the process becomes).
T2I-Adapters provide a competitive advantage over ControlNets in this regard. T2I-Adapters are smaller in size, and unlike ControlNets, a T2I-Adapter runs only once for the entire denoising process.
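To make this efficiency argument concrete, here is an illustrative sketch. This is not the actual diffusers internals; unet, controlnet, and adapter are stand-in callables for the real modules:

```python
# Illustrative sketch contrasting the two designs (not real diffusers code).

def denoise_with_controlnet(latents, timesteps, cond, unet, controlnet):
    # The ControlNet runs alongside the UNet at *every* denoising step,
    # so each step pays for two forward passes.
    for t in timesteps:
        residuals = controlnet(latents, t, cond)  # extra forward pass per step
        latents = unet(latents, t, residuals)
    return latents

def denoise_with_t2i_adapter(latents, timesteps, cond, unet, adapter):
    # The T2I-Adapter encodes the condition *once*; only the cached
    # features are injected into the UNet at each step.
    features = adapter(cond)  # single forward pass for the whole generation
    for t in timesteps:
        latents = unet(latents, t, features)
    return latents
```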
| Model Type | Model Parameters | Storage (fp16) |
|---|---|---|
| ControlNet-SDXL | 1251 M | 2.5 GB |
| ControlLoRA (rank 128) | 197.78 M (84.19% reduction) | 396 MB (84.53% reduction) |
| T2I-Adapter-SDXL | 79 M (93.69% reduction) | 158 MB (94% reduction) |
Over the past few weeks, the Diffusers team and the T2I-Adapter authors have been collaborating to bring T2I-Adapter support to Stable Diffusion XL (SDXL) in diffusers. In this blog post, we share some fascinating results from training T2I-Adapters on SDXL from scratch, along with T2I-Adapter checkpoints for various conditionings (sketch, canny, lineart, depth, and OpenPose).
Compared to the previous versions of T2I-Adapter (SD-1.4/1.5), T2I-Adapter-SDXL keeps the original recipe, driving the 2.6B-parameter SDXL with a 79M-parameter adapter. T2I-Adapter-SDXL inherits the high-quality generation of SDXL while maintaining powerful control capabilities!
Training T2I-Adapter-SDXL with diffusers
We built our training script based on the official example provided by diffusers.
Most of the T2I-Adapter models mentioned in this blog post were trained on 3M high-resolution image-text pairs from LAION-Aesthetics V2 with the following settings:
- Training steps: 20000-35000
- Batch size: data parallel with a single-GPU batch size of 16, for a total batch size of 128
- Learning rate: constant learning rate of 1e-5
- Mixed precision: fp16
We encourage the community to use our scripts to train custom, powerful T2I adapters to achieve competitive tradeoffs between speed, memory, and quality.
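As a starting point, a launch command along the following lines matches the hyperparameters above. The script lives in the diffusers repository under examples/t2i_adapter; the dataset name below is a placeholder, and flag names should be verified against the script's --help for your diffusers version:

```bash
accelerate launch train_t2i_adapter_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --output_dir="t2i-adapter-sdxl-custom" \
  --dataset_name="<your-dataset>" \
  --resolution=1024 \
  --train_batch_size=16 \
  --learning_rate=1e-5 \
  --max_train_steps=35000 \
  --mixed_precision="fp16"
```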
Using T2I-Adapter-SDXL in diffusers
Here, we take the lineart condition as an example to demonstrate the usage of T2I-Adapter-SDXL. To get started, first install the required dependencies:
```bash
pip install -U git+https://github.com/huggingface/diffusers.git
pip install -U controlnet_aux==0.0.7
pip install transformers accelerate
```
The T2I-Adapter-SDXL generation process consists of two main steps:
1. Condition images are first prepared into the appropriate control image format.
2. The control image and prompt are passed to the StableDiffusionXLAdapterPipeline.
Let's look at a simple example using the lineart adapter. First, initialize the T2I-Adapter pipeline for SDXL and the lineart detector:
```python
import torch
from controlnet_aux.lineart import LineartDetector
from diffusers import (AutoencoderKL, EulerAncestralDiscreteScheduler,
                       StableDiffusionXLAdapterPipeline, T2IAdapter)
from diffusers.utils import load_image, make_image_grid

# Load the lineart adapter
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Load the SDXL pipeline with the fp16-fixed VAE and an Euler Ancestral scheduler
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a,
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# Load the lineart detector
line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
```
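If you are constrained on VRAM, you can rely on diffusers' generic model offloading instead of moving the whole pipeline to the GPU. This is standard DiffusionPipeline functionality rather than anything specific to T2I-Adapter:

```python
# Optional: call this instead of pipe.to("cuda") above. Submodules stay on the
# CPU and are moved to the GPU only when needed; slower, but lighter on VRAM.
pipe.enable_model_cpu_offload()
```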
Next, load the image and detect the lineart:
```python
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(image, detect_resolution=384, image_resolution=1024)
```
Then generate:
```python
prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    guidance_scale=7.5,
).images[0]
gen_images.save("out_lin.png")
```
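Since make_image_grid is already imported above, you can place the lineart condition next to the generated output for a quick visual check (the two images must share the same resolution):

```python
# Put the control image and the generated image side by side (1 row, 2 columns).
grid = make_image_grid([image, gen_images], rows=1, cols=2)
grid.save("lin_comparison.png")
```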
There are two important arguments to understand that will help you control the amount of conditioning.
adapter_conditioning_scale
This argument controls how much influence the conditioning has on the input. High values mean a stronger conditioning effect, and vice versa.
adapter_conditioning_factor
This argument controls the number of initial generation steps during which the conditioning is applied. The value must be between 0 and 1 (the default is 1). adapter_conditioning_factor=1 means the adapter is applied at all timesteps, while adapter_conditioning_factor=0.5 means it is applied only for the first 50% of the steps.
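As a minimal illustration (the values are arbitrary examples), here is how both arguments would be passed to the pipeline call from the example above:

```python
# Strong conditioning (0.9), but only applied during the first half of the steps.
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.9,   # how strongly the control signal is applied
    adapter_conditioning_factor=0.5,  # adapter active for the first 50% of steps
    guidance_scale=7.5,
).images[0]
```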
Please check the official documentation for more information.
Try the demo
You can easily try out T2I-Adapter-SDXL in this space or in the playground embedded below.
You can also try out Doodly, which is built using the sketch model and turns your doodles into realistic images (with language supervision).
More results
Below, we show results obtained using different kinds of conditions. We also supplement the results with links to the corresponding pre-trained checkpoints; their model cards contain usage examples as well as details about the training method. A sketch showing how to swap in a different checkpoint follows the list.
Lineart guided
Model: TencentARC/t2i-adapter-lineart-sdxl-1.0
Sketch guided
Model: TencentARC/t2i-adapter-sketch-sdxl-1.0
Canny guided
Model: TencentARC/t2i-adapter-canny-sdxl-1.0
Depth guided
Models: TencentARC/t2i-adapter-depth-midas-sdxl-1.0 and TencentARC/t2i-adapter-depth-zoe-sdxl-1.0 (MiDaS- and Zoe-based depth estimation, respectively)
OpenPose guided
Model: TencentARC/t2i-adapter-openpose-sdxl-1.0
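All of the checkpoints above are drop-in replacements in the pipeline shown earlier: swap the adapter checkpoint and use the matching controlnet_aux preprocessor. Below is a minimal sketch for the canny condition, assuming the CannyDetector interface from controlnet_aux 0.0.7; double-check the preprocessor arguments against your installed version.

```python
from controlnet_aux.canny import CannyDetector

# Load the canny variant of the adapter; everything else stays the same.
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a,
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# Canny is a classical edge detector, so no pretrained weights are needed.
canny_detector = CannyDetector()
control_image = canny_detector(image, detect_resolution=384, image_resolution=1024)
```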
Acknowledgments: Many thanks to William Berman for helping train the model and sharing his insights.