Modular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of building an entire pipeline from scratch, you can combine blocks to create a workflow tailored to your needs. It complements the existing DiffusionPipeline class with a more flexible and configurable alternative.
This post explains how Modular Diffusers works, from the familiar API for running modular pipelines to building completely custom blocks and incorporating them into your own workflows. We’ll also show you how to integrate with Mellon, a node-based visual workflow interface that lets you connect Modular Diffusers blocks to each other.
quick start
Below is a simple example of running inference with FLUX.2 Klein 4B using pre-built blocks.
```python
import torch
from diffusers import ModularPipeline

pipe = ModularPipeline.from_pretrained("black-forest-labs/FLUX.2-klein-4B")
pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(prompt="Peaceful landscape at dusk", num_inference_steps=4).images[0]
image.save("output.png")
```
Although it produces the same results as the standard DiffusionPipeline, this pipeline is very different under the hood: it consists of flexible blocks (text encoding, image encoding, denoising, and decoding) that can be inspected directly.
```python
print(pipe.blocks)
```

```
Flux2KleinAutoBlocks(
  ...
  Sub-Blocks:
    [0] text_encoder (Flux2KleinTextEncoderStep)
    [1] vae_encoder (Flux2KleinAutoVaeEncoderStep)
    [2] denoise (Flux2KleinCoreDenoiseStep)
    [3] decode (Flux2DecodeStep)
)
```
Each block is self-contained with its own inputs and outputs. You can run any block independently as its own pipeline, and freely add, remove, or replace blocks; the remaining blocks dynamically adjust their inputs to account for the change. Convert blocks into an executable pipeline with .init_pipeline() and load the model weights with .load_components().
```python
blocks = pipe.blocks
text_blocks = blocks.sub_blocks.pop("text_encoder")

text_pipe = text_blocks.init_pipeline("black-forest-labs/FLUX.2-klein-4B")
text_pipe.load_components(torch_dtype=torch.bfloat16)
text_pipe.to("cuda")
prompt_embeds = text_pipe(prompt="Peaceful landscape at dusk").prompt_embeds

remaining_pipe = blocks.init_pipeline("black-forest-labs/FLUX.2-klein-4B")
remaining_pipe.load_components(torch_dtype=torch.bfloat16)
remaining_pipe.to("cuda")
image = remaining_pipe(prompt_embeds=prompt_embeds, num_inference_steps=4).images[0]
```
For more information on block types, configuration patterns, lazy loading, and memory management with the ComponentsManager, see the Modular Diffusers documentation.
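To give a feel for the idea behind component sharing (the ComponentsManager tracks loaded models so multiple pipelines can reuse them instead of loading duplicates), here is a toy, purely illustrative sketch. The class and method names below are invented for illustration and are not the actual diffusers API:

```python
# Toy sketch of component sharing: a registry hands back an existing component
# when two pipelines request the same model, instead of loading it twice.
# Illustration of the idea only, NOT diffusers' actual ComponentsManager API.
class ToyComponentsManager:
    def __init__(self):
        self._components = {}  # key -> component object

    def get_or_add(self, key, factory):
        """Return the cached component for `key`, creating it once via `factory`."""
        if key not in self._components:
            self._components[key] = factory()
        return self._components[key]


manager = ToyComponentsManager()

# Two pipelines asking for the same text encoder end up sharing one object.
enc_a = manager.get_or_add("text_encoder", lambda: object())
enc_b = manager.get_or_add("text_encoder", lambda: object())
assert enc_a is enc_b  # loaded once, shared everywhere
```

The real ComponentsManager goes much further (device placement, offloading, deduplication by model hash), but the core idea is the same: one shared registry instead of per-pipeline copies.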
custom blocks
Modular Diffusers really comes into its own when you create your own blocks. A custom block is a Python class that defines its components, inputs, outputs, and computation logic, and once defined, it can be plugged into any workflow.
Creating a custom block
Below is an example block that uses Depth Anything V2 to extract a depth map from an image.
```python
import torch
from diffusers.modular_pipelines import (
    ModularPipelineBlocks,
    ComponentSpec,
    InputParam,
    OutputParam,
)
from image_gen_aux import DepthPreprocessor


class DepthProcessorBlock(ModularPipelineBlocks):
    @property
    def expected_components(self):
        return [
            ComponentSpec(
                "depth_processor",
                DepthPreprocessor,
                pretrained_model_name_or_path="Depth-Anything/Depth-Anything-V2-Large-hf",
            )
        ]

    @property
    def inputs(self):
        return [
            InputParam("image", required=True, description="Image to extract a depth map from"),
        ]

    @property
    def intermediate_outputs(self):
        return [
            OutputParam("control_image", type_hint=torch.Tensor, description="Depth map of the input image"),
        ]

    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        depth_map = components.depth_processor(block_state.image)
        block_state.control_image = depth_map
        self.set_block_state(state, block_state)
        return components, state
```

expected_components declares the models the block expects, in this case the depth estimation model. The pretrained_model_name_or_path parameter sets the default Hub repository to load from, so load_components automatically retrieves the depth model unless you override it in modular_model_index.json. inputs and intermediate_outputs define what the block consumes and what it produces. __call__ is where the computation logic lives.
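To make the get_block_state/set_block_state flow concrete, here is a stripped-down, hypothetical sketch of the state-passing mechanics. The classes below are invented for illustration and are not diffusers' real implementation:

```python
# Minimal sketch of how a block reads from and writes back to a shared pipeline
# state. Class and attribute names are illustrative, not the real diffusers API.
from types import SimpleNamespace


class ToyPipelineState:
    def __init__(self, **values):
        self.values = dict(values)


class ToyBlock:
    def get_block_state(self, state):
        # Give the block a convenient attribute-style view of the shared state.
        return SimpleNamespace(**state.values)

    def set_block_state(self, state, block_state):
        # Write the block's (possibly new) fields back into the shared state.
        state.values.update(vars(block_state))

    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        # "Computation": derive a new intermediate output from the input.
        block_state.control_image = f"depth({block_state.image})"
        self.set_block_state(state, block_state)
        return components, state


state = ToyPipelineState(image="img.png")
ToyBlock()(components=None, state=state)
print(state.values["control_image"])  # depth(img.png)
```

The pattern matters because downstream blocks read the same shared state: whatever one block writes (here, control_image) is immediately visible to the next block that declares it as an input.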
Incorporate blocks into your workflow
Let’s use this block in a Qwen-Image ControlNet workflow. Extract the ControlNet workflow and insert the depth block at the beginning.
```python
pipe = ModularPipeline.from_pretrained("Qwen/Qwen-Image")
print(pipe.blocks.available_workflows)

blocks = pipe.blocks.get_workflow("controlnet_text2image")
print(blocks)

blocks.sub_blocks.insert("depth", DepthProcessorBlock(), 0)
blocks.sub_blocks["depth"].doc
```
Blocks in a sequence automatically share data. The depth block’s control_image output flows to downstream blocks that require it, and its image input becomes a pipeline input since no earlier block provides an image.
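The wiring rule just described can be sketched as a small resolution function: an input is promoted to a user-facing pipeline input only when no earlier block produces it. This is a hypothetical illustration of the idea, not the actual diffusers logic:

```python
# Toy illustration of how sequential blocks share data: any input that is not
# produced by an earlier block's outputs becomes a user-facing pipeline input.
def pipeline_inputs(blocks):
    produced, user_inputs = set(), []
    for block in blocks:
        for name in block["inputs"]:
            if name not in produced and name not in user_inputs:
                user_inputs.append(name)
        produced.update(block["outputs"])
    return user_inputs


blocks = [
    {"name": "depth", "inputs": ["image"], "outputs": ["control_image"]},
    {"name": "denoise", "inputs": ["prompt", "control_image"], "outputs": ["latents"]},
]
print(pipeline_inputs(blocks))  # ['image', 'prompt']
```

With the depth block inserted first, control_image is satisfied internally, so the user only supplies image and prompt.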
```python
import torch
from diffusers import ComponentsManager, AutoModel
from diffusers.utils import load_image

manager = ComponentsManager()
pipeline = blocks.init_pipeline("Qwen/Qwen-Image", components_manager=manager)
pipeline.load_components(torch_dtype=torch.bfloat16)

controlnet = AutoModel.from_pretrained("InstantX/Qwen-Image-ControlNet-Union", torch_dtype=torch.bfloat16)
pipeline.update_components(controlnet=controlnet)

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg")
output = pipeline(
    prompt="Astronaut hatching from an egg, detailed, fantasy, Pixar, Disney",
    image=image,
).images[0]
```
Share custom blocks on the Hub
When you publish your custom block to the Hub, anyone can load it with trust_remote_code=True. We’ve created a template to get you started. For a complete walkthrough, see our guide to building custom blocks.
```python
pipeline.save_pretrained(local_dir, repo_id="your-username/your-block-name", push_to_hub=True)
```
The DepthProcessorBlock from this post is available at diffusers/depth-processor-custom-block and can be loaded and used directly.
```python
from diffusers import ModularPipelineBlocks

depth_block = ModularPipelineBlocks.from_pretrained(
    "diffusers/depth-processor-custom-block", trust_remote_code=True
)
```
We’ve published a collection of ready-to-use custom blocks here.
modular repositories
ModularPipeline.from_pretrained works out of the box with existing Diffusers repositories, but Modular Diffusers also introduces a new type of repository: modular repositories.
A modular repository can reference components from the original model repository. For example, diffusers/flux2-bnb-4bit-modular contains a quantized transformer and loads the remaining components from the original repository.
```json
{
  "transformer": [
    "diffusers",
    "Flux2Transformer2DModel",
    {
      "pretrained_model_name_or_path": "diffusers/flux2-bnb-4bit-modular",
      "subfolder": "transformer",
      "type_hint": ["diffusers", "Flux2Transformer2DModel"]
    }
  ],
  "vae": [
    "diffusers",
    "AutoencoderKLFlux2",
    {
      "pretrained_model_name_or_path": "black-forest-labs/FLUX.2-dev",
      "subfolder": "vae",
      "type_hint": ["diffusers", "AutoencoderKLFlux2"]
    }
  ],
  ...
}
```
Modular repositories also allow you to host custom pipeline blocks as Python code, as well as visual UI configurations for tools like Mellon, all in one place.
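As a rough illustration of how entries like the ones above can be resolved, here is a hypothetical helper that reads the loading spec for each component. The function and the exact schema handling are invented for illustration; diffusers performs this resolution internally:

```python
# Toy parser for modular_model_index.json-style entries: each component maps to
# (library, class_name, loading_kwargs), where the kwargs say which repo and
# subfolder to pull weights from. Illustrative only, not the diffusers loader.
def resolve_components(model_index):
    resolved = {}
    for name, (library, class_name, kwargs) in model_index.items():
        resolved[name] = {
            "class": f"{library}.{class_name}",
            "repo": kwargs["pretrained_model_name_or_path"],
            "subfolder": kwargs["subfolder"],
        }
    return resolved


index = {
    "transformer": (
        "diffusers",
        "Flux2Transformer2DModel",
        {"pretrained_model_name_or_path": "diffusers/flux2-bnb-4bit-modular",
         "subfolder": "transformer"},
    ),
    "vae": (
        "diffusers",
        "AutoencoderKLFlux2",
        {"pretrained_model_name_or_path": "black-forest-labs/FLUX.2-dev",
         "subfolder": "vae"},
    ),
}
print(resolve_components(index)["vae"]["repo"])  # black-forest-labs/FLUX.2-dev
```

This is what lets a modular repository stay small: it only stores the components it changes (here, the quantized transformer) and points everything else at the original model repository.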
community pipelines
The community has already started building complete pipelines with Modular Diffusers and publishing them on the Hub with model weights and ready-to-run code.
Krea Realtime Video — a 14B-parameter real-time video generation model derived from Wan 2.1 that achieves 11 fps on a single B200 GPU. It supports text-to-video, video-to-video, and streaming video-to-video, all built as modular blocks. Users can change prompts mid-generation, switch video styles on the fly, and see the first frame in under a second.
```python
import torch
from diffusers import ModularPipeline

pipe = ModularPipeline.from_pretrained("krea/krea-realtime-video", trust_remote_code=True)
pipe.load_components(
    trust_remote_code=True,
    device_map="cuda",
    torch_dtype={"default": torch.bfloat16, "vae": torch.float16},
)
```

Waypoint-1 — Overworld’s 2.3B-parameter real-time diffusion world model. It autoregressively generates interactive worlds from control inputs and text prompts, and the generated environments can be explored and manipulated in real time on consumer hardware.
Teams can build new architectures, package them as blocks, and use ModularPipeline.from_pretrained to expose the entire pipeline on the hub for anyone to use.
To learn more, check out our complete collection of community pipelines.
integration with Mellon
💡 Mellon is in early stages of development and not yet ready for production use. Consider this a sneak peek of how the integration will work.
Mellon is a visual workflow interface integrated with Modular Diffusers. If you’re familiar with node-based tools like ComfyUI, you’ll feel right at home. However, there are some important differences.
Dynamic nodes — instead of dozens of model-specific nodes, a smaller set of nodes automatically adapts its interface based on the model you select. Learn them once and use them with any model.

Single-node workflows — thanks to Modular Diffusers’ composable block system, you can collapse an entire pipeline into a single node, making it easy to run multiple workflows on the same canvas.

Out-of-the-box Hub integration — custom blocks published to the Hugging Face Hub work instantly in Mellon. Utility functions automatically generate node interfaces from block definitions, so no UI code is required.
This integration is possible because all blocks expose the same properties (inputs, intermediate_outputs, expected_components). This consistent API means that Mellon can automatically generate a node’s UI from any block definition and configure blocks into higher-level nodes.
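To see why a consistent block API makes auto-generated UIs possible, here is a hypothetical sketch of turning a block's declared inputs and outputs into a node schema. The schema shape and field names are invented for illustration; Mellon's real generator is more involved:

```python
# Toy sketch: derive a node UI description from a block's declared parameters.
# Because every block exposes the same properties (inputs, intermediate_outputs),
# one generic function can build the UI for any block. Schema shape is invented.
def node_schema(block):
    return {
        "inputs": [
            {"name": p["name"], "widget": "textbox", "required": p.get("required", False)}
            for p in block["inputs"]
        ],
        "outputs": [{"name": p["name"], "socket": True} for p in block["intermediate_outputs"]],
    }


# The depth block from earlier, reduced to its declared interface.
depth_block = {
    "inputs": [{"name": "image", "required": True}],
    "intermediate_outputs": [{"name": "control_image"}],
}
schema = node_schema(depth_block)
print(schema["outputs"][0]["name"])  # control_image
```

No per-model UI code is needed: any block that declares its interface this way gets a node for free.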
For example, diffusers/FLUX.2-klein-4B-modular contains pipeline definitions, component references, and mellon_pipeline_config.json all in one repository. Load it into Python with ModularPipeline.from_pretrained("diffusers/FLUX.2-klein-4B-modular"), or into Mellon to create single-node or multi-node workflows.
Here is a simple example. Add the Gemini Prompt Expander node (hosted as a modular repository at diffusers/gemini-prompt-expander-mellon) to your existing text-to-image workflow.
Drag in the Dynamic Block node and enter the repo_id (e.g., diffusers/gemini-prompt-expander-mellon), then click LOAD CUSTOM BLOCK. The node automatically renders a text box for the prompt input and an output socket named “prompt”, all configured from the repository. Enter a short prompt, connect the output to the Encode Prompt node, and run it.
Gemini expands the short prompt into a detailed description before the image is generated. No code or configuration required: all you need is the Hub repository ID.
This is just one example. Check out our Mellon x Modular Diffusers guide for a detailed walkthrough.
conclusion
Modular Diffusers provides the configurability and flexibility the community has been asking for without sacrificing the features that make Diffusers powerful. It’s still early days, and your feedback will shape future development. Try it out and let us know what works, what doesn’t, and what’s missing.
resources
Thanks to Chun Te Lee for the thumbnail. Also, thanks to Poli, Pedro, Lysandre, Linoy, Aritra, and Steven for their thoughtful reviews.

